hansoffate
July 8th, 2008, 12:02 AM
I am trying to query the yeast genome database (http://www.yeastgenome.org/) to search for ORFs and get the FASTA results. Basically, it is just a simple query, get the output from the query, then strip out what I need.
An example of an output that I would be working with is here: http://db.yeastgenome.org/cgi-bin/getSeq?seq=YMR056C&flankl=0&flankr=0&map=p3map
I want to take the ORF ID (YMR056C) and get the FASTA output which is:
>YMR056C Chr 13 reverse complement
MSHTETQTQQSHFGVDFLMGGVSAAIAKTGAAPIERVKLLMQNQEEMLKQGSLDTRYKGI
LDCFKRTATHEGIVSFWRGNTANVLRYFPTQALNFAFKDKIKSLLSYDRERDGYAKWFAG
NLFSGGAAGGLSLLFVYSLDYARTRLAADARGSKSTSQRQFNGLLDVYKKTLKTDGLLGL
YRGFVPSVLGIIVYRGLYFGLYDSFKPVLLTGALEGSFVASFLLGWVITMGASTASYPLD
TVRRRMMMTSGQTIKYDGALDCLRKIVQKEGAYSLFKGCGANIFRGVAAAGVISLYDQLQ
LIMFGKKFK*
This is the code I tried to write, based on another Perl script I had. I can fetch the HTML, and the results I want to pull out are contained within <pre>> TEXT HERE </pre>. I am close, but I think my regular expression isn't written correctly.
Any ideas?
Thanks,
Hans
#!/usr/bin/perl
use warnings;
use LWP::Simple;

while (<>) {
    chomp;
    $html = get("http://db.yeastgenome.org/cgi-bin/getSeq?seq=$_&flankl=0&flankr=0&map=p3map");
    unless (length($html)) {
        warn "Unable to load page for '$_'\n";
        next;
    }
    @found = ();
    foreach $line (split("\n", $html)) {
        next unless (($fasta) = $line =~ m#<pre>>([.<]+)</pre>#i);
        push(@found, $fasta);
    }
    print "$_: ", join(' ', @found), "\n";
}
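
For what it's worth, two things in the script above look like likely reasons the pattern never matches. Inside a character class, [.<] matches only a literal dot or a literal '<', not "any character". And because the page is split into lines before matching, a <pre> block that spans several lines can never match on a single line. Below is a minimal sketch of one way around both, matching against the whole page with the /s modifier. The helper name extract_fasta and the shortened sample page are made up for illustration, and the handling of '&gt;' is only a guess in case the server escapes the header marker; the real page layout may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the FASTA record found inside the first <pre>...</pre> block,
# or undef if there is none. Assumes the record starts with '>' (or the
# escaped form '&gt;') right after the opening tag.
sub extract_fasta {
    my ($html) = @_;
    return undef unless defined $html && length $html;

    # /s lets '.' match newlines, so the match can span several lines;
    # '.*?' is non-greedy and stops at the first </pre>.
    if ($html =~ m{<pre>\s*((?:>|&gt;).*?)</pre>}is) {
        my $fasta = $1;
        $fasta =~ s/&gt;/>/g;    # undo HTML escaping of the header marker
        $fasta =~ s/\s+\z//;     # trim trailing whitespace
        return $fasta;
    }
    return undef;
}

# Shortened, hypothetical page layout for illustration only.
my $sample = <<'HTML';
<html><body>
<pre>>YMR056C Chr 13 reverse complement
MSHTETQTQQSHFGVDFLMGGVSAAIAKTGAAPIERVKLLMQNQEEMLKQ
LIMFGKKFK*
</pre>
</body></html>
HTML

my $fasta = extract_fasta($sample);
print defined $fasta ? "$fasta\n" : "no FASTA block found\n";
```

In the original loop, the foreach over split lines could then be replaced by a single call such as extract_fasta($html) on the string returned by get(), printing the result when it is defined.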