How to grep multiple lines at once



flummoxed
October 20th, 2007, 03:28 AM
Hi, I'm trying to grep an HTML file that has links to several downloads. An example of a link looks like this:


Download: <a class="ulink" href=
"http://ftp.gnu.org/gnu/binutils/binutils-2.17.tar.bz2">http://ftp.gnu.org/gnu/binutils/binutils-2.17.tar.bz2</a>

So I want to grep for things starting with


Download: <a class="ulink" href= "http://*"

but since the beginning of the download URL is on a different line, I can't seem to match it. Is there any way to get grep to search across lines for concatenated
strings, or something like that?

Thanks

slavik
October 20th, 2007, 03:29 AM
Use Perl or Python to write a script; grep processes its input line by line.

flummoxed
October 20th, 2007, 03:34 AM
I was already working on a Python script, but I was wondering if grep could do some magic. I guess not, lol.

Anyways, I've figured out how to do it in Python. But now my dilemma is that I want to remove everything other than the URL. Every download link is different, so I can't just use rstrip() and lstrip() to remove generic stuff. Is there any way to use wildcards in Python? Like... remove all text from this point to that point, type thing?
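
Maybe the re module is what I'm after? Something like this is what I'm imagining (just a guess based on my example link above, so the pattern probably needs tweaking):


import re

html = open('foo.html').read()   # foo.html is just a stand-in for my real file
# grab whatever sits between href=" and the next quote; \s* lets it span the line break
for url in re.findall(r'href=\s*"(.*?)"', html):
    print url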

ghostdog74
October 20th, 2007, 04:25 AM
awk '/Download/ && $0 !~ /http/ {
    getline l
    start = index(l, ">")
    end   = index(l, "<")
    print substr(l, start+1, end-start-1)
}' "file"

output:


# ./test.sh
http://ftp.gnu.org/gnu/binutils/binutils-2.17.tar.bz2




direct conversion to Perl:



while (<>) {
    chomp;
    if (/Download/ && $_ !~ /http/) {
        if ($getline_ok = (($_ = <>) ne '')) {
            chomp;
        }
        $l = $_;
        $start = index($l, '>');
        $end   = index($l, '<');
        print substr($l, $start + 1, $end - $start - 1);
    }
}

geirha
October 20th, 2007, 04:26 AM
For HTML you could consider using the HTMLParser module. Maybe something like this?



from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.getanchor = False
    def handle_data(self, data):
        if 'Download: ' in data:
            self.getanchor = True
    def handle_starttag(self, tag, attrs):
        if self.getanchor and tag == 'a':
            print dict(attrs)['href']
            self.getanchor = False

if __name__ == '__main__':
    parser = MyHTMLParser()
    parser.feed( file('foo.html').read() )


grep -A1 will display the matching line plus one line after it, so chaining multiple greps is possible too.
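
For example, something along these lines (just a rough sketch, assuming GNU grep and the foo.html name from the snippet above; the second grep keeps only the URLs from the lines following each Download:, and uniq drops the duplicate you get because the URL appears both in the href and as the link text):


grep -A1 'Download:' foo.html | grep -o 'http://[^"<]*' | uniq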

slavik
October 20th, 2007, 04:32 AM
the perl version:



print $_."\n" for(join('',<>) =~ /href=\"(.*?)\"/g);

ronocdh
October 26th, 2009, 05:36 AM
I know I'm resurrecting a dead thread here, but try this format:

command1 | grep "x\\|y\\|z"
E.g.

di | grep "Filesystem\\|md\\|mnt"

ghostdog74
October 26th, 2009, 05:43 AM
I know I'm resurrecting a dead thread here, but try this format:

command1 | grep "x\\|y\\|z"
E.g.

di | grep "Filesystem\\|md\\|mnt"

And how does that solve the problem? Please look at the first post for the problem description. Don't bump a thread without actually solving the problem.

xir_
December 9th, 2009, 02:00 PM
I know I'm resurrecting a dead thread here, but try this format:

command1 | grep "x\\|y\\|z"
E.g.

di | grep "Filesystem\\|md\\|mnt"

Brilliant, this helped me loads. Thanks!

bkuebler
April 11th, 2011, 05:21 PM
I know I'm resurrecting a dead thread here, but try this format:

command1 | grep "x\\|y\\|z"
E.g.

di | grep "Filesystem\\|md\\|mnt"
Now it's my turn to resurrect a dead thread, but this helped me out greatly as well...

bashologist
April 12th, 2011, 07:57 PM
I write download scripts like every week. I love doing this so I gotta try.

perl -0 -n -e 'print "$_\n" foreach /Download:\s*<a.*?href=\n?"(.*?)"/isg' test.txt
Wrote that in just a few seconds.

You could then pipe the output to wget if the output looks like what you're wanting.

perl -0 -n -e 'print "$_\n" foreach /Download:\s*<a.*?href=\n?"(.*?)"/isg' test.txt | wget -i -

slavik
April 13th, 2011, 08:00 AM
Should've been closed in 2009 ...