
Thread: Grepping multiple lines at once?

  1. #1
    Join Date
    Nov 2005
    Location
    Leeds, UK
    Beans
    1,634
    Distro
    Ubuntu Development Release

    Grepping multiple lines at once?

    Hi!

    I've got an interesting problem and I'm not sure about how to go about it with grep. Basically, I have a file with 10 million lines which are all numbers:

    1
    2
    3
    6
    23
    42
    64
    2
    1
    2
    3

    I want to count the number of times the pattern "1
    2
    3" repeats in the file. I figured that grep would be the most efficient way to do this, but if I just grep for "1
    2
    3", it matches lines containing just 1s or 2s or 3s rather than the complete pattern, and I don't understand why. Does anyone know of a solution? Alternatively, I can create the file with spaces instead of newlines, but then grep just gives me a count of 1, because it counts the number of lines containing the pattern rather than the number of occurrences within the single line that makes up the file, which is useless to me.

    So any help would be massively appreciated! Thanks!
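
    A likely explanation for the behaviour described above: grep reads a pattern argument that contains newlines as a list of separate patterns and selects any line matching at least one of them, so "1", "2" and "3" are searched for independently. The Python sketch below only imitates that rule to make it visible; grep_like is a hypothetical helper written for this illustration, not part of grep or of any library.

```python
# Hypothetical helper imitating how grep treats a pattern that contains
# newlines: each piece becomes its own pattern, and a line is selected
# if ANY of the pieces occurs in it.
def grep_like(pattern, lines):
    pieces = pattern.split("\n")
    return [line for line in lines if any(p in line for p in pieces)]

# The sample lines from the post above.
sample = ["1", "2", "3", "6", "23", "42", "64", "2", "1", "2", "3"]
matched = grep_like("1\n2\n3", sample)
print(matched)
```

    Every line containing a 1, a 2 or a 3 is selected (including 23 and 42), which matches the symptom in the question: the three numbers are never treated as one multi-line unit.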

  2. #2
    Join Date
    Sep 2007
    Location
    England
    Beans
    1,103

    Re: Grepping multiple lines at once?

    Code:
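    tr '\n' ' ' < your_file | grep -ow '1 2 3' | wc -l
    Basically, translate all newlines to spaces, then match "1 2 3" and count the matches (-o prints each match on its own line, so wc -l counts them).

    The -w (whole word) option requires a non-word character, or the start or end of the line, on either side of the match, so this will not match "21 2 3" or "1 2 36" or similar. An optional [^0-9]\? on each side would not be enough, because an optional test can simply match nothing.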

    *edit*
    oops, fixed a bug in the regex pattern
    Last edited by amauk; November 24th, 2010 at 09:24 PM.
    Code:
    while [ true ]; do CY=$(date +%y); CM=$(date +%m); if [ -n "$PY" ] && [ -n "$PM" ]; then echo "Ubuntu ${CY}.${CM} is the worst release ever"; echo "I'm going back to ${PY}.${PM}"; fi; PY="$CY"; PM="$CM"; sleep 182d; done

  3. #3
    Join Date
    May 2006
    Beans
    1,790

    Re: Grepping multiple lines at once?

    Quote Originally Posted by amauk View Post
    Code:
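    tr '\n' ' ' < your_file | grep -ow '1 2 3' | wc -l
    Basically, translate all newlines to spaces, then match "1 2 3" and count the matches (-o prints each match on its own line, so wc -l counts them).

    The -w (whole word) option requires a non-word character, or the start or end of the line, on either side of the match, so this will not match "21 2 3" or "1 2 36" or similar. An optional [^0-9]\? on each side would not be enough, because an optional test can simply match nothing.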

    *edit*
    oops, fixed a bug in the regex pattern
    I'm not sure that grep can handle a line millions of characters long. Maybe it can, though.

    Personally, I would write a small program for this purpose.
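
    A minimal sketch of what such a small program could look like, assuming the file has one number per line. It reads line by line and keeps only the last three lines in memory, so neither the file size nor the line length matters. The function name count_sequence and the inline sample list are made up for this illustration.

```python
from collections import deque

def count_sequence(lines, pattern):
    """Count how often `pattern` (a list of strings) appears as consecutive lines."""
    window = deque(maxlen=len(pattern))  # the most recent len(pattern) lines
    target = list(pattern)
    count = 0
    for line in lines:
        window.append(line.rstrip("\n"))
        if list(window) == target:
            count += 1
    return count

# A list stands in for the file here; these are the sample lines from post #1.
sample = ["1\n", "2\n", "3\n", "6\n", "23\n", "42\n", "64\n", "2\n", "1\n", "2\n", "3\n"]
result = count_sequence(sample, ["1", "2", "3"])
print(result)
```

    For the real data, an open file object can be passed in place of the list, since iterating over a file yields its lines one at a time.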

  4. #4
    Join Date
    Nov 2005
    Location
    Leeds, UK
    Beans
    1,634
    Distro
    Ubuntu Development Release

    Re: Grepping multiple lines at once?

    Yeah, my friend's doing that, but I thought grep would work as well. And yeah, grep might not cope with such a big file.

    EDIT: I forgot to say thanks to both of you
    Last edited by durand; November 24th, 2010 at 09:48 PM.

  5. #5

    Re: Grepping multiple lines at once?

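    In Python:
    Code:
    import re
    
    def run():
        # Read the whole file into memory as one string.
        with open('file.txt') as f:
            a = f.read()
        # Count every place the regex matches (print() as a function, for Python 3).
        print(len(re.findall(r'.*(1\n2\n3\n).*', a)))
    
    run()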

  6. #6
    Join Date
    May 2006
    Beans
    1,790

    Re: Grepping multiple lines at once?

    Quote Originally Posted by mo.reina View Post
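    In Python:
    Code:
    import re
    
    def run():
        # Read the whole file into memory as one string.
        with open('file.txt') as f:
            a = f.read()
        # Count every place the regex matches (print() as a function, for Python 3).
        print(len(re.findall(r'.*(1\n2\n3\n).*', a)))
    
    run()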
    This matches

    11
    2
    3

    too, but I don't know how to fix that elegantly. I've never used multi-line regexps much. Inserting ^ before the 1 doesn't work, in any case.
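
    On the last point: by default, ^ in Python's re module matches only at the very start of the string, which is probably why inserting it did not help. With the re.MULTILINE flag, ^ and $ also match at the start and end of every line. A minimal sketch under that assumption, with a made-up inline sample standing in for the file contents:

```python
import re

# Made-up sample: two genuine 1/2/3 runs, plus "11" and "21" decoys.
text = "11\n2\n3\n1\n2\n3\n21\n2\n3\n1\n2\n3\n"

# Without re.MULTILINE, ^ anchors only at the start of the whole string.
plain = re.findall(r"^1\n2\n3$", text)

# With re.MULTILINE, ^ and $ anchor at every line boundary, so "11" and
# "21" are rejected while each complete 1/2/3 run is counted.
multi = re.findall(r"^1\n2\n3$", text, flags=re.MULTILINE)

print(len(plain), len(multi))
```

    Because $ does not consume the newline that follows the 3, two runs directly after each other are both counted.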

  7. #7

    Re: Grepping multiple lines at once?

    I'm no regex expert, but try changing the last line to this:

    Code:
      print(len(re.findall(r'(?<!1)(1\n2\n3\n)', a)))
    That's a negative lookbehind assertion, so it should match as long as the character preceding the pattern isn't 1.

  8. #8
    Join Date
    May 2006
    Beans
    1,790

    Re: Grepping multiple lines at once?

    Quote Originally Posted by mo.reina View Post
    I'm no regex expert, but try changing the last line to this:

    Code:
      print(len(re.findall(r'(?<!1)(1\n2\n3\n)', a)))
    That's a negative lookbehind assertion, so it should match as long as the character preceding the pattern isn't 1.
    Now it matches

    21
    2
    3
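
    The lookbehind (?<!1) only rules out a preceding 1, so any other digit still slips through. One possible generalisation, sketched here as an assumption rather than as what the poster intended, is to reject any preceding digit with (?<!\d). The sample text is made up for the illustration.

```python
import re

# Made-up sample: "21" decoy, one genuine run, then an "11" decoy.
text = "21\n2\n3\n1\n2\n3\n11\n2\n3\n"

# Original lookbehind: rejects a preceding "1" only, so "21" still matches.
only_one = re.findall(r"(?<!1)1\n2\n3\n", text)

# Generalised lookbehind: rejects any preceding digit.
any_digit = re.findall(r"(?<!\d)1\n2\n3\n", text)

print(len(only_one), len(any_digit))
```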

  9. #9
    Join Date
    Apr 2009
    Location
    Germany
    Beans
    2,134
    Distro
    Ubuntu Development Release

    Re: Grepping multiple lines at once?

    Edit: this doesn't work, it also matches 1 2 2 3.

    Ugly, but it uses only grep:
    grep -A 3 -E "^1$" test.txt | grep -E "^2$" -B 1 -A 1 | grep -E "^3$" -B 2
    Count by grepping the output again for the "--" group separators: grep "\-\-" | wc -l (+ 1)

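    Non-regex Python:
    Code:
    c = 0
    prev2 = prev1 = None  # the two lines read before the current one
    with open("test.txt") as f:
        for l in f:
            l = l.rstrip("\n")
            # Compare a sliding window of three lines, so no line is skipped
            # the way calling next() inside the loop would skip it.
            if (prev2, prev1, l) == ("1", "2", "3"):
                c += 1
            prev2, prev1 = prev1, l
    
    print(c)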
    Last edited by MadCow108; November 24th, 2010 at 11:40 PM.

  10. #10

    Re: Grepping multiple lines at once?

    OK, this is a bit of a hack; it's not pretty and I'm not proud of it...

    Code:
     print(len(re.findall(r'(\n1\n2\n3\n|^1\n2\n3\n)', a)))
    So there are two patterns being matched. The first requires a newline before the 1, but that alone won't count 1 2 3 if it's found at the beginning of the file, so I added a separate pattern, ^1\n2\n3\n.

    I'm sure there are better ways to do this, but it should work.
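
    One caveat with the alternation above: each match consumes its trailing newline, so when two occurrences follow each other directly, that newline is no longer available as the leading newline of the second one. A possible variant, offered as a sketch rather than a definitive fix, checks the trailing newline with a lookahead instead. The sample text is made up.

```python
import re

text = "1\n2\n3\n1\n2\n3\n"  # two occurrences, back to back

# Pattern from the post: the trailing \n is consumed, so it cannot also
# serve as the leading \n of the next occurrence.
consuming = re.findall(r"(\n1\n2\n3\n|^1\n2\n3\n)", text)

# Variant: test for the trailing newline (or end of text) with a
# lookahead, which checks for it without consuming it.
lookahead = re.findall(r"(?:^|\n)1\n2\n3(?=\n|$)", text)

print(len(consuming), len(lookahead))
```

    The variant still rejects 21 and 36, because the leading (?:^|\n) and the lookahead both require a line boundary next to the digits.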
