
Thread: help with regexp

  1. #1
    Join Date
    Jan 2006
    Location
    Philadelphia
    Beans
    4,076
    Distro
    Ubuntu 8.10 Intrepid Ibex

    help with regexp

    hey everyone
    so, i need to match any lines that don't start with a '>', and that contain "attach" in them. my initial impulse is to do the following:
    Code:
    ^[^>].*attach.*$
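    however, for now-obvious-to-me reasons (the [^>] has to consume one character, and at the start of such a line that character is the "a" of "attach"), that fails to match lines that start with the word "attach" and don't contain it again later in the line. e.g., the following line fails to match: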
    Code:
    attach some stuff
    maybe it's just so late in the night that my brain isn't working well, because it seems it should be obvious, but i just can't think of a nice way to correct this. any hints/tips?

    i'm hoping to wake up to some good suggestions.

    thanks to everyone in advance!

    edit: by the way, i'm doing this in javascript - so no fancy stuff that isn't implemented in javascript regex, please.
    Last edited by nanotube; September 17th, 2007 at 07:52 AM.
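A minimal JavaScript sketch of the problem described above. Only "attach some stuff" comes from the post; the other two sample lines are made up for illustration:

```javascript
// The pattern from the first post: one character that is not ">",
// then anything, then "attach".
const original = /^[^>].*attach.*$/;

const samples = [
  "some stuff attach me", // made-up line: "attach" comes after the first character
  "> quoted attach line", // made-up line: starts with ">", should be rejected
  "attach some stuff",    // the problem case from the post
];

for (const line of samples) {
  console.log(JSON.stringify(line), original.test(line));
}
// "some stuff attach me" true
// "> quoted attach line" false
// "attach some stuff" false   <- [^>] used up the "a", so "attach" is never found
```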

  2. #2
    Join Date
    Feb 2007
    Location
    In my chair
    Beans
    1,485

    Re: help with regexp

    How about
    Code:
    ^[^>]*attach
    Last edited by Cappy; September 17th, 2007 at 08:29 AM. Reason: took off .*$ on end
    Currently favorite songs:
    Miss Hyde, This Song Is About Monsters, Masagin, I Will Try to Blow it Out, La Resistance
    Visit www.cherrypeel.com for more free indie music =)

  3. #3
    Join Date
    Jan 2006
    Location
    Philadelphia
    Beans
    4,076
    Distro
    Ubuntu 8.10 Intrepid Ibex

    Re: help with regexp

    Quote (originally posted by Cappy):
    How about
    Code:
    ^[^>]*attach
    but that wouldn't match
    Code:
    some stuff > attach me now
    ?
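A quick JavaScript check of both lines discussed in posts #2 and #3 (a sketch; the variable name is arbitrary):

```javascript
// Cappy's suggestion from post #2: any run of non-">" characters, then "attach".
const cappy = /^[^>]*attach/;

// It fixes the case from the first post, because [^>]* may match nothing...
console.log(cappy.test("attach some stuff"));          // true
// ...but it rejects a line with a ">" anywhere before "attach",
// even though the line does not start with ">".
console.log(cappy.test("some stuff > attach me now")); // false
```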

  4. #4
    Join Date
    Jul 2007
    Location
    Australia
    Beans
    57

    Re: help with regexp

    I am no expert myself, but if you take what Cappy posted and add a wildcard ".", it seems to work...
    Code:
    grep '^[^>].*attach' file
    From my understanding that says, "The first character of the line must not be a ">"; it is followed by anything and then the string "attach"."

    Using the following test file:
    Code:
    some stuff > attach me now
    some stuff > me attach now
    > some other stuff without an attachment
    < stuff some > me now attach

    Results:
    Code:
    some stuff > attach me now
    some stuff > me attach now
    < stuff some > me now attach

    Cya round
    Jinx
    The Mesh Community Wireless - http://www.the-mesh.org

  5. #5
    Join Date
    Jan 2006
    Location
    Philadelphia
    Beans
    4,076
    Distro
    Ubuntu 8.10 Intrepid Ibex

    Re: help with regexp

    Quote (originally posted by JinxAu):
    Code:
    grep '^[^>].*attach' file
    hi jinx
    this is exactly what i had originally (see the first post), and that fails to match
    Code:
    attach me now

  6. #6
    Join Date
    Jul 2007
    Location
    Australia
    Beans
    57

    Re: help with regexp

    Sorry, I should have read the thread before jumping in... I am new to regular expressions, but I have been going through them over the last couple of days, trying to get it to "sink in".

    How about:
    Code:
     grep '^[^>]*.*attach' file
    Which would be, "Don't match a ">" at the start of the line if it's there; otherwise accept an empty match (with *), followed by anything and then the string "attach"."
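Why this pattern still accepts lines that start with ">" becomes visible if its pieces are captured. This JavaScript sketch adds capture groups purely for inspection; they do not change what the pattern matches:

```javascript
// The pattern from this post, with each part captured for inspection.
const jinx = /^([^>]*)(.*)attach/;

const parts = "> attachment".match(jinx);
console.log(parts !== null);            // true: the line matches despite starting with ">"
console.log(JSON.stringify(parts[1])); // "": [^>]* is allowed to match zero characters
console.log(JSON.stringify(parts[2])); // "> ": .* then consumes the ">" itself
```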

    Cya round
    Jinx

  7. #7
    Join Date
    Feb 2007
    Location
    In my chair
    Beans
    1,485

    Re: help with regexp

    No, that doesn't work, because it turns out like this:
    Code:
    echo '> attachment' | grep '^[^>]*.*attach'
    > attachment
    That's why I took it off. I'm not sure of any way to do this except to match something like
    Code:
    '^[^>]*[[:space:]]*.*attach'
    but that wouldn't match something like this:
    Code:
    >some attachment
    Edit: That above doesn't even work at all
    Last edited by Cappy; September 17th, 2007 at 03:27 PM.

  8. #8
    Join Date
    Sep 2007
    Beans
    37
    Distro
    Ubuntu 7.04 Feisty Fawn

    Re: help with regexp

    Edit: Sorry, totally overlooked that one!
    Last edited by Sensenseppl; September 17th, 2007 at 04:29 PM.

  9. #9
    Join Date
    Jan 2006
    Location
    Philadelphia
    Beans
    4,076
    Distro
    Ubuntu 8.10 Intrepid Ibex

    Re: help with regexp

    Quote (originally posted by Sensenseppl):
    I haven't tested it like I should, and I guess it's far from perfect, but this should bring you one step further:
    Code:
    '^[^>]*attach.*$'
    It results in:
    Code:
    attach_some_stuff
    jackson_does_attach
    some_attachment
    some_attach_stuff
    And not resulting in:
    Code:
    >attach_some_stuff
    >some_attach_stuff
    read posts #2 and #3 above: your suggestion was already made, and rejected on the grounds that it wouldn't match
    Code:
    blabla > attach me

  10. #10
    Join Date
    May 2007
    Location
    Canada
    Beans
    374
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: help with regexp

    This response contains the regular expression that solves your problem (even though the first part might make it seem that none exists). It also explains where you will run into trouble when attempting to solve similar problems, so please read it and ask if anything is unclear.

    Often, when you want to do something when a regular expression isn't matched, there is a good chance that you cannot use a single expression to perform that task. The reason is that a basic regular expression (one without lookahead assertions) can only match sequences of characters; it cannot NOT match a sequence of characters.

    It is important that you don't mistake "not matching a specific set of characters in a specific position" for "not matching a sequence of characters". When you state that you don't want to match some set of characters, you are actually stating that you want to match exactly one character that wasn't mentioned. For example, "[^a]" matches any single character other than "a": "b" through "z", uppercase letters, digits, and many non-alphanumeric characters. It still consumes one character, so it can never match an empty position.
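A small JavaScript check of the point about negated classes. This one-character requirement is also why ^[^>] in the first post cannot match "attach some stuff" without swallowing the "a":

```javascript
// A negated class such as [^a] must still match exactly one character.
const notA = /^[^a]/;

console.log(notA.test("b")); // true: "b" is not "a"
console.log(notA.test("Z")); // true: uppercase letters count too
console.log(notA.test("a")); // false
console.log(notA.test(""));  // false: there is no character for [^a] to consume
```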

    As an exercise, think about trying to avoid matching a sequence of characters. For example, how can we NOT match "daddy"? The short answer is that without some very specific information about the format of the data, we cannot do this.

    Regular expressions are always used to perform some operation. When you are using a regular expression, you are stating "if I match this pattern then I do something". So in other words we could say:
    if I match expression E then I will do P
    Or we could say:
    if I match expression E then I will not do P

    It is up to the program to allow you to do this. For example, I could tell sed to delete all lines that contain daddy using "sed '/daddy/d'". I could tell sed to delete all lines that don't contain daddy using "sed '/daddy/!d'". The following code shows this:
    Code:
    # printf sends the following four lines to sed (printf is used instead of
    # "echo -e", which not every shell supports):
    # daddy <newline> baby <newline> mommy <newline> daddy and mommy
    
    # Deletes lines matching "daddy", so "baby <newline> mommy" is printed
    printf 'daddy\nbaby\nmommy\ndaddy and mommy\n' | sed '/daddy/d'
    
    # Deletes lines that don't match "daddy", so "daddy <newline> daddy and mommy" is printed
    printf 'daddy\nbaby\nmommy\ndaddy and mommy\n' | sed '/daddy/!d'
    What you are trying to say is:
    If I do not match ">" at the beginning of the line then if I match "attach" then I will perform some operation (such as printing the line).
    The problem with this is that you are trying to combine two regular-expression tests into one. Under certain circumstances this can be done.
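For comparison, the two-step version of this condition is easy to write as two separate tests in JavaScript. This is a sketch; the helper name wantedLine is hypothetical, not from the thread:

```javascript
// Hypothetical helper: two separate tests instead of one combined expression.
// First reject lines that start with ">", then require "attach" somewhere.
function wantedLine(line) {
  return !/^>/.test(line) && /attach/.test(line);
}

const lines = [
  "attach some stuff",
  "some stuff > attach me now",
  "> some other stuff without an attachment",
  "nothing to see here",
];

// Keeps "attach some stuff" and "some stuff > attach me now".
console.log(lines.filter(wantedLine));
```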

    Because your conditions are so rigid, we can create an expression that does what you want:
    Code:
    ^[^>].*attach|^attach
    Edit: there should be no spaces on either side of the "|" character; a space there would become a literal part of the pattern.

    It works because "|" means "or": the expression to its left handles lines where "attach" starts at the second character or later (the first character only has to be something other than ">"), and the expression to its right handles lines where "attach" starts at the very first character.
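A JavaScript sketch that runs the combined expression against lines taken from this thread, plus a made-up "nothing to see here" line. The second half assumes you want to pull matching lines out of one multi-line string; for that it adds the m and g flags and uses [^>\n] so the first character cannot be a line break:

```javascript
// The combined expression from this post: either "attach" appears after a
// first character that is not ">", or the line itself starts with "attach".
const pattern = /^[^>].*attach|^attach/;

// Sample lines (mostly from this thread) with the result we want for each.
const cases = [
  ["attach some stuff", true],
  ["some stuff > attach me now", true],
  ["< stuff some > me now attach", true],
  ["> some other stuff without an attachment", false],
  [">attach_some_stuff", false],
  ["nothing to see here", false],
];

for (const [line, wanted] of cases) {
  // Prints "ok" when the pattern agrees with the wanted result.
  console.log(pattern.test(line) === wanted ? "ok  " : "FAIL", line);
}

// Assumed use case: pulling matching lines out of one multi-line string.
// "m" lets ^ match at the start of every line, "g" collects every match,
// and [^>\n] keeps the first character from being a line break.
const linePattern = /^[^>\n].*attach.*|^attach.*/gm;
const text = "attach this\n> attach that\nplease attach it";
console.log(text.match(linePattern)); // [ 'attach this', 'please attach it' ]
```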

    If you had been trying to match an unknown number of ">" characters at the front of the line, you could not have done this.
    Last edited by bigboy_pdb; September 18th, 2007 at 04:19 AM.
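An alternative not raised in the posts above: JavaScript regular expressions support lookahead assertions, and a negative lookahead (?!>) checks that the line does not start with ">" without consuming any character, so "attach" may begin at the very first position. A minimal sketch:

```javascript
// Negative lookahead: (?!>) only peeks at the first character, it does not
// consume it, so ".*attach" can still start matching at position 0.
const lookahead = /^(?!>).*attach/;

console.log(lookahead.test("attach some stuff"));              // true
console.log(lookahead.test("some stuff > attach me now"));     // true
console.log(lookahead.test("> some other stuff with attach")); // false
```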
