help with regexp

**nanotube** · September 17th, 2007

Hey everyone,
so, I need to match any lines that don't start with a '>' and that contain "attach". My initial impulse is to do the following:

Code:

^[^>].*attach.*$

However, for now-obvious-to-me reasons, that fails to match lines that actually start with the word attach (and don't contain it again later in the line). E.g., the following line will fail to match:

Code:

attach some stuff

Maybe it's just so late at night that my brain isn't working well, because it seems like it should be obvious, but I just can't think of a nice way to correct this. Any hints/tips?

I'm hoping to wake up to some good suggestions.

Thanks to everyone in advance!

Edit: by the way, I'm doing this in JavaScript, so no fancy stuff that isn't implemented in JavaScript regex, please.
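The failure is easy to reproduce in JavaScript. "[^>]" has to consume exactly one character, so when "attach" begins at the very first character, the "a" is used up by "[^>]" and the rest of the line no longer contains "attach". This is a minimal sketch; the variable name and sample lines are illustrative, not from the thread.

```javascript
// Illustrative name and sample lines.
// The original attempt: one non-'>' character, then anything, then "attach".
const original = /^[^>].*attach.*$/;

const samples = [
  "some stuff attach me", // matches: 's' satisfies [^>]
  "> quoted attach",      // no match: the line starts with '>'
  "attach some stuff",    // no match: [^>] consumes the 'a' of "attach"
];

for (const line of samples) {
  console.log(JSON.stringify(line), original.test(line));
}
```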

**Cappy** · September 17th, 2007

How about

Code:

^[^>]*attach

**nanotube** · September 17th, 2007

Originally Posted by Cappy

How about

Code:

^[^>]*attach

but that wouldn't match

Code:

some stuff > attach me now

?
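The objection can be checked in JavaScript. "[^>]*" forbids a ">" anywhere before "attach", not only in the first position. So a line that has a ">" in the middle, before "attach", is rejected. This is a sketch with an illustrative variable name and sample lines.

```javascript
// Illustrative name. Cappy's suggestion: zero or more non-'>' characters, then "attach".
const cappy = /^[^>]*attach/;

console.log(cappy.test("attach some stuff"));          // true
console.log(cappy.test("> attachment"));               // false
console.log(cappy.test("some stuff > attach me now")); // false: a '>' sits before "attach"
```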

**JinxAu** · September 17th, 2007

I am no expert myself, but if you take what Cappy posted and add a wildcard ".", it seems to work...

Code:

grep '^[^>].*attach' file

From my understanding, that says: "Don't match a ">" at the start of the line, followed by anything and the string "attach"."

Using the following test file:

Code:

some stuff > attach me now
some stuff > me attach now
> some other stuff without an attachment
< stuff some > me now attach

Results:

Code:

some stuff > attach me now
some stuff > me attach now
< stuff some > me now attach

Cya round
Jinx

**nanotube** · September 17th, 2007

Originally Posted by JinxAu

I am no expert myself, but if you take what Cappy posted and add a wildcard ".", it seems to work...

Code:

grep '^[^>].*attach' file

From my understanding, that says: "Don't match a ">" at the start of the line, followed by anything and the string "attach".

Hi Jinx,
this is exactly what I had originally (see first post)! And that fails to match

Code:

attach me now

**JinxAu** · September 17th, 2007

Sorry, I should have read before jumping in... I am new to regular expressions, but I've been working through them over the last couple of days, trying to get them to "sink in".

How about:

Code:

grep '^[^>]*.*attach' file

Which would be: "Don't match a ">" at the start of the line if it's there; otherwise accept the null pattern (with *), followed by anything and the string "attach".

Cya round
Jinx
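For comparison in JavaScript: "[^>]*" is allowed to match zero characters. On a single line of text, JinxAu's pattern therefore behaves like "^.*attach", and the ">" test has no effect. This is a sketch with an illustrative variable name and sample lines.

```javascript
// Illustrative name. JinxAu's second attempt: [^>]* may match nothing,
// after which .* is free to consume a leading '>'.
const jinx2 = /^[^>]*.*attach/;

console.log(jinx2.test("attach me now")); // true
console.log(jinx2.test("> attachment"));  // true: the quoted line is accepted too
```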

**Cappy** · September 17th, 2007

No, that doesn't work, because it turns out like this:

Code:

echo '> attachment' | grep '^[^>]*.*attach'
> attachment

That's why I took it off. I'm not sure of any way to do this except to match something like

Code:

'^[^>]*[[:space:]]*.*attach'

but that wouldn't match something like this:

Code:

>some attachment

Edit: The above doesn't even work at all.
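This matters for the original poster, who is working in JavaScript. POSIX bracket classes such as "[[:space:]]" are a grep/sed feature, and JavaScript regexes do not support them. Without the u flag, JavaScript reads "[[:space:]]" as the class "[[:space:]" (the literal characters "[", ":", "s", "p", "a", "c", "e") followed by a literal "]". JavaScript's own whitespace class is "\s". This is a sketch with illustrative variable names.

```javascript
// Illustrative names. Without the u flag, JavaScript reads this as the
// character class [[:space:] followed by a literal ']'.
const posixStyle = /[[:space:]]/;
// \s is JavaScript's own whitespace class.
const jsStyle = /\s/;

console.log(posixStyle.test(" "));  // false: a space is not in the class
console.log(posixStyle.test("s]")); // true: 's' from the class, then a literal ']'
console.log(jsStyle.test(" "));     // true
```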

**Sensenseppl** · September 17th, 2007

Edit: Sorry, totally overlooked that one!

**nanotube** · September 17th, 2007

Originally Posted by Sensenseppl

I haven't tested it like I should, and I guess it's far from perfect, but this should bring you one step further:

Code:

'^[^>]*attach.*$'

It results in:

Code:

attach_some_stuff
jackson_does_attach
some_attachment
some_attach_stuff

And not resulting in:

Code:

>attach_some_stuff
>some_attach_stuff

Read posts #2 and #3 above: your suggestion was already made, and rejected on the grounds that it wouldn't match

Code:

blabla > attach me

**bigboy_pdb** · September 17th, 2007

This response contains the regular expression that solves your problem (even though it might appear from the first part that one does not exist). It also explains where you will have trouble when attempting to solve similar problems, so please read it and ask questions if anything is unclear.

Often, when you want to do something when a regular expression isn't matched, there is a good chance that you cannot use a single expression to perform that task. The reason is that regular expressions (at least in the basic and extended syntax that grep and sed use) can only match sequences of characters; they cannot NOT match a sequence of characters.

It is important that you don't mistake "not matching a specific set of characters in a specific position" for "not matching a sequence of characters". When you state that you don't want to match some set of characters, you are actually stating that you want to match exactly one character that is not in that set. For example, "[^a]" matches any single character other than "a": "b" through "z", uppercase letters, digits, and many non-alphanumeric characters.
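The key detail is that a negated class still consumes exactly one character. It never matches "nothing", which is why "^[^>]" cannot match a line whose very first character begins "attach". Here is a small JavaScript check with an illustrative variable name.

```javascript
// Illustrative name. [^a] matches exactly one character, provided it is not 'a'.
const notA = /^[^a]$/;

console.log(notA.test("b"));  // true
console.log(notA.test("Z"));  // true: uppercase letters count too
console.log(notA.test("a"));  // false
console.log(notA.test(""));   // false: "not 'a'" still needs a character
console.log(notA.test("bb")); // false: only one character is allowed
```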

As an exercise, think about trying to avoid matching a sequence of characters. For example, how can we NOT match "daddy"? The short answer is that, without some very specific information about the format of the data, we cannot do this with a plain regular expression.

Regular expressions are always used to perform some operation. When you are using a regular expression, you are stating "if I match this pattern then I do something". So in other words we could say:
if I match expression E then I will do P
Or we could say:
if I match expression E then I will not do P

It is up to the program to allow you to do this. For example, I could tell sed to delete all lines that contain daddy using "sed '/daddy/d'", and to delete all lines that don't contain daddy using "sed '/daddy/!d'". The following code shows this:

Code:

# The printf commands send the following four lines to sed
# (printf is used because "echo -e" is not portable across shells):
# daddy <newline> baby <newline> mommy <newline> daddy and mommy

# Deletes lines matching "daddy", so "baby <newline> mommy" is printed
printf 'daddy\nbaby\nmommy\ndaddy and mommy\n' | sed '/daddy/d'

# Deletes lines that don't match "daddy", so "daddy <newline> daddy and mommy" is printed
printf 'daddy\nbaby\nmommy\ndaddy and mommy\n' | sed '/daddy/!d'
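The same idea carries over to JavaScript. The regex only answers "does this line match?", and ordinary program logic decides whether to keep or drop the line. This is a sketch of the two sed commands above using Array.prototype.filter; the variable names are illustrative.

```javascript
// Illustrative names and data, mirroring the sed example.
const lines = ["daddy", "baby", "mommy", "daddy and mommy"];
const daddy = /daddy/; // no g flag, so test() keeps no state between calls

// Like sed '/daddy/d': keep only the lines that do NOT match.
const withoutDaddy = lines.filter((line) => !daddy.test(line));
// Like sed '/daddy/!d': keep only the lines that DO match.
const withDaddy = lines.filter((line) => daddy.test(line));

console.log(withoutDaddy); // [ 'baby', 'mommy' ]
console.log(withDaddy);    // [ 'daddy', 'daddy and mommy' ]
```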

What you are trying to say is:
If I do not match ">" at the beginning of the line, and I do match "attach", then I will perform some operation (such as printing the line).
The problem is that you are trying to combine two regular-expression tests into one. Under certain circumstances this can be done.
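Written as two separate tests joined by ordinary program logic, the condition above is straightforward in JavaScript. This is a minimal sketch; the helper name wantLine and the sample lines are illustrative, not from the thread.

```javascript
// Hypothetical helper: keep a line if it does not start with '>'
// AND it contains "attach" anywhere.
function wantLine(line) {
  return !/^>/.test(line) && /attach/.test(line);
}

const lines = [
  "attach some stuff",
  "some stuff > attach me now",
  "> some other stuff without an attachment",
  "nothing to see here",
];
console.log(lines.filter(wantLine)); // [ 'attach some stuff', 'some stuff > attach me now' ]
```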

Because your conditions are so specific, we can create a single expression that does what you want:

Code:

^[^>].*attach|^attach

There should be no spaces on either side of the "|" character, because a space there would become part of the pattern. (With grep, use grep -E so that "|" is treated as "or".)
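The alternation can be checked against the lines discussed in this thread. Here is a JavaScript sketch that compares each result with the expected answer; the variable names are illustrative.

```javascript
// Illustrative names. Left branch: a non-'>' first character, then "attach" later.
// Right branch: "attach" right at the start of the line.
const alternation = /^[^>].*attach|^attach/;

const cases = {
  "attach some stuff": true,
  "attach me now": true,
  "some stuff > attach me now": true,
  "blabla > attach me": true,
  "> some other stuff without an attachment": false,
  ">attach_some_stuff": false,
};

for (const [line, expected] of Object.entries(cases)) {
  const got = alternation.test(line);
  console.log(got === expected ? "ok  " : "FAIL", JSON.stringify(line), got);
}
```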

It works because the expression to the left of "|" (which stands for "or") handles lines whose first character is something other than ">" and that contain "attach" somewhere after that first character, while the expression to the right of "|" handles lines that begin with "attach".

If you had been trying to match an unknown number of ">" characters at the front of the expression, you could not have done this.
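Since the original poster is working in JavaScript, there is one more option. JavaScript regexes support negative lookahead, "(?!...)", which grep's basic and extended syntax lack. It checks that the text at the current position does not match the given pattern, without consuming any characters. So "^(?!>)" only requires that the line not start with ">", and "attach" is still free to begin at the first character. This is a sketch with an illustrative variable name and sample lines.

```javascript
// Illustrative name. (?!>) asserts the line does not start with '>' but
// consumes nothing, so "attach" may begin at the very first character.
const lookahead = /^(?!>).*attach/;

console.log(lookahead.test("attach some stuff"));              // true
console.log(lookahead.test("some stuff > attach me now"));     // true
console.log(lookahead.test("> some other stuff with attach")); // false
```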
