This response contains the regular expression that solves your problem (even though it might appear from the first part that one does not exist). It also states where you will have trouble when attempting to solve similar types of problems so please read it and ask any questions if you are confused.
Often, when you want to do something when a regular expression isn't matched, there is a good chance that you cannot use a single expression to perform that task. The reason is that regular expressions can only match sequences of characters; they cannot NOT match a sequence of characters.
It is important that you don't mistake "not matching a specific set of characters in a specific position" for "not matching a sequence of characters". When stating that you don't want to match some set of characters, you are actually stating that you want to match every other character that wasn't mentioned. For example, "[^a]" actually means that you want to match any single character other than "a": b through z, A through Z, 0 to 9, and many non-alphanumeric characters.
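A small sketch of that point, using made-up one-character input lines: "[^a]" still has to match a character, it just cannot be "a".

```shell
# Five one-character lines: a, b, A, 7 and -
# '^[^a]$' keeps every line whose single character is anything except "a".
matched=$(printf 'a\nb\nA\n7\n-\n' | grep '^[^a]$')
printf '%s\n' "$matched"
```

Only the "a" line is dropped; "b", "A", "7" and "-" are all printed.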
As an exercise, think about trying to avoid matching a sequence of characters. For example, how can we NOT match "daddy"? The short answer is that, without some incredibly specific information about the format of the data, we cannot do this.
Regular expressions are always used to perform some operation. When you are using a regular expression, you are stating "if I match this pattern then I do something". So in other words we could say:
if I match expression E then I will do P
Or we could say:
if I match expression E then I will not do P
It is up to the program to allow you to do this. For example, I could tell sed to delete all lines that have daddy using: "sed '/daddy/d' ". I could tell sed to delete all lines that don't have daddy using: "sed '/daddy/!d' ". The following code shows this:
Code:
# The "echo -e" commands are sending the following input to sed:
# daddy <newline> baby <newline> mommy <newline> daddy and mommy
# Deletes lines matching "daddy", meaning "baby <newline> mommy" is printed
echo -e " daddy \n baby \n mommy \n daddy and mommy " | sed '/daddy/d'
# Deletes lines that don't match "daddy", meaning "daddy <newline> daddy and mommy" is printed
echo -e " daddy \n baby \n mommy \n daddy and mommy " | sed '/daddy/!d'
What you are trying to say is:
If I do not match ">" at the beginning of the line then if I match "attach" then I will perform some operation (such as printing the line).
The problem with this is you are trying to mix two procedures using regular expressions into one. Sometimes under certain circumstances this can be done.
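When it cannot, the two conditions can simply stay as two steps. This is a minimal sketch rather than the only way to do it; the sample lines are hypothetical and made up for illustration.

```shell
# Hypothetical sample input, made up for illustration.
input='attach the file
please attach it
> attach quoted
>attach quoted too
nothing here'

# Step 1 drops lines that start with ">"; step 2 keeps lines containing "attach".
two_greps=$(printf '%s\n' "$input" | grep -v '^>' | grep 'attach')

# The same idea in a single sed call: for lines NOT starting with ">",
# print the line only if it contains "attach".
one_sed=$(printf '%s\n' "$input" | sed -n '/^>/!{/attach/p;}')

printf '%s\n' "$two_greps"
printf '%s\n' "$one_sed"
```

Both commands print "attach the file" and "please attach it". Each condition stays a plain match; the "not" is handled by the tool (grep -v, or "!" in sed) instead of by the expression.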
Due to your rigid conditions we can create an expression to do what you want to do:
^[^>].*attach|^attach
Note that there must be no spaces on either side of the "|" character.
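It works because the expression to the left of "|" (which stands for "or") handles the case where the line starts with some character other than ">" and "attach" appears somewhere after that first character, and the expression to the right of "|" handles the case where the line starts with the word "attach" itself. The right side is needed because "[^>]" always consumes one character, so the left side alone would miss a line where "attach" is the very first thing on the line.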
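To try the expression out, here is a quick sketch using grep -E (extended regular expressions, so "|" needs no backslash). The sample lines are hypothetical and only there to exercise both sides of the "|".

```shell
# Hypothetical sample input, made up for illustration.
input='attach the file
please attach it
> attach quoted
>attach quoted too
nothing here'

# Left side: first character is not ">", and "attach" appears later on the line.
# Right side: the line begins with "attach".
matches=$(printf '%s\n' "$input" | grep -E '^[^>].*attach|^attach')
printf '%s\n' "$matches"
```

This prints "attach the file" (matched by the right side) and "please attach it" (matched by the left side); both lines starting with ">" are skipped.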
If you had been trying to match an unknown number of ">" characters at the front of the expression, you could not have done this.