
Thread: regular expression for consecutive appearance of a character

  1. #1
    Join Date
    Jul 2010
    Beans
    3

    Hi people, I've been trying to come up with a regular expression to find strings like:
    mmm
    xxxxxxx
    iiii

    That is, if I use grep "express" test.dat, it should output the lines above.

    I might be really dumb; I just can't seem to find a solution.

  2. #2
    Join Date
    Sep 2006
    Beans
    2,914

    Code:
    $ cat file
    mmm
    xxxxxxx
    iiii
    abc
    
    $ grep -E  "(.)\1" file
    mmm
    xxxxxxx
    iiii
    Last edited by ghostdog74; July 24th, 2010 at 06:03 AM.
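Note that (.)\1 matches any line that contains two identical adjacent characters anywhere, so a word like "hello" would match too. If the goal is lines made up entirely of one repeated character, anchoring the pattern should work. This is a sketch assuming GNU grep, which supports backreferences in -E patterns, and a hypothetical sample file named file:

```shell
# Hypothetical sample input: "hello" contains a doubled "l" but is not a run of one character.
printf '%s\n' mmm xxxxxxx iiii abc hello > file

# Unanchored: any line containing two identical adjacent characters (prints mmm, xxxxxxx, iiii, hello).
grep -E '(.)\1' file

# Anchored: the whole line must be one character repeated at least twice (prints mmm, xxxxxxx, iiii).
grep -E '^(.)\1+$' file
```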

  3. #3
    Join Date
    Nov 2009
    Beans
    1,081

    It's pretty simple if you're allowed to use backreferences. You can refer to a previous capturing group in the same regular expression, at least in languages like Perl.

  4. #4
    Join Date
    May 2007
    Location
    Canada
    Beans
    374
    Distro
    Ubuntu 10.04 Lucid Lynx

    I'm assuming that this is what you want:

    Code:
    egrep -o '(.)\1+' < file
    The "-o" option prints only the part of each line that matches the expression. The parentheses capture (remember) whatever is inside them; in this case that is the dot, which matches any single character. The 1 preceded by a backslash (\1) refers back to whatever the parentheses captured. The plus sign means that one or more occurrences of what precedes it must be matched.

    In other words, for each character it encounters, grep matches and remembers it. If the next character (and any that follow) is the same as the remembered one, those characters are matched too; the match ends at the first different character, and the whole run is printed.
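To see how -o reports runs, here is a small sketch on a made-up input line, assuming GNU grep (grep -oE is equivalent to egrep -o). Each run of repeated characters is printed on its own line, and characters that are not part of a run are not printed at all:

```shell
# Made-up input: two runs ("aa" and "ccc") separated by a single "b".
# Each match of (.)\1+ is printed on its own line: aa, then ccc.
echo 'aabccc' | grep -oE '(.)\1+'
```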

    egrep is for extended regular expressions (it is equivalent to grep -E). If you want to use basic (non-extended) regular expressions instead, you can use the following code:

    Code:
    grep -o '\(.\)\1\+' < file
    In GNU grep, basic and extended regular expressions offer the same functionality. The only difference is that in basic (non-extended) regular expressions certain characters must be preceded by a backslash to get their special meaning, whereas in extended regular expressions those characters are special by default and preceding them with a backslash makes them match literally. For example:

    Non-extended Regular Expressions:
    + matches the plus sign
    \+ matches one or more occurrences of what is before it (a GNU extension)

    Extended Regular Expressions:
    + matches one or more occurrences of what is before it
    \+ matches the plus sign

    Notice that the two definitions are simply swapped.
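A quick way to check the swapped meanings, as a sketch assuming GNU grep (where \+ in a basic pattern is a GNU extension). The last command uses \1\1* instead (one copy, then zero or more further copies), which expresses "one or more copies" without relying on \+:

```shell
# Basic syntax: a bare + is literal, so this matches the text "a+b".
echo 'a+b' | grep 'a+b'

# Extended syntax: \+ is literal, so this also matches "a+b".
echo 'a+b' | grep -E 'a\+b'

# Extended syntax: a bare + means "one or more", so this matches "aaab".
echo 'aaab' | grep -E 'a+b'

# Basic syntax without \+: \1\1* means "at least one more copy of group 1".
# Prints mmm; abc has no repeated character and is not printed.
printf '%s\n' mmm abc | grep -o '\(.\)\1\1*'
```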
