
Thread: Regular expression question

  1. #1
    Join Date
    Oct 2013
    Beans
    97
    Distro
    Ubuntu 12.04 Precise Pangolin

    Regular expression question

    Hello guys, I need help with regular expressions.

    Quick recap: with advanced regular expressions, you can put part of a pattern in parentheses to capture it and refer back to it later. For example, the regular expression ^(.)(.)\2\1$ will match words like noon and deed. However, it will also match a string like 'aaaa'. I don't want that. Is there a way to specify that \1 and \2 cannot be the same?

    Additional info: the example is a simplification of the actual problem. A workaround with egrep -v won't work.
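
To make the problem easy to reproduce, here is a minimal sketch in Python. The choice of Python's `re` module and the sample word list are assumptions made for illustration only; the pattern itself is the one from the question.

```python
import re

# Two captured characters, followed by the same two in reverse order.
pattern = re.compile(r'^(.)(.)\2\1$')

# Hypothetical sample words, chosen only for illustration.
words = ["noon", "deed", "book", "aaaa"]

matches = [w for w in words if pattern.match(w)]
print(matches)  # ['noon', 'deed', 'aaaa']
```

The unwanted 'aaaa' gets through because nothing in the pattern stops \2 from capturing the same character as \1.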
    You'll never go to bed without learning one more thing.

  2. #2
    Join Date
    Oct 2013
    Beans
    97
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Regular expression question

    Okay, I've searched a lot on the internet and in various reference books, but I really couldn't find a way to do this. Then I took a break, ingested a copious amount of caffeine and sugar, and came up with a solution to my own problem.

    I'm going to try to write a script that counts the unique characters in a string. For a word like deed that would be 2, but for aaaa it would be only 1. Then I can filter based on that. That should also work for my bigger problem.

  3. #3
    Join Date
    Aug 2011
    Location
    47°9′S 126°43′W
    Beans
    2,163
    Distro
    Kubuntu 14.04 Trusty Tahr

    Re: Regular expression question

    Quote Originally Posted by sha1sum View Post
    For example, the regular expression ^(.)(.)\2\1$ will match words like noon and deed.
    It also, surprisingly, matches "boob", and I don't know why that word crossed my mind...
    Warning: unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.

  4. #4
    Join Date
    Oct 2013
    Beans
    97
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Regular expression question

    Quote Originally Posted by ofnuts View Post
    It also, surprisingly, matches "boob", and I don't know why that word crossed my mind...
    lol it did for me too. The original example I wanted to write was: "I need a regular expression that matches book, but not boob". But then I realized: who in their right mind would prefer books over boob-s? So I changed it.

  5. #5
    Join Date
    Aug 2011
    Location
    47°9′S 126°43′W
    Beans
    2,163
    Distro
    Kubuntu 14.04 Trusty Tahr

    Re: Regular expression question

    That explains how you came up with the '(.)(.)' syntax

  6. #6
    Join Date
    Apr 2012
    Beans
    6,413

    Re: Regular expression question

    Apparently you can do it with negative lookahead --> http://stackoverflow.com/a/8057827

    Based on that answer,

    Code:
    $ cat file
    noon
    book
    deed
    boob
    aaaa
    Code:
    $ grep -Po '(.)((?!\1).)\2\1' file
    noon
    deed
    boob
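
For anyone without a PCRE-enabled grep, the same idea can be sketched in Python, whose `re` module also supports negative lookahead. Python is my choice here, not something from the linked answer, and I have added the ^ and $ anchors from the original question. The word list is the `file` shown above.

```python
import re

# (?!\1) is a negative lookahead: the second captured character
# must not be the same as the first captured character.
pattern = re.compile(r'^(.)((?!\1).)\2\1$')

# Same words as in the sample file above.
words = ["noon", "book", "deed", "boob", "aaaa"]

matches = [w for w in words if pattern.match(w)]
print(matches)  # ['noon', 'deed', 'boob']
```

The lookahead consumes no input. It only checks that the next character differs from \1 before the `.` inside group 2 captures it, which is why 'aaaa' is rejected.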
    Last edited by steeldriver; January 7th, 2014 at 09:41 AM. Reason: removed apparently extraneous passive group

  7. #7
    Join Date
    Oct 2013
    Beans
    97
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Regular expression question

    steeldriver you are awesome.

  8. #8
    Join Date
    Oct 2013
    Beans
    97
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Regular expression question

    Quote Originally Posted by ofnuts View Post
    That explains how you came up with the '(.)(.)' syntax
    ROFL! Never thought that regular expressions could get so... Freudian. LOL

  9. #9
    Join Date
    Sep 2010
    Beans
    62

    Re: Regular expression question

    Quote Originally Posted by sha1sum View Post
    I'm going to try to write a script that counts the unique characters in a string. For a word like deed that would be 2, but for aaaa it would be only 1. Then I can filter based on that. That should also work for my bigger problem.
    Code:
    # echo "deed" | ruby -e 'puts gets.chomp.split("").uniq.size'
    2
    # echo "aaaa" | ruby -e 'puts gets.chomp.split("").uniq.size'
    1

  10. #10
    Join Date
    Oct 2013
    Beans
    97
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Regular expression question

    I made the following awk script:

    Code:
    # charcount: an awk script that filters lines by their number of unique characters
    # Note: splitting into single characters with an empty FS is a gawk/mawk extension, not POSIX behavior

    BEGIN{ FS="" }                   # Make every individual character a field
    { b=0; delete a                  # Reset counter b and clear array a left over from the previous line
    for(i=1;i<=NF;i++) a[$i]++       # Make an array entry for every character in the line
    for (i in a) b++ }               # Count the array entries, i.e. the number of unique characters
    b==2                             # Print the line if it has exactly 2 unique characters
    So then the command looks like this:

    Code:
    grep -E '^(.)(.)\2\1$' list.txt | awk -f charcount
    This line will print words like deed, noon and boob, but not words like book, blob, and bbbb.
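
The two-stage filter can also be sketched in a single Python function. This is only an illustration of the same idea, not a replacement for the awk script; the function name `abba_with_two_letters` and the sample list are made up for the example.

```python
import re

# Same pattern as the egrep stage of the pipeline.
ABBA = re.compile(r'^(.)(.)\2\1$')

def abba_with_two_letters(words):
    """Keep words that match the pattern and contain exactly two distinct characters."""
    return [w for w in words if ABBA.match(w) and len(set(w)) == 2]

# Hypothetical sample input, using the words mentioned above.
sample = ["deed", "noon", "boob", "book", "blob", "bbbb"]
result = abba_with_two_letters(sample)
print(result)  # ['deed', 'noon', 'boob']
```

Here `len(set(w))` plays the role of the awk array: a set keeps one entry per distinct character, so its size is the number of unique characters in the word.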

    With that I was able to determine that the only four-letter palindromes in my word list are the following:

    boob
    deed
    kook
    noon
    peep
    poop
    sees
    toot
    I don't know what "kook" is, but it was listed in /usr/share/dict/american-english.
    Last edited by sha1sum; January 7th, 2014 at 08:34 PM.
