
Thread: awk regular expressions: \1

  1. #1
    Join Date
    Oct 2013
    Beans
    97
    Distro
    Ubuntu 12.04 Precise Pangolin

    awk regular expressions: \1

    Hello guys,

    With GNU grep -E (or egrep), you can do something like this:

    Code:
    grep -E '(.)(.)\2\1' target.txt
    This matches every line containing a pattern of the form abba: two characters followed by the same two characters in reverse order.

    However, I found that it doesn't work in awk. If you make an awk command like this:

    Code:
    awk '/(.)(.)\2\1/' target.txt
    It doesn't match those lines. (Using two backslashes, \\, doesn't work either.)

    Is there a way to do that in awk?
    Nunca te acostarás, sin saber una cosa más.
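Since plain awk has no backreferences, one portable workaround is to test each position of the line directly with substr(). This is a sketch that assumes a POSIX awk (mawk, gawk or nawk); the shell function name has_abba_lines is hypothetical. It is hand-written for this one pattern and should print the same lines as grep -E '(.)(.)\2\1', but it is not a general emulation of backreferences.

```shell
# Hypothetical helper: print lines (from files or stdin) that contain a
# substring of the form "abba", i.e. characters c1 c2 c2 c1.
has_abba_lines() {
    awk '
    # Return 1 if s has a position i where s[i]==s[i+3] and s[i+1]==s[i+2].
    function has_abba(s,    i) {
        for (i = 1; i + 3 <= length(s); i++)
            if (substr(s, i, 1) == substr(s, i + 3, 1) &&
                substr(s, i + 1, 1) == substr(s, i + 2, 1))
                return 1
        return 0
    }
    # A pattern with no action prints the matching line.
    has_abba($0)
    ' "$@"
}
# Usage: has_abba_lines target.txt
```

Comparing single characters with substr() avoids regexes entirely, which is why it should work in any awk; the trade-off is that every new pattern needs its own hand-written test.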

  2. #2

    Re: awk regular expressions: \1

    The search term you're looking for is backreference. A quick Google search seems to indicate that awk doesn't support backreferences.

  3. #3
    Join Date
    Sep 2010
    Beans
    62

    Re: awk regular expressions: \1

    Quote Originally Posted by trent.josephsen
    A quick Google search seems to indicate that awk doesn't support backreferences.
    wrong

  4. #4

    Re: awk regular expressions: \1

    From the first hit I got on Google, http://awk.freeshell.org/Backreferences:
    If you need to match a pattern using a regular expression with backreferences, like eg you do in sed or similar things, then well, you can't do that easily with awk.
    Saying "wrong" is not constructive. Is the page I linked incorrect? Am I misinterpreting it? For OP's sake, please don't be rude.

  5. #5
    Join Date
    Apr 2012
    Beans
    5,311

    Re: awk regular expressions: \1

    I may be wrong, but my understanding is that gawk natively supports backreferences only in the replacement string, and only via the gensub function, i.e. something like

    Code:
    $ echo 'noon' | gawk '{print gensub(/(.)(.)/,"\\2\\1","g")}'
    onno
    whereas the OP is asking about backreferences in the pattern itself.
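For comparison, the only part of a match that POSIX awk lets you reuse in a replacement is the whole matched text, written as & in sub() and gsub(); numbered groups such as \\1 and \\2 need gawk's gensub(). A minimal sketch that should run in any POSIX awk:

```shell
# "&" in the replacement stands for the entire text matched by /o+/;
# there is no portable way to refer to an individual group here.
echo 'noon' | awk '{ gsub(/o+/, "[&]"); print }'
# prints: n[oo]n
```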

  6. #6
    Join Date
    Sep 2010
    Beans
    62

    Re: awk regular expressions: \1

    Proof of concept.

    Code:
    $ echo 'abba' | gawk '{a=$0; b=gensub(/^(.)(.)(.)(.)$/,"\\4\\3\\2\\1",1,a); if (length(a)==4 && b==a) print "match"}'
    match

  7. #7
    Join Date
    Oct 2013
    Beans
    97
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: awk regular expressions: \1

    Hello guys. Thank you all for taking the time to answer my question. The conclusion I draw is that awk doesn't support backreferences (apart from the gensub function, which you can use to make a work-around. Although I guess that one gets very laborious if you have more complex cases.). I also came to that conclusion based on what I read in my reference book, but I found it really hard to believe that such a versatile tool as awk doesn't support something as basic as backreferences. I guess that's just the way it is.

    Thanks again to all of you.
    Nunca te acostarás, sin saber una cosa más.

  8. #8
    Join Date
    Dec 2009
    Beans
    167

    Re: awk regular expressions: \1

    Quote Originally Posted by sha1sum
    Hello guys. Thank you all for taking the time to answer my question. The conclusion I draw is that awk doesn't support backreferences (apart from the gensub function, which you can use to make a work-around. Although I guess that one gets very laborious if you have more complex cases.). I also came to that conclusion based on what I read in my reference book, but I found it really hard to believe that such a versatile tool as awk doesn't support something as basic as backreferences. I guess that's just the way it is.

    Thanks again to all of you.
    It does not support backreferences because awk implementations generally use a different, simpler regex engine (DFA), whereas perl, sed, etc. use an NFA engine. In gawk's (GNU awk) case, it uses a hybrid DFA+NFA engine, and that's why it can refer back to captured groups, though only in the replacement string of the gensub function. There are quite a few awk implementations out there (old awk, nawk, tawk, ...) that, as far as I know, use only a DFA regex engine. Quoting from:

    www.softec.lu/site/RegularExpressions/RegularExpressionEngines
    Regex implementation are based on two main kind of engine: DFA and NFA. Perl, Java, .NET languages, PHP, Python, Ruby,... and most tools implement a Traditional NFA engines. Some less widespread tools like mawk use a POSIX NFA engine which is a variation of the previous one. Awk, egrep, flex, lex, MySQL,... that mostly needs to verify efficiently the success of an overall match implement the more efficient DFA engine. Finally some tools like GNU awk, GNU egrep and Tcl used the best of both world with an hybrid NFA/DFA engine.
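A related point, separate from the engine question: POSIX defines backreferences such as \1 for basic regular expressions (BREs), while awk is specified to use extended regular expressions (EREs), which do not include them. The same pattern therefore works with tools that use BREs, for example grep without -E, or sed. A minimal sketch:

```shell
# In BRE syntax, groups are written \( \) and backreferences \1, \2.
printf 'abba\nabcd\nxnoonx\n' | grep '\(.\)\(.\)\2\1'
# prints: abba and xnoonx (one per line)
printf 'abba\nabcd\nxnoonx\n' | sed -n '/\(.\)\(.\)\2\1/p'
# prints the same two lines
```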
    --
    Although I guess that one gets very laborious if you have more complex cases.
    If you want to explore the full potential of regexes then you really need to look elsewhere - perl, python, ruby, etc ... and there's no need to learn the whole language to use their complex regexes, just the right syntax (I'm thinking perl). It'll make life much easier.

    Finally, one of the best books that I've found on the subject of regexes is: Mastering Regular Expressions by Jeffrey Friedl.
    Last edited by erind; January 10th, 2014 at 07:53 PM.

  9. #9
    Join Date
    Oct 2013
    Beans
    97
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: awk regular expressions: \1

    Thanks for that excellent piece of additional information.

    Quote Originally Posted by erind
    If you want to explore the full potential of regexes then you really need to look elsewhere - perl, python, ruby, etc ... and there's no need to learn the whole language to use their complex regexes, just the right syntax (I'm thinking perl). It'll make life much easier.
    I think you are right. For the last two weeks I've been studying awk to learn all of its functions, partly for the fun of completely mastering this tool, and partly operating under the philosophy that it's better to know one programming language completely than to know a little bit of many languages. However, it appears that I'm running into awk's limitations, which means I'm going to have to start studying something else sooner or later. (I already have a book on Perl, and I'm likely to start reading it one of these days.)

    Quote Originally Posted by erind
    Finally, one of the best books that I've found on the subject of regexes is: Mastering Regular Expressions by Jeffrey Friedl.
    I'll be on the lookout for that book. Thanks for steering me in that direction.
    Nunca te acostarás, sin saber una cosa más.

  10. #10
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    1,838
    Distro
    Kubuntu 12.10 Quantal Quetzal

    Re: awk regular expressions: \1

    Quote Originally Posted by erind
    Finally, one of the best books that I've found on the subject of regexes is: Mastering Regular Expressions by Jeffrey Friedl.
    I disagree. It is not "one of the best". It is "the best". Period. Instant regex guru status guaranteed.
    Warning: unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.
