AWK regexp problem



sinbadbuddha
January 23rd, 2010, 03:33 PM
I used a regular expression, /[a-zA-Z0-9_-]{8,12}/, to match any YouTube video ID, but it matches nothing useful: it only matches if one character is repeated 8-12 times. I'm using GNU awk. Any solutions?

ibuclaw
January 23rd, 2010, 03:39 PM
The expression:

\w
should match any of the characters below without issue, except the hyphen, which \w does not cover and which still has to be listed separately.

[A-Za-z0-9_-]
Regards
Iain

ibuclaw
January 23rd, 2010, 03:53 PM
Also, the default awk installed in Ubuntu is mawk, which apparently doesn't support {} interval expressions.

You can, however, achieve this by installing gawk:

sudo apt-get install gawk
and use:

awk --posix
or

gawk --posix

If this proves to be tedious, you can always set an alias.
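
A minimal sketch of such an alias, assuming bash and that the line goes into ~/.bashrc (the file location is an assumption, not something stated in this thread):

```shell
# Make plain "awk" run gawk in POSIX mode, so {n,m} intervals are recognised.
# Bash expands aliases only in interactive shells, so scripts are unaffected.
alias awk='gawk --posix'
```

Because the alias is not expanded inside scripts, a script that needs intervals would still have to call gawk with the option itself.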

Regards
Iain

Trumpen
January 23rd, 2010, 04:02 PM
With gawk, you have to enable them with the option --re-interval:


--re-interval
Enable the use of interval expressions in regular expression matching (see Regular Expressions, below). Interval expressions were not traditionally available in the AWK language. The POSIX standard added them, to make awk and egrep consistent with each other. However, their use is likely to break old AWK programs, so gawk only provides them if they are requested with this option, or when --posix is specified.

You can use [:alnum:] in the place of [a-zA-Z0-9]:


gawk --re-interval '/[[:alnum:]_-]{8,12}/'
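
If the goal is to pull the ID out of a line rather than just select matching lines, something along these lines might work. This is only a sketch: find_video_id is a made-up helper name, the sample input is invented, and it assumes gawk is installed. Note that without anchors the pattern matches any run of 8 or more ID characters and reports at most the first 12 of them.

```shell
# Hypothetical helper: for each input line, print the first run of
# 8 to 12 ID characters (letters, digits, underscore, hyphen).
find_video_id() {
    gawk --re-interval '
        # match() sets RSTART and RLENGTH to the position and length of the match
        match($0, /[[:alnum:]_-]{8,12}/) {
            print substr($0, RSTART, RLENGTH)
        }'
}

# Invented sample input; "watch" and "v" are too short to match on their own.
printf '%s\n' 'watch?v=abc_DEF-123' | find_video_id
```

With the sample line above, the only run long enough is the part after "v=", so that is what gets printed.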

sinbadbuddha
January 23rd, 2010, 04:22 PM
Thanks. Works perfectly now ;)

sinbadbuddha
January 23rd, 2010, 04:36 PM
Sorry, not quite. --re-interval works on the command line, but when I add it to the shebang line like so:

#!/usr/bin/gawk --re-interval -f
gawk just prints its help text. I don't understand why.

raffaele181188
January 23rd, 2010, 04:41 PM
??? What are you trying to do? :D


#!/bin/sh
gawk .....

;)

sinbadbuddha
January 23rd, 2010, 04:55 PM
I'm adding

#!/usr/bin/gawk [options] -f
to the beginning of an AWK script, to make execution simpler.

#!/bin/sh gawk [options] -f
generates an error message. As for using a one-line shell script, that's just ugly.

raffaele181188
January 24th, 2010, 05:21 PM
Oh, ok.
Then I think you should remove the '-f' since


-f program-file
--file program-file
Read the AWK program source from the file program-file, instead of from the first command line argument. Multiple -f (or --file) options may be used.

sinbadbuddha
January 24th, 2010, 06:09 PM
No, that (unfortunately) doesn't help.


#!/usr/bin/gawk -f
# rest of awkscript..
has always worked on my system; if you leave off the '-f' gawk tries to execute your filename. My problem is that this seems to fail when I add the '--re-interval' option.


#!/usr/bin/gawk -f awkscript.awk
#rest of awkscript
executes the command
/usr/bin/gawk -f awkscript.awk awkscript.awk
which again tries to parse your filename.

ibuclaw
January 24th, 2010, 06:55 PM
http://hibernia.jakma.org/~paul/awk-faq.html#script-args

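The short answer is that you can't do that: on Linux, everything after the interpreter path on the #! line is handed to the interpreter as a single argument, so gawk sees "--re-interval -f" as one unknown option and prints its help.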
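
One common workaround, sketched here under assumptions (the script name match-ids and the temporary directory are made up for illustration, and gawk must be on the PATH): let the #! line name /bin/sh and have the shell start gawk, since a shell command line can carry any number of options.

```shell
# Create a throwaway directory and write the hypothetical wrapper script into it.
dir=$(mktemp -d)
cat > "$dir/match-ids" <<'EOF'
#!/bin/sh
# sh starts gawk with the extra option; "$@" forwards any file names.
exec gawk --re-interval '
/[[:alnum:]_-]{8,12}/ { print }
' "$@"
EOF
chmod +x "$dir/match-ids"

# Only the second line contains a run of at least 8 ID characters.
printf '%s\n' 'short' 'abc_DEF-123' | "$dir/match-ids"
```

The script can then be run directly, like one that starts with #!/usr/bin/gawk -f, at the cost of keeping the AWK program inside a quoted shell string.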

Regards
Iain

sinbadbuddha
January 24th, 2010, 10:27 PM
Thanks! :(

raffaele181188
January 25th, 2010, 12:43 AM
:(