Thread: bash equivalent preg_match_all

  1. #1
    Join Date
    Oct 2011
    Beans
    115

    bash equivalent preg_match_all

    Hello everyone! How can I translate this code into pure bash?
    Code:
    php -r '
        preg_match_all("/<table class=download>(.*?)<\/table>/",$argv[1],$tab); 
        preg_match_all("/<td>Linux<\/td>(.*?)<\/td>/",$tab[1][0],$rows);
        preg_match_all("/href=(.*?)>/",$rows[1][0],$a);
        echo $a[1][0];
    ' -- "$content"

  2. #2
    Join Date
    Feb 2013
    Beans
    Hidden!

    Re: bash equivalent preg_match_all

    Parsing HTML with bash/sed/awk is rather complicated and very error-prone. It is better to do it with a real scripting language like Perl, Python, or Ruby, which have dedicated library modules for this. If you insist on doing it with sed, then at least install something like xml2 and parse its output.
    Code:
    echo "$content" |
    html2 |
    sed -n '\%.*/table/@class=download$%,$!d;/\/td=Linux$/,$!d;\%.*/td/.*/@href=%{s///p;q}'
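    To see what the sed program is working on, here is a small sketch run against hand-written sample lines. It assumes html2 prints one path=value line per node, with attributes written as @name. The sample lines, the example.com URLs, and the function names sample_lines and extract_linux_href are made up for illustration; this is not real html2 output from the page.

```shell
#!/usr/bin/env bash
# Hypothetical sample of html2-style output (assumption: one path=value line
# per node, attributes as @name). Not captured from a real page.
sample_lines() {
    printf '%s\n' \
        '/html/body/table/@class=download' \
        '/html/body/table/tr/td=Windows' \
        '/html/body/table/tr/td/a/@href=http://example.com/sdk-windows.zip' \
        '/html/body/table/tr/td=Linux' \
        '/html/body/table/tr/td/a/@href=http://example.com/sdk-linux.tgz'
}

# extract_linux_href (hypothetical name) wraps the sed program from above:
#   1. \%...download$%,$!d   deletes every line before the download table
#   2. /\/td=Linux$/,$!d     deletes every line before the Linux cell
#   3. on the first remaining href line, s/// reuses the address regex to
#      strip the path prefix, p prints what is left, q stops reading
extract_linux_href() {
    sed -n '\%.*/table/@class=download$%,$!d;/\/td=Linux$/,$!d;\%.*/td/.*/@href=%{s///p;q}'
}

result=$(sample_lines | extract_linux_href)
printf '%s\n' "$result"
```

    On this sample the Windows link is skipped because it comes before the td=Linux line, and only the first href after that line is printed.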
    Last edited by schragge; January 28th, 2015 at 02:44 PM.

  3. #3
    Join Date
    Oct 2011
    Beans
    115

    Re: bash equivalent preg_match_all

    It tells me that the html2 command is not installed, and I cannot find it with apt-get install either.
    If I remove that line, the regular expression does not return any results.

  4. #4
    Join Date
    Feb 2013
    Beans
    Hidden!

    Re: bash equivalent preg_match_all

    html2 is part of the xml2 package:
    Code:
    sudo apt-get install xml2
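    In a script it can help to check for the command up front rather than fail in the middle of a pipeline. A minimal sketch: the require_cmd helper name is made up for illustration, while command -v is the standard shell builtin for looking up a command.

```shell
#!/usr/bin/env bash
# require_cmd (hypothetical helper): succeed if the command is on PATH,
# otherwise print a hint naming the package to install and return 1.
require_cmd() {
    local cmd="$1" pkg="$2"
    if ! command -v "$cmd" >/dev/null 2>&1; then
        printf '%s not found; try: sudo apt-get install %s\n' "$cmd" "$pkg" >&2
        return 1
    fi
}

# html2 ships in the xml2 package, so that is the package to suggest.
require_cmd html2 xml2 || echo "install xml2 before running the pipeline" >&2
```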

  5. #5
    Join Date
    Oct 2011
    Beans
    115

    Re: bash equivalent preg_match_all

    OK, now it gives me HTML syntax errors. The code I am using is this:
    Code:
    content=$(wget http://developer.android.com/sdk/index.html#Other -q -O -)
    
    echo "$content" |
    html2 |
    sed -n '\%.*/table/@class=download$%,$!d;/\/td=Linux$/,$!d;\%.*/td/.*/@href=%{s///p;q}'

  6. #6
    Join Date
    Feb 2013
    Beans
    Hidden!

    Re: bash equivalent preg_match_all

    html2 is rather picky about HTML syntax. Try it this way:
    Code:
    content=$(wget -q -O - http://developer.android.com/sdk | sed '/ id="Other"/,$!d')
    echo "$content" |
    html2 2>/dev/null |
    sed -n '\%.*/table/@class=download$%,$!d;/\/td=Linux$/,$!d;\%.*/td/.*/@href=%{s///p;q}'
    Or, if you don't have to keep $content around for later use:
    Code:
    wget -q -O - http://developer.android.com/sdk |
    sed '/ id="Other"/,$!d' |
    html2 2>/dev/null |
    sed -n '\%.*/table/@class=download$%,$!d;/\/td=Linux$/,$!d;\%.*/td/.*/@href=%{s///p;q}'
    And I guess in this particular case you could also get away with something as simple as
    Code:
    wget -qO- http://developer.android.com/sdk |
    egrep -om1 'http[^"]+sdk[^"]+linux[^"]+'
    The last one can actually be rewritten in pure bash as you requested:
    Code:
    pattern='http[^"]+sdk[^"]+linux[^"]+'
    wget -qO- http://developer.android.com/sdk |
    while read -r line
    do
        [[ $line =~ $pattern ]] && printf %s\\n "$BASH_REMATCH" && break
    done
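    For the original three-step preg_match_all chain, a closer translation is also possible without any external tools. Bash regular expressions have no lazy quantifier like (.*?), so this sketch uses parameter expansion instead. The first_between helper and the sample markup are made up for illustration. Unlike the PHP patterns, where . does not match a newline by default, this helper also matches across line breaks, and it returns only the first match, which is all the PHP code uses anyway.

```shell
#!/usr/bin/env bash
# first_between STRING START END  (hypothetical helper)
# Print the text between the first START and the first END after it,
# which is what a non-greedy START(.*?)END capture would return.
first_between() {
    local s="$1" start="$2" end="$3"
    case $s in *"$start"*) ;; *) return 1 ;; esac
    s=${s#*"$start"}    # shortest prefix removal: drop through the first START
    case $s in *"$end"*) ;; *) return 1 ;; esac
    s=${s%%"$end"*}     # longest suffix removal: drop from the first END onward
    printf '%s\n' "$s"
}

# Made-up sample markup, shaped like the table the PHP snippet expects.
content='<table class=download><tr><td>Windows</td><td><a href=win.zip>w</a></td></tr><tr><td>Linux</td><td><a href=linux.tgz>l</a></td></tr></table>'

tab=$(first_between "$content" '<table class=download>' '</table>')
row=$(first_between "$tab" '<td>Linux</td>' '</td>')
url=$(first_between "$row" 'href=' '>')
printf '%s\n' "$url"
```

    The three calls mirror the three preg_match_all lines: table body, then the cell after the Linux label, then the href value. The same caveat applies as for the regex version: it only works as long as the markup keeps exactly this shape.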
    Last edited by schragge; January 29th, 2015 at 07:06 PM.

  7. #7
    Join Date
    Oct 2011
    Beans
    115

    Re: bash equivalent preg_match_all

    Works perfectly, thanks.
