Thread: Yet another string extraction problem with sed

  1. #1
    Join Date
    Apr 2009
    Beans
    252
    Distro
    Xubuntu 20.04 Focal Fossa

    Yet another string extraction problem with sed

    This kind of question has been asked and answered quite a lot, but I just can't wrap my head around it, so please bear with me.

    I have this chunk of text (HTML source code) that I want to sift through to extract the text at a certain position. The position is delimited by a forward slash ( / ) at the beginning and a double quote mark ( " ) at the end. I know that the relevant forward slash is the 13th occurrence of that character in the text:

    foo/foo/...13th slash follows/THIS IS THE STRING I WANT" bar

    Doesn't have to be sed, of course. I suppose grep or some other tool would also do the job.

    Thanks for reading and I appreciate any pointers.
    Tx

  2. #2
    Join Date
    Sep 2006
    Beans
    8,627
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: Yet another string extraction problem with sed

    Maybe awk could do it?

    Code:
    awk -F '/' '{ print $13 }'
    That assumes that the slash is the only delimiter.
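
A note on field numbering, sketched with a made-up line that has only three slashes before the wanted text. awk numbers fields from 1, so with -F '/' the text after the Nth slash is field N+1, and NF - 1 is the number of slashes on the line. The sample line and the variable names are assumptions for illustration, not the poster's real data.

```shell
# Hypothetical sample: the wanted text follows the 3rd slash.
line='a/b/c/WANT" bar/d'

# Fields: $1=a  $2=b  $3=c  $4=WANT" bar  $5=d
after_third=$(printf '%s\n' "$line" | awk -F '/' '{ print $4 }')

# Number of slashes on the line is one less than the number of fields.
slashes=$(printf '%s\n' "$line" | awk -F '/' '{ print NF - 1 }')

printf '%s\n' "$after_third"
printf '%s\n' "$slashes"
```

Depending on how the slashes were counted, the field number in the command above may need adjusting by one.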

  3. #3
    Join Date
    Apr 2009
    Beans
    252
    Distro
    Xubuntu 20.04 Focal Fossa

    Re: Yet another string extraction problem with sed

    Good start, but the string I want ends with " and not /.
    Your solution gets me the bit from the 13th slash (which is what I want) up to the next slash, which is too much.

  4. #4
    Join Date
    Apr 2009
    Beans
    252
    Distro
    Xubuntu 20.04 Focal Fossa

    Re: Yet another string extraction problem with sed

    Oops, duplicate post. Sorry.

  5. #5
    Join Date
    Sep 2006
    Beans
    8,627
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: Yet another string extraction problem with sed

    You could then just delete the quote and everything that follows from that field:

    Code:
    awk -F '/' '{ sub( /".*$/, "", $13 ); print $13 }'
    I expect that there is probably a more elegant way to do it.
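
Two possible alternatives to the sub() call, sketched on a made-up line where the wanted text is field 4 rather than 13. One chains two cut calls, the other uses awk's split() so the field itself is not modified. The sample line and variable names are assumptions for illustration.

```shell
# Hypothetical sample: the wanted text is the 4th slash-separated field.
line='a/b/c/WANT" bar/d'

# cut: take field 4 by '/', then keep everything before the first quote.
with_cut=$(printf '%s\n' "$line" | cut -d '/' -f 4 | cut -d '"' -f 1)

# awk: split field 4 on the quote character and print the first piece.
with_split=$(printf '%s\n' "$line" | awk -F '/' '{ split($4, part, "\""); print part[1] }')

printf '%s\n' "$with_cut"
printf '%s\n' "$with_split"
```

Both print WANT for this sample. If the field contains no quote at all, both print the whole field unchanged.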

  6. #6
    Join Date
    Apr 2012
    Beans
    7,256

    Re: Yet another string extraction problem with sed

    If you really want to use sed ...

    Code:
    sed -E 's|(([^/]*/){13})([^"]*)".*|\3|'
    but grep with PCRE might be neater

    Code:
    grep -Po '([^/]*/){13}\K.*(?=")'
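
A short walk-through of the sed expression above, on a made-up line with the repeat count lowered from 13 to 3. Group 1 is the run of slash-terminated segments, group 2 is the single segment inside it, and group 3 is the run of non-quote characters that follows, which is why the replacement is \3. By default sed prints lines that do not match unchanged, so this sketch adds -n and the p flag to print only lines where the substitution happened. The function name and sample data are assumptions for illustration.

```shell
# Hypothetical sample: the wanted text follows the 3rd slash.
line='a/b/c/WANT" bar/d'

# -n suppresses automatic printing; the trailing p prints only
# lines on which the substitution succeeded.
extract_after_third() {
    sed -nE 's|(([^/]*/){3})([^"]*)".*|\3|p'
}

# The first input line has no slashes and is dropped.
result=$(printf '%s\n' 'no slashes here' "$line" | extract_after_third)
printf '%s\n' "$result"
```

Note that [^"]* may itself contain slashes, so the match runs from the 3rd slash to the next quote regardless of any slashes in between.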

  7. #7
    Join Date
    Apr 2009
    Beans
    252
    Distro
    Xubuntu 20.04 Focal Fossa

    Re: Yet another string extraction problem with sed

    Code:
    awk -F '/' '{ sub( /".*$/, "", $13 ); print $13 }'
    Awesome, Lars! That does the job. Thanks!

  8. #8
    Join Date
    Apr 2009
    Beans
    252
    Distro
    Xubuntu 20.04 Focal Fossa

    Re: Yet another string extraction problem with sed

    Originally posted by steeldriver:
    If you really want to use sed ...

    Code:
    sed -E 's|(([^/]*/){13})([^"]*)".*|\3|'
    Good! This also works. Thanks a ton!

    but grep with PCRE might be neater

    Code:
    grep -Po '([^/]*/){13}\K.*(?=")'
    For some reason this doesn't stop at the quotation mark, but gets me the rest of the string, too.

  9. #9
    Join Date
    Apr 2012
    Beans
    7,256

    Re: Yet another string extraction problem with sed

    Hmm... try making it non-greedy maybe?

    Code:
    grep -Po '([^/]*/){13}\K.*?(?=")'
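
Why the first version overshoots, sketched on a made-up line with two more quotes after the wanted text and the repeat count lowered to 3. A greedy .* runs to the end of the line and backs up only as far as the last quote, while .*? stops at the first one. A negated class [^"]* gives the same result as the lazy form here. This assumes GNU grep built with PCRE support (-P). The sample line and variable names are illustrative.

```shell
# Hypothetical sample: more quotes follow the wanted text.
line='a/b/c/WANT" bar="x"'

# Greedy: matches up to the LAST quote on the line.
greedy=$(printf '%s\n' "$line" | grep -Po '([^/]*/){3}\K.*(?=")')

# Lazy: stops before the FIRST quote.
lazy=$(printf '%s\n' "$line" | grep -Po '([^/]*/){3}\K.*?(?=")')

# Negated class: cannot cross a quote, so it also stops at the first one.
negated=$(printf '%s\n' "$line" | grep -Po '([^/]*/){3}\K[^"]*(?=")')

printf '%s\n' "$greedy" "$lazy" "$negated"
```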

  10. #10
    Join Date
    Apr 2009
    Beans
    252
    Distro
    Xubuntu 20.04 Focal Fossa

    Re: Yet another string extraction problem with sed

    Originally posted by steeldriver:
    Hmm... try making it non-greedy maybe?

    Code:
    grep -Po '([^/]*/){13}\K.*?(?=")'
    Yep, that nailed it! Thanks! So now I have three solutions to my problem... decisions
