[SOLVED] Yet another string extraction problem with sed



texpat
June 26th, 2015, 02:52 PM
This kind of question has been asked and answered quite a lot, but I just can't wrap my head around it, so please bear with me.

I have this chunk of text (HTML source code) that I want to sift through to extract the text that sits at a certain position. The position is delimited by a forward slash ( / ) at the beginning and a double quote mark ( " ) at the end. I know that the relevant forward slash is the 13th occurrence of that character in the text:

foo/foo/...13th slash follows/THIS IS THE STRING I WANT" bar

Doesn't have to be sed, of course. I suppose grep or some other tool would also do the job.

Thanks for reading and I appreciate any pointers.
Tx

Lars Noodén
June 26th, 2015, 03:03 PM
Maybe awk could do it?



awk -F '/' '{ print $13 }'


That assumes that the slash is the only delimiter.
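
A quick way to check which field holds the wanted text is to run awk on a made-up sample line. The line and the variable names below are assumptions for illustration only, not the poster's data. awk counts the text before the first slash as field 1, so on this sample the text after the 13th slash lands in $14; whether $13 or $14 is right for the real input depends on how the slashes there are counted.

```shell
# Hypothetical sample: 13 slashes, then the wanted text, then a quote.
line='a/b/c/d/e/f/g/h/i/j/k/l/m/WANTED" bar'

# Field 13 is the text between the 12th and the 13th slash.
f13=$(printf '%s\n' "$line" | awk -F '/' '{ print $13 }')

# Field 14 is everything after the 13th slash (there are no further slashes here).
f14=$(printf '%s\n' "$line" | awk -F '/' '{ print $14 }')

printf 'field 13: %s\n' "$f13"
printf 'field 14: %s\n' "$f14"
```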

texpat
June 26th, 2015, 03:24 PM
Good start :-) but the string I want ends with " and not /.
Your solution gets me the bit from the 13th slash, which is what I want, up to the next slash, which gets me too much.

Lars Noodén
June 26th, 2015, 03:42 PM
You could then just delete the quote and everything that follows from that field:



awk -F '/' '{ sub( /".*$/, "", $13 ); print $13 }'


I expect that there is probably a more elegant way to do it.

steeldriver
June 26th, 2015, 03:53 PM
If you really want to use sed ...



sed -E 's|(([^/]*/){13})([^"]*)".*|\3|'


but grep with PCRE might be neater



grep -Po '([^/]*/){13}\K.*(?=")'
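
One thing to keep in mind with the sed version: on multi-line input, sed prints every line that does not match unchanged. A minimal sketch, assuming a made-up two-line input (the sample text and variable names are hypothetical), uses -n together with the p flag so that only lines where the substitution succeeded are printed:

```shell
# Hypothetical input: one line without enough slashes, one line with 13 slashes.
input='no slashes here
a/b/c/d/e/f/g/h/i/j/k/l/m/WANTED" bar'

# -n suppresses automatic printing; the trailing p prints only lines
# on which the substitution was actually made.
sed_out=$(printf '%s\n' "$input" | sed -nE 's|(([^/]*/){13})([^"]*)".*|\3|p')

printf '%s\n' "$sed_out"
```

Without -n and p, the first line of this sample would be passed through untouched alongside the extracted string.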

texpat
June 26th, 2015, 04:03 PM
awk -F '/' '{ sub( /".*$/, "", $13 ); print $13 }'
Awesome, Lars! That does the job. Thanks!

texpat
June 26th, 2015, 04:05 PM
If you really want to use sed ...



sed -E 's|(([^/]*/){13})([^"]*)".*|\3|'


Good! This also works. Thanks a ton!



but grep with PCRE might be neater



grep -Po '([^/]*/){13}\K.*(?=")'

For some reason this doesn't stop at the quotation mark, but gets me the rest of the string, too.

steeldriver
June 26th, 2015, 04:14 PM
Hmm... try making it non-greedy maybe?



grep -Po '([^/]*/){13}\K.*?(?=")'
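
The difference shows up as soon as a line contains more than one quote after the 13th slash. Here is a small sketch on a made-up sample line (the line and variable names are assumptions, and it needs a grep built with PCRE support for -P). The greedy .* runs to the last quote on the line, the lazy .*? stops at the first one, and a negated class [^"]* is another way to get the same result as the lazy version:

```shell
# Hypothetical sample with several quotes after the 13th slash.
line='a/b/c/d/e/f/g/h/i/j/k/l/m/WANTED" bar "baz"'

# Greedy: .* backtracks only as far as the LAST quote on the line.
greedy=$(printf '%s\n' "$line" | grep -Po '([^/]*/){13}\K.*(?=")')

# Lazy: .*? stops at the FIRST quote after the 13th slash.
lazy=$(printf '%s\n' "$line" | grep -Po '([^/]*/){13}\K.*?(?=")')

# Alternative: match only non-quote characters, so no lookahead is needed.
negated=$(printf '%s\n' "$line" | grep -Po '([^/]*/){13}\K[^"]*')

printf 'greedy:  %s\n' "$greedy"
printf 'lazy:    %s\n' "$lazy"
printf 'negated: %s\n' "$negated"
```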

texpat
June 29th, 2015, 11:28 AM
Hmm... try making it non-greedy maybe?



grep -Po '([^/]*/){13}\K.*?(?=")'


Yep, that nailed it! Thanks! So now I have three solutions to my problem... decisions ;-)