davidpbrown
April 7th, 2008, 04:18 PM
I'm trying to grab URLs from a text file and have found a few suggestions.
One from the IETF itself that looks promising is
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
but I'm having no luck converting this into a form sed will accept - I've tried all sorts.
Can anyone suggest which characters must be escaped, and whether I need some regex extension, or ssed etc., to handle a regex as complex as that?
Having not used sed before, I had hoped that something simple like just escaping the ()s might work..
sed -n -e 's@^\(\([^:/?#]+\):\)?\(//\([^/?#]*\)\)?\([^?#]*\)\(\?\([^#]*\)\)?\(#\(.*\)\)?@\5@p' text
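For comparison, here is a minimal sketch of one thing that might be worth trying, assuming a sed that accepts extended regexes via -E (older GNU sed spells it -r). In the default basic mode, GNU sed treats a bare ? or + as a literal character, so escaping only the ()s is not enough; in extended mode the pattern can be used unchanged. The sample URL and the extract_path name are made up for illustration.

```shell
# The pattern quoted above, unchanged, used as an extended regex (ERE).
# With -E, ( ) ? + are operators without backslashes, and \? is a literal "?".
url_re='^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'

# Hypothetical helper: read lines on stdin, print group 5 (the path part).
extract_path() {
  sed -n -E "s@${url_re}@\5@p"
}

# Made-up sample input, only for illustration.
printf '%s\n' 'http://www.ietf.org/rfc/rfc2396.txt?x=1#top' | extract_path
```

Every part of the pattern is optional, so it matches any line at all; it splits a line that already is a URL into pieces rather than finding URLs inside other text.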