How to parse a part of a link in python?



atomkarinca
March 9th, 2008, 08:37 PM
Hi everyone. How can I crop a part of a link in python? For example, let's say the link is http://www.youtube.com/watch?v=FRnrKzOrp7M and I want to get FRnrKzOrp7M from that link. How do I go about doing that? Thanks.

Acglaphotis
March 9th, 2008, 08:44 PM
string = "http://www.youtube.com/watch?v=FRnrKzOrp7M"
newString = string.replace("http://www.youtube.com/watch?v=", "")


newString only has FRnrKzOrp7M.

atomkarinca
March 9th, 2008, 08:46 PM
Thanks for the quick reply :)

Can+~
March 9th, 2008, 08:51 PM
I would've suggested using regular expressions for that, e.g. looking for "?v=_______", since YouTube URLs can carry other flags, like "?locale=" for another language.
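To see why the plain replace() approach is fragile, here is a small Python 3 sketch using two of the sample URLs posted later in the thread. Any trailing parameter survives the replacement, and a different host means the prefix never matches at all:

```python
prefix = "http://www.youtube.com/watch?v="
url = "http://www.youtube.com/watch?v=ngjr6IkDLxg&feature=related"

# str.replace() only removes the literal prefix; everything after it is kept
leftover = url.replace(prefix, "")
print(leftover)  # ngjr6IkDLxg&feature=related

# A different subdomain: the literal prefix is not found, so nothing changes
other = "http://de.youtube.com/watch?v=et2Y-_n8_W0&feature=bz303"
print(other.replace(prefix, "") == other)  # True
```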

atomkarinca
March 9th, 2008, 08:57 PM
How can I do that? What if, like you said, there was another flag?

a9bejo
March 9th, 2008, 09:02 PM
http://www.amk.ca/python/howto/regex/

Acglaphotis
March 9th, 2008, 09:03 PM
Here's a link on how to use regular expressions in Python (kinda hard, though):

http://www.amk.ca/python/howto/regex/

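Or you could use a wildcard pattern like "http://www.youtube.com/watch?*=" instead of "http://www.youtube.com/watch?v=", though str.replace() does not understand wildcards, so that needs a regular expression rather than replace().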
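Since str.replace() only handles literal text, the regex counterpart of "replace this with nothing" is re.sub(). A minimal Python 3 sketch, assuming the ID is whatever follows "v=" up to the next "&":

```python
import re

url = "http://www.youtube.com/watch?v=ngjr6IkDLxg&feature=related"

# Two alternatives, both replaced with "":
#   ^.*?[?&]v=  -> everything from the start up to and including "?v=" or "&v="
#   &.*$        -> the next "&" and everything after it
video_id = re.sub(r"^.*?[?&]v=|&.*$", "", url)
print(video_id)  # ngjr6IkDLxg
```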

atomkarinca
March 9th, 2008, 09:40 PM
> I would've suggested using regular expressions for that. Like looking for the "?v=_______" with it, since youtube can have different flags like "?locale=" for another language, etc.

How can I use this with replace()? Can you give a simple example?

Can+~
March 9th, 2008, 10:00 PM
More about Regular expressions:
http://docs.python.org/dev/howto/regex.html



import re

ytburl = "http://www.youtube.com/watch?v=FRnrKzOrp7M"
regexp = r"v=\w*"

# It's a good idea to compile it if you're going to use it more than once.
regexp = re.compile(regexp, re.IGNORECASE)
result = regexp.search(ytburl)

if result:
    print("String: %s" % result.group())
else:
    print("Not found.")


Result:

String: v=FRnrKzOrp7M




I'm sure the regular expression could be improved a lot, for example by using {7,} to require a minimum number of repetitions instead of *.
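A quick Python 3 illustration of the quantifier idea: {7,} means "at least seven" of the preceding token, so very short values no longer match. The threshold of 7 is just the number suggested above, not anything YouTube guarantees, and the example.com URL is hypothetical:

```python
import re

# \w{7,} = at least seven word characters after "v="
pattern = re.compile(r"v=\w{7,}")

long_match = pattern.search("http://www.youtube.com/watch?v=FRnrKzOrp7M")
short_match = pattern.search("http://example.com/page?v=abc")  # hypothetical URL

print(long_match.group())  # v=FRnrKzOrp7M
print(short_match)         # None: "abc" has only three characters
```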

mssever
March 9th, 2008, 10:02 PM
In Ruby, you would do something like

/v=([^&]+)/.match url
video_id = $1

The regular expression (between, but not including, the slashes) should be the same in Python. $1 is the contents of the first parenthesized part of the regex. I don't know how to access that data in Python, but you should be able to find it easily enough. The regex is the hardest part of this.
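In Python, the counterpart of Ruby's $1 is group(1) on the match object returned by re.search(). A minimal Python 3 sketch with the pattern v=([^&]+), i.e. one or more characters other than "&" after "v=":

```python
import re

url = "http://www.youtube.com/watch?v=ngjr6IkDLxg&feature=related"

match = re.search(r"v=([^&]+)", url)
if match:
    # group(0) is the whole match ("v=..."); group(1) is the first parenthesized group
    video_id = match.group(1)
    print(video_id)  # ngjr6IkDLxg
```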

atomkarinca
March 9th, 2008, 10:32 PM
Thanks a million times. It worked like a charm :)

Can+~
March 9th, 2008, 10:42 PM
> Thanks a million times. It worked like a charm :)

You should refine it further, though: I've noticed that YouTube IDs use characters other than the usual alphanumerics (for example "-"), and I'll leave that task to you so you can learn regexps too. Here are some other example URLs I've collected for you to try out:


samplelist = [
    "http://www.youtube.com/watch?v=FRnrKzOrp7M",
    "http://www.youtube.com/watch?v=ngjr6IkDLxg&feature=related",
    "http://de.youtube.com/watch?v=et2Y-_n8_W0&feature=bz303",
]
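One possible answer to that exercise, offered as a Python 3 sketch rather than the definitive pattern: allow "-" alongside \w (which already covers letters, digits and "_") and capture the ID with a group:

```python
import re

samplelist = [
    "http://www.youtube.com/watch?v=FRnrKzOrp7M",
    "http://www.youtube.com/watch?v=ngjr6IkDLxg&feature=related",
    "http://de.youtube.com/watch?v=et2Y-_n8_W0&feature=bz303",
]

# [\w-]+ : one or more letters, digits, "_" or "-"; stops at "&"
video_id_re = re.compile(r"v=([\w-]+)")

ids = []
for url in samplelist:
    match = video_id_re.search(url)
    ids.append(match.group(1) if match else None)

print(ids)  # ['FRnrKzOrp7M', 'ngjr6IkDLxg', 'et2Y-_n8_W0']
```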

pmasiar
March 9th, 2008, 11:16 PM
I like to keep things simple, and I can do without regex 80% of the time. This one looks simple enough:


url = "http://www.youtube.com/watch?v=FRnrKzOrp7M"
result = url.split('v=')[1]

mssever
March 9th, 2008, 11:28 PM
> I like to keep things simple, and can do without regex in 80%. This one looks simple enough:
>
> url = "http://www.youtube.com/watch?v=FRnrKzOrp7M"
> result = url.split('v=')[1]

Does that work if the query string has more parameters?

I think that this is a good candidate for a regex. The regex given above might not work in all cases, though it'll probably work in most. Use something like this to catch all possibilities:

regex = r'\?.*v=([^&]+)'

This grabs all characters after v= until it reaches the end of the string or the & character which separates values.

pmasiar
March 9th, 2008, 11:34 PM
> Does that work if the query string has more parameters?


url = "http://www.youtube.com/watch?v=FRnrKzOrp7M"
result = url.split('v=')[1]
if '&' in result:
    result = result.split('&')[0]


> This grabs all characters after v= until it reaches the end of the string or the & character which separates values.

this too, and without the regex magic. :-)

I don't hate regex, I even have Perl/regex book signed **personally** by Abigail from Perl conference, but for 80% of tasks, regex is not necessary.
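Wrapped in a function (the name video_id_from_url is just for illustration) and run over the sample URLs posted above, the split approach looks like this in Python 3. Because split() returns the whole string when the separator is missing, the if '&' check can be dropped:

```python
def video_id_from_url(url):
    """Split-based extraction: text after the first 'v=', up to the next '&'."""
    # Raises IndexError if the URL contains no "v=" at all.
    after_v = url.split("v=", 1)[1]
    # split() returns the whole string when there is no "&", so no "if" is needed.
    return after_v.split("&", 1)[0]


samplelist = [
    "http://www.youtube.com/watch?v=FRnrKzOrp7M",
    "http://www.youtube.com/watch?v=ngjr6IkDLxg&feature=related",
    "http://de.youtube.com/watch?v=et2Y-_n8_W0&feature=bz303",
]

for url in samplelist:
    print(video_id_from_url(url))
```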

LaRoza
March 9th, 2008, 11:37 PM
> this too, and without the regex magic. :-)
>
> I don't hate regex, I even have Perl/regex book signed **personally** by Abigail from Perl conference, but for 80% of tasks, regex is not necessary.

Good, I thought I was the odd one for not using re and preferring to use other methods.

ghostdog74
March 10th, 2008, 02:45 AM
Please make use of the library module urlparse:


>>> import urlparse
>>> url = "http://www.youtube.com/watch?v=FRnrKzOrp7M"
>>> o = urlparse.urlparse(url)
>>> o[4]
'v=FRnrKzOrp7M'

Look up the documentation for urlparse to see what the attributes mean (http://docs.python.org/lib/module-urlparse.html)
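In Python 3 the urlparse module became urllib.parse. A sketch that uses the named .query attribute instead of index 4 and then lets parse_qs split the query string, so extra parameters are handled as well:

```python
from urllib.parse import urlparse, parse_qs

url = "http://www.youtube.com/watch?v=ngjr6IkDLxg&feature=related"

parts = urlparse(url)
print(parts.query)  # v=ngjr6IkDLxg&feature=related

# parse_qs maps each name to a list of values, since a name may repeat
params = parse_qs(parts.query)
print(params)  # {'v': ['ngjr6IkDLxg'], 'feature': ['related']}

video_id = params.get("v", [None])[0]
print(video_id)  # ngjr6IkDLxg
```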

slavik
March 10th, 2008, 04:23 AM
> > Does that work if the query string has more parameters?
>
> url = "http://www.youtube.com/watch?v=FRnrKzOrp7M"
> result = url.split('v=')[1]
> if '&' in result:
>     result = result.split('&')[0]
>
> this too, and without the regex magic. :-)
>
> I don't hate regex, I even have Perl/regex book signed **personally** by Abigail from Perl conference, but for 80% of tasks, regex is not necessary.

this is one of those that is in the 20% of tasks ;)

EDIT: if you split it on the 'v', then how do you know where the v is?



$url = "url_with_get.php?key1=value1&key2=value2";
$get_string = $1 if ($url =~ /\?(.*)/);
%_GET = map { split /=/, $_ } split /&/, $get_string;


What PHP does (for GET) behind the scenes, in 3 Perl statements. :)

mssever
March 10th, 2008, 04:30 AM
> >>> import urlparse
> >>> url = "http://www.youtube.com/watch?v=FRnrKzOrp7M"
> >>> o = urlparse.urlparse(url)
> >>> o[4]
> 'v=FRnrKzOrp7M'


As in pmasiar's initial example, this one also requires additional code to work properly, since YouTube URLs frequently contain additional parameters in the query string. You can solve this by splitting the string multiple times, or with urlparse plus a split; or, you can write a single simple regex that handles all variations well. While some regexes add complexity to a program, in this case, I think that a regex is the simplest solution by far. (Of course, regexes are a bit awkward in Python. That's one reason I like Ruby :) )

EDIT:

> this is one of those that is in the 20% of tasks ;)
Agreed. Though Perl code tends to make a simple problem look complex... Compare this Ruby code:

/[?&]v=([^&]+)/.match url
the_result = $1

ghostdog74
March 10th, 2008, 04:41 AM
> you can write a single simple regex that handles all variations well.

if a simple regex handles all variations well, it most probably would not be simple anymore, don't you agree? :-)


> (Of course, regexes are a bit awkward in Python. That's one reason I like Ruby :) )

That's why Python has strong string manipulation capabilities, says the Python advocate.

pmasiar
March 10th, 2008, 04:57 AM
Compare: regex solution:

/[?&]v=([^&]+)/.match url
result = $1

vs plain string parsing:


result = url.split('v=')[1]
if '&' in result:
    result = result.split('&')[0]

It takes one more line, but it is easier to read, unless you are a regex expert.

Of course, the regex solution scales better to more complex cases. But in 80% of cases, like this one, regex is overkill, IMHO. YMMV, eat your regex if you want, I don't care anymore. :-)

slavik
March 10th, 2008, 05:02 AM
> Of course regex solution scales for more complex cases. But in 80% cases, like this one, regex is overkill, IMHO. YMMV, eat your regex if you want, I don't care anymore. :-)

Once I get a regex to match correctly, I don't have to care about it anymore.

ghostdog74
March 10th, 2008, 05:11 AM
> Once I get a regex to match correctly, I don't have to care about it anymore.

That's not quite true, is it? How do you know your program's specs won't change in the future?

WW
March 10th, 2008, 05:23 AM
The structure is simple enough that regex does seem like overkill. This Python function gets all the assigned values into a dict. Python gurus could probably simplify it even more:



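# parse_url_opts.py

def parse_url_opts(url_str):
    # Split off the query string (everything after the first "?")
    (url, opts_str) = url_str.split("?", 1)
    opts_list = opts_str.split("&")
    # Split each "name=value" at the first "=" only
    opts_pairs = [s.split("=", 1) for s in opts_list]
    return dict(opts_pairs)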

Example:


>>> from parse_url_opts import *
>>> z = parse_url_opts("http://www.youtube.com/watch?v=ngjr6IkDLxg&feature=related")
>>> z.keys()
['feature', 'v']
>>> z['feature']
'related'
>>> z['v']
'ngjr6IkDLxg'
>>>


(Something like this might already exist in the cgi module.)
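It does: parse_qs/parse_qsl started out in the cgi module, were copied into urlparse in Python 2.6, and live in urllib.parse in Python 3. A sketch of a drop-in replacement for parse_url_opts built on them; unlike the hand-rolled split, parse_qsl also decodes %-escapes and "+":

```python
from urllib.parse import urlsplit, parse_qsl

def parse_url_opts(url):
    # parse_qsl returns (name, value) pairs in order;
    # dict() keeps the last value if a name is repeated
    return dict(parse_qsl(urlsplit(url).query))

z = parse_url_opts("http://www.youtube.com/watch?v=ngjr6IkDLxg&feature=related")
print(z["v"])        # ngjr6IkDLxg
print(z["feature"])  # related
```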

mssever
March 10th, 2008, 05:29 AM
> if a simple regex handles all variations well, it most probably would not be simple anymore, don't you agree?
Not necessarily. The regex I wrote above is quite simple.

ghostdog74
March 10th, 2008, 05:35 AM
> Not necessarily. The regex I wrote above is quite simple.

Yes, for this simple case. But as you mentioned, if the URL contains different kinds of query strings and not just the standard "v=xxxxxx", then a simple regex can become quite complex if all variations are to be handled.

slavik
March 10th, 2008, 06:26 AM
> # parse_url_opts.py
>
> def parse_url_opts(str):
>     (url,opts_str) = str.split("?")
>     opts_list = opts_str.split("&")
>     opts_pairs = [s.split("=") for s in opts_list]
>     return dict(opts_pairs)


the Perl version:


sub parse_url_opts {
    my $url = shift;
    (my $base, my $opts) = split /\?/, $url;
    my @opt_list = split /&/, $opts;
    my %opt_pairs = map { split /=/, $_ } @opt_list;
    return %opt_pairs;
}

The only real difference is how the subroutines get their arguments, and that in Perl the map is explicit. :)

LaRoza
March 10th, 2008, 01:41 PM

There is also a "map" function in Python. Although I don't normally use it, it may be used in a way like yours (I just got up and didn't really read the code, except to say the Perl example is very different visually from the Python)

pmasiar
March 10th, 2008, 02:15 PM
map will be dropped in Py3K IIRC. List comprehension does the same in a more obvious and readable way.
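For comparison, here is the pair-splitting step from parse_url_opts written both ways in Python 3. They produce the same result; note that map() needs a list() around it in Python 3:

```python
opts_list = "v=ngjr6IkDLxg&feature=related".split("&")

# List comprehension, as in parse_url_opts above
pairs_lc = [s.split("=") for s in opts_list]

# The same thing with map(); list() is needed because map() returns an iterator in Python 3
pairs_map = list(map(lambda s: s.split("="), opts_list))

print(pairs_lc == pairs_map)  # True
print(dict(pairs_lc))         # {'v': 'ngjr6IkDLxg', 'feature': 'related'}
```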

LaRoza
March 10th, 2008, 02:17 PM
> map will be dropped in Py3K IIRC. List comprehension does the same in a more obvious and readable way.

http://www.artima.com/weblogs/viewpost.jsp?thread=98196



Update: lambda, filter and map will stay (the latter two with small changes, returning iterators instead of lists). Only reduce will be removed from the 3.0 standard library. You can import it from functools.
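A quick Python 3 check of that update: map() returns a lazy iterator rather than a list, and reduce has to be imported from functools. The data are just the three video IDs from the sample URLs in this thread:

```python
from functools import reduce  # reduce is no longer a builtin in Python 3

lengths = map(len, ["FRnrKzOrp7M", "ngjr6IkDLxg", "et2Y-_n8_W0"])
is_list = isinstance(lengths, list)
print(is_list)  # False: map() returns an iterator

# reduce() consumes the iterator, summing the three lengths
total = reduce(lambda a, b: a + b, lengths)
print(total)  # 33
```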