Hi,
I am writing a script to scrape URLs from a website. The following is the string I'm trying to match (it appears in the page source returned by urllib2.urlopen(webpage).read()):
Code:
"stream_h264_url":"http:\\/\\/www.dailymotion.com\\/cdn\\/H264-512x384\\/video\\/xu41s3.mp4?auth=1349806412-337e4c35a8590a1dabc2761376070386"
The regex search I do in Python is:
Code:
re.search('"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"',html)
where html is the page source of the webpage I'm interested in.
But I get an error saying: unexpected end of regular expression.
If I change the regex from
'"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"'
to
'"stream_h264_url":"http:[-\\\\/a-zA-Z0-9?=.]+"'
everything matches perfectly. But I don't understand why I have to match two backslashes as opposed to a single literal backslash. Shouldn't a literal backslash ('\\') match every single backslash in the page source?
Any help is appreciated.
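
For reference, here is a minimal sketch of the two escaping levels involved. It assumes the page text contains the backslashes exactly as displayed in the sample above; the variable names are only illustrative, and `html` is a hard-coded stand-in for the real page source. It does not try to reproduce the reported error, only the difference between the two patterns.
Code:
```python
import re

# Assumed sample: the page text as displayed in the post, backslashes included.
html = r'"stream_h264_url":"http:\\/\\/www.dailymotion.com\\/cdn\\/H264-512x384\\/video\\/xu41s3.mp4?auth=1349806412-337e4c35a8590a1dabc2761376070386"'

# Level 1: the Python string literal. '\\' is ONE backslash character.
assert len('\\') == 1

# Level 2: the regex engine. It receives that one backslash and reads it as
# the start of an escape, so '[-\\/...]' reaches re as [-\/...]: a class
# that contains '/' but no backslash.
no_backslash = re.search('"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"', html)

# Four backslashes in the literal -> two in the pattern -> one literal
# backslash inside the character class.
with_backslash = re.search('"stream_h264_url":"http:[-\\\\/a-zA-Z0-9?=.]+"', html)

# A raw string skips level 1, so only the regex-level escaping is written.
raw_pattern = re.search(r'"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"', html)

print(no_backslash is None)         # the class without a backslash finds nothing
print(with_backslash is not None)   # the class with a backslash matches
print(raw_pattern.group(0) == html) # raw-string form matches the whole sample
```
With a raw string the pattern needs only the doubling that the regex engine itself requires, which is probably the easiest way to keep the two levels apart.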