[SOLVED] web pages read via urllib2 are truncated

**twisted_steel** · April 22nd, 2008

I have some Python code that is downloading a page with urllib2 and then doing a read() on the resulting object. I have been printing out the page by redirecting the output to a file and using tail -f. For some reason, the page is never complete. Sometimes it will display up to just before the footer, other times it will be an earlier part in the page. Looking at the urllib2 documentation and all of the examples I have found, I don't believe this should be happening unless it somehow thought there was an end of file (EOF) in the page itself.

I have a method that sets the request variable to a urllib2.Request.

Code:

```python
request = urllib2.Request(self.pageURL)
```

Another method is then passed this request and does the following (sans try/except block):

Code:

```python
response = urllib2.urlopen(request)
...
# Save the HTML of the resulting page
self.resultPage = response.read()
response.close
```

After that, I have been printing out the resultPage for debugging. Any idea why this would be happening? The HTML is intact if I download the page with Firefox.

**twisted_steel** · April 22nd, 2008

Apparently, for some reason or another, the closing tags were not being shown in the output of tail -f (I even had two types of terminals going), but they did exist in the actual redirected file. Good times. Marking this one as solved.
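
For anyone hitting the same symptom, one way to rule out a display problem is to inspect the saved file directly instead of trusting the terminal. The following is only a minimal sketch: the helper name `ends_with_closing_tag`, the default `</html>` tag, and the temporary demo file are all assumptions made for illustration, not part of the original code.

```python
import os
import tempfile


def ends_with_closing_tag(path, tag=b"</html>"):
    """Return True if the file's last non-whitespace bytes are `tag`.

    Hypothetical helper: the comparison is case-insensitive and ignores
    trailing whitespace such as a final newline.
    """
    with open(path, "rb") as f:
        data = f.read()
    return data.rstrip().lower().endswith(tag.lower())


# Demo on a throwaway file standing in for the redirected output.
fd, demo_path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as f:
    f.write(b"<html><body>footer</body></html>\n")
complete = ends_with_closing_tag(demo_path)

# Overwrite the file with a page that stops partway through.
with open(demo_path, "wb") as f:
    f.write(b"<html><body>foo")
truncated = ends_with_closing_tag(demo_path)

os.remove(demo_path)
print(complete, truncated)
```

If the check passes on the file but the terminal still shows a shortened page, the download itself is fine and the problem is on the viewing side.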

**nanotube** · April 22nd, 2008

Originally Posted by twisted_steel

Apparently, for some reason or another, the closing tags were not being shown in the output of tail -f (I even had two types of terminals going), but they did exist in the actual redirected file. Good times. Marking this one as solved.

Some notes:

"response.close" should be "response.close()". Without the parentheses, Python only looks up the method and never calls it, so the response is never explicitly closed.

One reason the last bits of the file may not have been showing is that the file write buffer was never closed or flushed. By default, Python's file writes are buffered. I don't know if that was the case for you, since you haven't included your actual file-writing code, but I'm just throwing it out there.
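
Both notes can be seen in a small, self-contained sketch. It makes a few assumptions: `io.StringIO` stands in for the urllib2 response object (so no network access is needed), the temporary file stands in for whatever file the page was written to, and all variable names are made up for the illustration. It shows the general behavior in CPython, not the original poster's code.

```python
import io
import os
import tempfile

# Note 1: referencing a method is not the same as calling it.
stream = io.StringIO(u"<html></html>")  # stand-in for the urllib2 response
stream.close                            # bare attribute lookup; nothing is called
closed_after_reference = stream.closed
stream.close()                          # the parentheses perform the call
closed_after_call = stream.closed

# Note 2: a small write stays in the in-memory buffer until it is
# flushed or the file is closed.
html = "<html><body>footer</body></html>"
fd, path = tempfile.mkstemp(suffix=".html")
os.close(fd)

out = open(path, "w")                   # default (buffered) mode
out.write(html)
size_before_flush = os.path.getsize(path)  # data has not reached the file yet
out.flush()
size_after_flush = os.path.getsize(path)   # now the whole page is on disk
out.close()
os.remove(path)

print(closed_after_reference, closed_after_call)
print(size_before_flush, size_after_flush)
```

In practice, opening the file in a `with` block (or calling `close()` explicitly) avoids both problems, because closing a file also flushes its buffer.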