
Thread: [SOLVED] web pages read via urllib2 are truncated

  1. #1
    Join Date
    Oct 2004
    Location
    USA
    Beans
    331
    Distro
    Ubuntu 22.04 Jammy Jellyfish

    [SOLVED] web pages read via urllib2 are truncated

    I have some Python code that downloads a page with urllib2 and then calls read() on the resulting object. I have been printing the page, redirecting the output to a file, and watching it with tail -f. For some reason, the page is never complete: sometimes it stops just before the footer, other times much earlier in the page. From the urllib2 documentation and all of the examples I have found, I don't believe this should be happening unless it somehow thought there was an end of file (EOF) in the page itself.

    I have a method that sets the request variable to a urllib2.Request.

    Code:
    request = urllib2.Request(self.pageURL)

    Another method is then passed this request and does the following (sans try/except block):

    Code:
    response = urllib2.urlopen(request)
    ...
    # Save the HTML of the resulting page
    self.resultPage = response.read()
    response.close

    After that, I have been printing out the resultPage for debugging. Any idea why this would be happening? The HTML is intact if I download the page with Firefox.
    Last edited by twisted_steel; April 22nd, 2008 at 04:21 AM.
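    A minimal sketch for telling a genuinely short download apart from a display problem, assuming Python 3 (where urllib2's functionality lives in urllib.request). It reads the whole body with read() and compares its size to the Content-Length header the server declared. fetch_page and looks_truncated are hypothetical helper names, not part of urllib, and Content-Length may simply be absent (for example with chunked responses), in which case the check cannot tell you anything.

```python
import urllib.request


def fetch_page(url, timeout=10):
    """Download url and return (body, declared_length).

    declared_length is the Content-Length header as an int, or None if the
    server did not send one (e.g. chunked transfer encoding).
    """
    request = urllib.request.Request(url)
    # The with-block closes the response even if read() raises.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()  # no argument: read until the body ends
        header = response.headers.get("Content-Length")
    declared_length = int(header) if header is not None else None
    return body, declared_length


def looks_truncated(body, declared_length):
    """True only if a length was declared and fewer bytes arrived."""
    return declared_length is not None and len(body) < declared_length
```

    Called with the real page URL (the poster's self.pageURL, say), printing len(body) next to declared_length shows whether bytes are actually missing before you start suspecting the terminal. A data: URL such as "data:text/html,<html></html>" also works with urlopen, which is handy for trying the helpers offline.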

  2. #2

    Re: web pages read via urllib2 are truncated

    For some reason, the closing tags were not showing up in the output of tail -f (I even had two different terminals going), but they did exist in the actual redirected file. Good times. Marking this one as solved.
    Last edited by twisted_steel; April 22nd, 2008 at 04:23 AM.

  3. #3
    Join Date
    Jan 2006
    Location
    Philadelphia
    Beans
    4,076
    Distro
    Ubuntu 8.10 Intrepid Ibex

    Re: web pages read via urllib2 are truncated

    Quote (originally posted by twisted_steel):

        Apparently for some reason or another, the closing tags were not being shown in the output of tail -f (I even had two types of terminals going), but did exist in the actual redirected file. Good times. Marking this one as solved

    Some notes:
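
    "response.close" should be "response.close()". Without the parentheses the method is only looked up, never called, so the response is not actually closed.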

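    A quick way to see the difference, sketched in Python 3 with io.BytesIO standing in for the response object (both have a close() method and a closed attribute in Python 3). The function names are illustrative only.

```python
import io


def reference_vs_call():
    """Show that naming a method does not run it; calling it does."""
    stream = io.BytesIO(b"<html></html>")
    stream.close  # evaluates to the bound method object and discards it
    after_reference = stream.closed
    stream.close()  # the parentheses actually invoke close()
    after_call = stream.closed
    return after_reference, after_call


def read_and_close(stream):
    """Read everything, letting a with-block handle the close() call."""
    with stream:  # closes the stream on exit, even if read() raises
        return stream.read()
```

    reference_vs_call() returns (False, True): the bare attribute access leaves the stream open. The with form is usually the safer idiom, since it cannot be forgotten and still closes the stream when an exception is raised.
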
    One reason the last part of the file might not have been showing up is that the file's write buffer had not been flushed or closed yet; by default, Python's file writes are buffered. I don't know if that was the case for you, since you haven't included your actual file-writing code, but I'm just throwing it out there.
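    To illustrate the buffering point, here is a small Python 3 sketch (the helper names are made up for the example) that writes to a regular file and reads it back through a second handle, the way tail -f would, once before and once after flush().

```python
import os
import tempfile


def bytes_on_disk(path):
    """Read the file through a separate handle, the way `tail -f` would."""
    with open(path, "rb") as reader:
        return reader.read()


def buffering_demo(text):
    """Return (contents before flush, contents after flush) as seen on disk."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # reopen below through Python's buffered file object
    try:
        with open(path, "w") as out:  # regular files are block-buffered by default
            out.write(text)
            before = bytes_on_disk(path)  # a short write usually still sits in the buffer
            out.flush()                   # hand the buffered data to the OS
            after = bytes_on_disk(path)
        return before, after
    finally:
        os.remove(path)
```

    In CPython a short write normally stays in the in-process buffer until flush(), close(), or the buffer filling up, so the first read typically comes back empty while the second one contains the text.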

  4. #4

    Re: web pages read via urllib2 are truncated

    All I was doing to 'write' the file was using print statements in my code. I then ran the program and redirected its output to a file.

    Code:
    ./myprogram.py > output.txt

    I was then watching the output in a terminal with:

    Code:
    tail -f output.txt

    As a side note, does anyone know where the link to mark a thread as solved went?
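    The same buffering applies to print with a redirect, which is one plausible explanation here: Python line-buffers stdout when it is a terminal, but when stdout is redirected to a file it is block-buffered, so tail -f sees the output in chunks and the final part may only appear once the buffer fills or the program exits. Running with python -u or setting PYTHONUNBUFFERED avoids this; in Python 3 you can also pass flush=True to print(). The sketch below assumes Python 3.7+, where io.TextIOWrapper gained reconfigure(); the function names are hypothetical.

```python
import sys


def make_stdout_line_buffered():
    """Ask sys.stdout to flush at every newline, even when redirected.

    Returns True if the stream was reconfigured, False if this stdout object
    has no reconfigure() method (e.g. it was replaced, or Python < 3.7).
    """
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(line_buffering=True)
    return True


def emit(text):
    """Print text and push it out immediately (flush= needs Python 3.3+)."""
    print(text, flush=True)
```

    Calling make_stdout_line_buffered() once at startup keeps the rest of the print calls unchanged; emit() is the per-call alternative when only a few lines need to show up right away.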
