Results 1 to 8 of 8

Thread: Downloading a file in python over http (filename not known)

  1. #1
    Join Date
    Sep 2011
    Location
    South Africa
    Beans
    161
    Distro
    Xubuntu 12.10 Quantal Quetzal

    Downloading a file in python over http (filename not known)

    Hi

    I remember when I had to setup my wireless connection for the first time in ubuntu, it was hectic. I'm tryna get my friends to use ubuntu --- I also want to simplify their lives.

    I can across this tutorial http://ubuntuforums.org/showthread.php?t=112526

    I wanted to automate the task using python, perhaps also include a gui.

    The problem however is that I have to download the latest stable version of ndiswrapper from here http://sourceforge.net/projects/ndis.../files/stable/

    How do I download the latest file from that list?
    I can't specify a file name as that will not give me the latest file.

  2. #2
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    1,838
    Distro
    Kubuntu 12.10 Quantal Quetzal

    Re: Downloading a file in python over http (filename not known)

    Quote Originally Posted by 3v3rgr33n View Post
    Hi

    I remember when I had to setup my wireless connection for the first time in ubuntu, it was hectic. I'm tryna get my friends to use ubuntu --- I also want to simplify their lives.

    I can across this tutorial http://ubuntuforums.org/showthread.php?t=112526

    I wanted to automate the task using python, perhaps also include a gui.

    The problem however is that I have to download the latest stable version of ndiswrapper from here http://sourceforge.net/projects/ndis.../files/stable/

    How do I download the latest file from that list?
    I can't specify a file name as that will not give me the latest file.
    "http://sourceforge.net/projects/ndiswrapper/files/stable/" will contain the URLs to the files, so you can parse the page to retrieve these URLs (a regexp will likey be sufficient) and then you can parse these to find which corresponds to the latest version and then use it to download... but it will redirect you to a mirror... so things can get a bit complicated before you get the final download URL. Check the actual sequence with some took like LiveHTTPHeaders (firefox).

    NDIS wrapper is GPLv2 so you can redistribute it freely, and nothing says future versions would be compatible with your script.

    But of course you could have a look here or wonder if the package isn't already in the standard Ubuntu repository
    Last edited by ofnuts; January 5th, 2013 at 10:21 PM.

  3. #3
    Join Date
    Sep 2011
    Location
    South Africa
    Beans
    161
    Distro
    Xubuntu 12.10 Quantal Quetzal

    Re: Downloading a file in python over http (filename not known)

    Quote Originally Posted by ofnuts View Post
    "http://sourceforge.net/projects/ndiswrapper/files/stable/" will contain the URLs to the files, so you can parse the page to retrieve these URLs (a regexp will likey be sufficient) and then you can parse these to find which corresponds to the latest version and then use it to download... but it will redirect you to a mirror... so things can get a bit complicated before you get the final download URL.
    Do you mind giving me an example --- I'm not really a python guru.

  4. #4
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    1,838
    Distro
    Kubuntu 12.10 Quantal Quetzal

    Re: Downloading a file in python over http (filename not known)

    Quote Originally Posted by 3v3rgr33n View Post
    Do you mind giving me an example --- I'm not really a python guru.
    Code:
    urlPattern=re.compile('http://sourceforge.net/projects/ndiswrapper/files/stable/ndiswrapper-(\d+).(\d+).tar.gz/download')
    
    latestUrl=''
    latestVer=-1
    latestRel=-1
    
    with open(downloadsPage) as f:
        for line in f.readlines():
        match=urlPattern.search(line)
        if match:
            url=match.group(0)
            ver=int(match.group(1))
            rel=int(match.group(2))
            print url,ver,rel
            if ver>latestVer or (ver==latestVer and rel>latestRel):
                latestUrl=url
                latestVer=ver
                latestRel=rel
                
    print "Best url: %s" % latestUrl
    In practice the latest version is the first listed so you can exit the loop on the first match and use the simpler:

    Code:
    urlPattern=re.compile('http://sourceforge.net/projects/ndiswrapper/files/stable/ndiswrapper-\d+.\d+.tar.gz/download')
    
    with open(downloadsPage) as f:
        for line in f.readlines():
        match=urlPattern.search(line)
        if match:
            latestUrl=match.group();
            break;
                
    print "Best url: %s" % latestUrl

  5. #5
    Join Date
    Jan 2013
    Beans
    6

    Re: Downloading a file in python over http (filename not known)

    I did it using the following python script. I'm still a begginer so constructive criticism is welcome.

    Code:
    import urllib2
    import sre
    import subprocess
    
    address = 'http://sourceforge.net/projects/ndiswrapper/files/stable/'
    
    try:     
        website = urllib2.urlopen(address) 
    except urllib2.HTTPError, e:     
        print "Cannot retrieve URL: HTTP Error Code", e.code 
    except urllib2.URLError, e:     
        print "Cannot retrieve URL: " + e.reason[1] 
    
    matches = sre.findall('ndiswrapper-\d+.\d+.tar.gz', website.read()) 
    print 'Latest Version: ' + matches[0]
    value = -1
    while value !='y':
        print 'Do you want to Download the latest version? y/n\r'
        value = raw_input()
        if(value == 'y'):
            subprocess.call(['wget', address + matches[0]])
            break
        elif(value == 'n'):
            break
    d3

  6. #6
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    1,838
    Distro
    Kubuntu 12.10 Quantal Quetzal

    Re: Downloading a file in python over http (filename not known)

    Quote Originally Posted by d347hm4n View Post
    I did it using the following python script. I'm still a begginer so constructive criticism is welcome.

    Code:
    import urllib2
    import sre
    import subprocess
    
    address = 'http://sourceforge.net/projects/ndiswrapper/files/stable/'
    
    try:     
        website = urllib2.urlopen(address) 
    except urllib2.HTTPError, e:     
        print "Cannot retrieve URL: HTTP Error Code", e.code 
    except urllib2.URLError, e:     
        print "Cannot retrieve URL: " + e.reason[1] 
    
    matches = sre.findall('ndiswrapper-\d+.\d+.tar.gz', website.read()) 
    print 'Latest Version: ' + matches[0]
    value = -1
    while value !='y':
        print 'Do you want to Download the latest version? y/n\r'
        value = raw_input()
        if(value == 'y'):
            subprocess.call(['wget', address + matches[0]])
            break
        elif(value == 'n'):
            break
    d3
    - I would keep website.read() in the try/except block, it has about as many chances to fail as the urlopen() (network timeouts, etc....)
    - "sre" is deprecated, "re" should be used instead.
    - I don't see much purpose in using findall() to only take the first one, better use re.search(pattern,flags)
    - whether you use findall() or search(), since your pattern doesn't match a complete URL you won't be able to use the result for the download (and you cannot really tell if you are matching a URL or something else...)

  7. #7
    Join Date
    Sep 2011
    Location
    South Africa
    Beans
    161
    Distro
    Xubuntu 12.10 Quantal Quetzal

    Re: Downloading a file in python over http (filename not known)

    Quote Originally Posted by ofnuts View Post
    ... In practice the latest version is the first listed so you can exit the loop on the first match and use the simpler:

    Code:
    urlPattern=re.compile('http://sourceforge.net/projects/ndiswrapper/files/stable/ndiswrapper-\d+.\d+.tar.gz/download')
    
    with open(downloadsPage) as f:
        for line in f.readlines():
        match=urlPattern.search(line)
        if match:
            latestUrl=match.group();
            break;
                
    print "Best url: %s" % latestUrl
    I'm absolutely new to networking with python so forgive me. Can't you just simply read all the packages into a list and then pick the top one, assuming it will at position 0 in the list?

  8. #8
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    1,838
    Distro
    Kubuntu 12.10 Quantal Quetzal

    Re: Downloading a file in python over http (filename not known)

    Yes you can, but if the page is a bit big it gets wasteful. Why parse the whole page when you can stop on the first hit? My code above is a simplification of the one I provided fr the very general case of unsorted URLs, but it can be even simpler:

    Code:
    urlPattern='http://sourceforge.net/projects/ndiswrapper/files/stable/ndiswrapper-\d+.\d+.tar.gz/download'
    
    with open(downloadsPage) as f:
        wholePage=f.read()
        latestUrl=re.search(urlPattern,wholePage)
    or, with network retrieval:
    Code:
    import urllib2,re
    
    downloadPage='http://sourceforge.net/projects/ndiswrapper/files/stable/'
    downloadPattern='http://sourceforge.net/projects/ndiswrapper/files/stable/ndiswrapper-\d+.\d+.tar.gz/download'
    
    try:     
        connection = urllib2.urlopen(downloadPage) 
        page=connection.read()
    except urllib2.HTTPError, e:     
        print 'Cannot retrieve URL: HTTP Error %d' % e.code 
        exit(1)
    except urllib2.URLError, e:     
        print 'Cannot retrieve URL: %s' % e.reason
        exit(1)
    
    firstMatch=re.search(downloadPattern,page)
    if firstMatch:            
        print 'Best download URL: %s' % firstMatch.group()
    else:
        print 'No download URL found,  bad page URL?'
    (note that on Sourceforge, you don't get a 404 error, but some other page up in the hierarchy)
    Last edited by ofnuts; January 9th, 2013 at 01:33 AM.

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •