Page 2 of 2
Results 11 to 19 of 19

Thread: Help to make my Python script run faster!

  1. #11
    Join Date
    Jul 2009
    Beans
    18

    Re: Help to make my Python script run faster!

    Hi, thanks for all your suggestions. Some Penguin hit the nail on the head. My problem is finding an exact match on one field and a range match on another. I'm still a beginner in Python, but I will give these suggestions a try. Will post again soon.

    P.S. I have updated my initial post to provide a clearer idea.

  2. #12
    Join Date
    Jun 2006
    Beans
    Hidden!

    Re: Help to make my Python script run faster!

    I'm not 100% sure what you're asking for, but the following code should quickly find every matching name whose position falls within the 100-position window around a position of interest (50 below to 49 above).

    Code:
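    # f1 and f2 are the two already-opened, tab-separated input files.
    allSets = {}

    for line in f1:
        line = line.strip()
        tags = line.split('\t')
        if tags[0] not in allSets:
            allSets[tags[0]] = set()
        position = int(tags[1])
        # update() grows the set in place; union() would copy it on every line.
        allSets[tags[0]].update(range(position-50, position+50))

    for line in f2:
        tags = line.strip().split('\t')
        if tags[0] in allSets and int(tags[1]) in allSets[tags[0]]:
            # Drop the trailing newline so the output is not double-spaced.
            print(line.rstrip('\n'))
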
    Does that help? If not, could you please clarify what you need? For example, if f1 had one line "C1\t150" and f2 contained three lines:
    A1\t5\tA
    C1\t150\tQ
    D73\t007\tG

    What would you expect the output to be?
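
    As a point of reference, the sample above can be run through the same set-based lookup. This is only a minimal sketch: it assumes the two files are stood in for by lists of strings, and the names `find_matches`, `f1_lines`, `f2_lines` and `half_window` are made up for illustration rather than taken from anyone's script.

```python
def find_matches(f1_lines, f2_lines, half_window=50):
    """Return the f2 lines whose name appears in f1 and whose position lies
    in range(position - half_window, position + half_window)."""
    all_sets = {}
    for line in f1_lines:
        tags = line.strip().split('\t')
        if tags[0] not in all_sets:
            all_sets[tags[0]] = set()
        position = int(tags[1])
        all_sets[tags[0]].update(range(position - half_window, position + half_window))

    matches = []
    for line in f2_lines:
        tags = line.strip().split('\t')
        if tags[0] in all_sets and int(tags[1]) in all_sets[tags[0]]:
            matches.append(line.strip())
    return matches


# The sample input from the question above.
f1_lines = ["C1\t150"]
f2_lines = ["A1\t5\tA", "C1\t150\tQ", "D73\t007\tG"]
result = find_matches(f1_lines, f2_lines)
print(result)
```

    Under these assumptions only the C1 line comes back, because A1 and D73 never appear in f1.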
    Last edited by teryret; August 10th, 2010 at 06:42 AM.

  3. #13
    Join Date
    Jul 2009
    Beans
    18

    Re: Help to make my Python script run faster!

    Quote Originally Posted by teryret View Post
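    I'm not 100% sure what you're asking for, but the following code should quickly find every matching name whose position falls within the 100-position window around a position of interest (50 below to 49 above).

    Code:
    # f1 and f2 are the two already-opened, tab-separated input files.
    allSets = {}

    for line in f1:
        line = line.strip()
        tags = line.split('\t')
        if tags[0] not in allSets:
            allSets[tags[0]] = set()
        position = int(tags[1])
        # update() grows the set in place; union() would copy it on every line.
        allSets[tags[0]].update(range(position-50, position+50))

    for line in f2:
        tags = line.strip().split('\t')
        if tags[0] in allSets and int(tags[1]) in allSets[tags[0]]:
            # Drop the trailing newline so the output is not double-spaced.
            print(line.rstrip('\n'))
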
    Does that help?
    Thanks a lot! That's a brilliant idea to put the names and ranges of positions in a dictionary! I would never have thought of it. I have to use name\tposition as the key because f1 might contain lines with the same name and different positions. Now I see 120 lines being processed every minute rather than every 3 minutes.

  4. #14
    Join Date
    Jun 2006
    Beans
    Hidden!

    Re: Help to make my Python script run faster!

    That's the thing: you don't have to keep the key together, and it will be faster and more space efficient if you don't. If you use my code in interactive Python mode, print the contents of allSets in between the two for loops to convince yourself.
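
    To illustrate the point about keys, here is a small sketch that builds the dictionary from two made-up f1 lines sharing the name C1. The data and the helper name `build_sets` are hypothetical; the loop body mirrors the first loop of the code posted earlier in the thread.

```python
def build_sets(f1_lines, half_window=50):
    """Map each name to the set of all positions covered by its windows."""
    all_sets = {}
    for line in f1_lines:
        tags = line.strip().split('\t')
        name = tags[0]
        if name not in all_sets:
            all_sets[name] = set()
        position = int(tags[1])
        all_sets[name].update(range(position - half_window, position + half_window))
    return all_sets


# Two lines with the same name but different positions (made-up data).
all_sets = build_sets(["C1\t150", "C1\t1000"])
print(sorted(all_sets.keys()))
print(len(all_sets["C1"]))
print(160 in all_sets["C1"], 1010 in all_sets["C1"], 500 in all_sets["C1"])
```

    With this input there is a single key, C1, whose set holds 200 positions: both windows live under the one name, so 160 and 1010 are found while 500 is not. A name\tposition key is not needed to tell the two lines apart.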

    Any chance you could link to your f1 and f2 so I could do some performance tests?

    Edit: also, would you mind posting the current iteration of your code?
    Last edited by teryret; August 10th, 2010 at 05:52 PM.

  5. #15
    Join Date
    Mar 2009
    Location
    Buenos Aires, AR
    Beans
    2,325
    Distro
    Ubuntu

    Re: Help to make my Python script run faster!

    psyco

  6. #16
    Join Date
    Apr 2007
    Location
    NorCal
    Beans
    1,149
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Help to make my Python script run faster!

    Quote Originally Posted by juancarlospaco View Post
    psyco
    Psyco can't fix a bad algorithm.

  7. #17
    Join Date
    Jul 2009
    Beans
    18

    Re: Help to make my Python script run faster!

    Code updated!

  8. #18
    Join Date
    Jun 2006
    Beans
    Hidden!

    Re: Help to make my Python script run faster!

    Several things:

    1) I promise, you really don't need to keep the keys together, and it will be much faster if you don't. This is because nesting for loops always slows things down and should be avoided if at all humanly possible.

    2) Once you've gotten rid of the nested looping you'll find that taking the if out of my first loop will cause a bug.

    3) Consider, in my second loop, appending the characters to strings held in a dictionary indexed by the same keys as my allSets dictionary. Then, at the end, you can simply loop a third time over the built-up strings and match each one against the combined regex.
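
    The third suggestion could be sketched roughly as follows. The thread does not show the actual regex or what the third column holds, so this assumes the third tab-separated field of each f2 line is a single character and uses a placeholder pattern; `collect_and_match`, `pattern` and the sample data are all hypothetical.

```python
import re


def collect_and_match(f1_lines, f2_lines, pattern, half_window=50):
    """Build one string per name from matching f2 lines, then test each
    built-up string against a single compiled regex."""
    all_sets = {}
    for line in f1_lines:
        tags = line.strip().split('\t')
        # Without this check, the first line for a new name raises KeyError.
        if tags[0] not in all_sets:
            all_sets[tags[0]] = set()
        position = int(tags[1])
        all_sets[tags[0]].update(range(position - half_window, position + half_window))

    # Second loop: collect the characters per name, keyed like all_sets.
    pieces = {}
    for line in f2_lines:
        tags = line.strip().split('\t')
        if tags[0] in all_sets and int(tags[1]) in all_sets[tags[0]]:
            pieces.setdefault(tags[0], []).append(tags[2])

    # Third loop: one regex search per built-up string, no nested looping.
    regex = re.compile(pattern)
    return {name: bool(regex.search(''.join(chars)))
            for name, chars in pieces.items()}


# Made-up sample data and a placeholder pattern.
f1_lines = ["C1\t150"]
f2_lines = ["C1\t148\tA", "C1\t149\tC", "C1\t150\tG", "C1\t400\tT", "A1\t5\tA"]
print(collect_and_match(f1_lines, f2_lines, "ACG"))
```

    Collecting the characters in a list and joining once at the end avoids rebuilding the string on every line. The line at position 400 falls outside the window and the A1 line has no entry in f1, so neither contributes to the built-up string for C1.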

  9. #19
    Join Date
    Mar 2009
    Location
    Buenos Aires, AR
    Beans
    2,325
    Distro
    Ubuntu

    Re: Help to make my Python script run faster!

    Quote Originally Posted by schauerlich View Post
    Psyco can't fix a bad algorithm.
    The title says "(...) run faster!"
    Not run faster and properly coded.
