
Thread: Help with a Regex in Python

  1. #1
    Join Date
    Jun 2007
    Beans
    1,659
    Distro
    Ubuntu

    Help with a Regex in Python

    Hello everyone. I'm a bit (read: very) stuck on working with regular expressions in Python.

    I pretty much get the regular expression syntax, but I think Python handles backslashes in a very specific way.

    Anyway I'm trying to do something awk-like:

    Given a string like "/lhome/rxh09u/gp10-srb/src/stories/business-11821028.xml"

    how can I extract just the word "business" from this, and get it into another string?
    Obviously the word won't always be "business", and the absolute path name up to "/stories" could be different.
    Furthermore the word to extract often includes a hyphen, in the case of "uk-wales" for example.

    So basically if I gave it this string:
    "stories/world-us-canada-11836156.xml" it would need to get "world-us-canada" into a string.

    It always ends in 8 digits followed by .xml, if that helps.

  2. #2
    Join Date
    Jun 2007
    Beans
    1,659
    Distro
    Ubuntu

    Re: Help with a Regex in Python

    OK I used
    Code:
    filename = self._filename[0:-13]
    to make it so the bit I want to match is at the end of the string ("/lhome/rxh09u/gp10-srb/src/stories/business"), but I'm still having problems
    1. How do I cope with the fact that there could be hyphens in there?
    2. All I can do is get Python to say "yep, that matched", whereas I want it to extract the relevant bit. How do I do that?

  3. #3
    Join Date
    Apr 2009
    Location
    Germany
    Beans
    2,134
    Distro
    Ubuntu Development Release

    Re: Help with a Regex in Python

    Will that do?
    Code:
    import re
    re.search(r".*/stories/(.+)-[0-9]{8}\.xml$", s).group(1)
    Last edited by MadCow108; February 23rd, 2011 at 02:02 PM.
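
    On the backslash worry from the first post, here is a minimal sketch, assuming Python 3. Writing the pattern as a raw string (r"...") hands the backslash in \. to the regex engine untouched, and group(1) on the match object returns the captured text instead of a plain yes/no. The helper name extract_section is made up for illustration and is not from anyone's post.

```python
import re

# Raw string: the backslash in \. reaches the regex engine unchanged.
PATTERN = re.compile(r"/stories/(.+)-[0-9]{8}\.xml$")

def extract_section(path):
    """Return the section name, or None if the path does not match.

    extract_section is a hypothetical helper name used only in this sketch.
    """
    match = PATTERN.search(path)
    # search() returns None on failure, so check before calling group().
    return match.group(1) if match else None

print(extract_section("/lhome/rxh09u/gp10-srb/src/stories/business-11821028.xml"))
print(extract_section("/lhome/rxh09u/gp10-srb/src/stories/world-us-canada-11836156.xml"))
```

    Checking for None first avoids an AttributeError when a filename does not fit the expected shape.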

  4. #4
    Join Date
    Sep 2010
    Beans
    62

    Re: Help with a Regex in Python

    Just do a couple of splits and a join. In Ruby:

    Code:
    >> s="/lhome/rxh09u/gp10-srb/src/stories/world-us-canada-11836156.xml"
    => "/lhome/rxh09u/gp10-srb/src/stories/world-us-canada-11836156.xml"
    >> s.split("/")[-1].split("-")[0..-2].join("-")
    => "world-us-canada"
    You can do the same in Python, since the two languages are quite similar.

  5. #5
    Join Date
    Apr 2007
    Location
    (X,Y,Z) = (0,0,0)
    Beans
    3,715

    Re: Help with a Regex in Python

    My version:

    Code:
    re.search(r".*?stories/(.+?)-[0-9]+\.xml$", s).group(1)
    Why? Because this would allow searching in a relative path too. Also, it allows the number to be arbitrarily long.
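
    A quick way to check both claims, as a sketch only: run a lazy pattern of this kind over an absolute path, a relative path, and a name with a shorter number. The third sample filename is invented for the test, and the leading .*? is dropped here because re.search already scans the whole string.

```python
import re

# Lazy capture after "stories/", then a hyphen and one or more digits.
pattern = re.compile(r"stories/(.+?)-[0-9]+\.xml$")

samples = [
    "/lhome/rxh09u/gp10-srb/src/stories/business-11821028.xml",  # absolute path
    "stories/world-us-canada-11836156.xml",                      # relative path
    "stories/uk-wales-42.xml",  # made-up name with a shorter number
]

# Collect the captured section for each sample (None if there is no match).
results = []
for sample in samples:
    m = pattern.search(sample)
    results.append(m.group(1) if m else None)

print(results)
```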

  6. #6
    Join Date
    Mar 2009
    Location
    Western Hemisphere.
    Beans
    136
    Distro
    Ubuntu

    Re: Help with a Regex in Python

    I would be inclined to use os.path and string operations instead of a regex here.
    Code:
    import os
    
    s="/lhome/rxh09u/gp10-srb/src/stories/world-us-canada-11836156.xml"
    os.path.basename(s)[:-13]
    is pretty straightforward, and should cope correctly with any filename weirdness you can imagine.

    If the length of the ending were variable, something like kurum's Ruby solution could be done in Python:

    Code:
    import os
    
    s="/lhome/rxh09u/gp10-srb/src/stories/world-us-canada-11836156.xml"
    '-'.join(os.path.basename(s).split('-')[:-1])
    If you want to avoid importing os:

    Code:
    s="/lhome/rxh09u/gp10-srb/src/stories/world-us-canada-11836156.xml"
    '-'.join(s.split("/")[-1].split('-')[:-1])
    All in all, regexes are serious overkill for this problem.
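
    A slightly shorter variant of the split-and-join idea, offered as a sketch: str.rsplit with a maxsplit of 1 cuts only at the last hyphen, so no join is needed afterwards. The function name section_name is hypothetical.

```python
import os

def section_name(path):
    """Strip the directory, then everything from the last hyphen onwards.

    section_name is a hypothetical helper name used only for this sketch.
    """
    base = os.path.basename(path)   # e.g. "world-us-canada-11836156.xml"
    return base.rsplit("-", 1)[0]   # split once, starting from the right

print(section_name("/lhome/rxh09u/gp10-srb/src/stories/world-us-canada-11836156.xml"))
```

    Note that a basename with no hyphen at all comes back unchanged, extension included, so this assumes the "-digits.xml" ending is always present.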

  7. #7
    Join Date
    Jul 2007
    Location
    Poland
    Beans
    4,499
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: Help with a Regex in Python

    Maybe, but for some it's easier to pay the performance price with regexes they are intimately familiar with than to bother with string slicing-n-dicing and list magic.

  8. #8
    Join Date
    Jun 2007
    Beans
    1,659
    Distro
    Ubuntu

    Re: Help with a Regex in Python

    Originally posted by Vaphell:
    maybe, but for some it's easier to pay the performance price with regexes they are intimately familiar with than to bother with string slicing-n-dicing and list magic.
    Performance is important here.
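
    Rather than guessing which approach is faster, it can be measured. The following is a rough sketch using the standard timeit module; the three function names are made up here, the regex is compiled once up front, and the numbers will vary by machine and Python version, so none are quoted.

```python
import os
import re
import timeit

S = "/lhome/rxh09u/gp10-srb/src/stories/world-us-canada-11836156.xml"
PATTERN = re.compile(r"stories/(.+)-[0-9]{8}\.xml$")  # compiled once, up front

def with_regex(s):
    return PATTERN.search(s).group(1)

def with_slice(s):
    return os.path.basename(s)[:-13]   # drops "-" + 8 digits + ".xml"

def with_split(s):
    return s.rsplit("/", 1)[-1].rsplit("-", 1)[0]

# Time each hypothetical helper on the same input string.
for func in (with_regex, with_slice, with_split):
    seconds = timeit.timeit(lambda: func(S), number=100_000)
    print(f"{func.__name__}: {seconds:.3f}s for 100,000 calls")
```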
