Results 21 to 30 of 41

Thread: Parsing XML to get one value

  1. #21
    Join Date
    Jul 2007
    Location
    The Bavarian Alps
    Beans
    129
    Distro
    Kubuntu 7.10 Gutsy Gibbon

    Re: Parsing XML to get one value

    Between all the philosophy about right and wrong and what the monks would do...

    Quikee wrote that, because the XML file I am using makes use of a namespace, I have to include
    namespace = "http://www.topografix.com/GPX/1/1"
    OK! I went off and read about namespace but a question remains (at least for mere mortals like me)
    How do I know what value I have to give for "namespace"?
    I realise that it is in the XML file but the way I understand things I can only read the file if I know the namespace value.

    Sorry if I am being really dense here. Please bear with me. I am learning. Four months ago I had never heard of Ubuntu and three months ago I thought that Python was a snake.

    Thanks!

  2. #22
    Join Date
    Jun 2006
    Location
    CT, USA
    Beans
    5,267
    Distro
    Ubuntu 6.10 Edgy

    Re: Parsing XML to get one value

    Python still is a snake. The Python language, though, was named not after the snake but after the British comedy group Monty Python (of "Monty Python's Flying Circus"). If you haven't seen "Life of Brian" you are missing a lot.

  3. #23
    Join Date
    Apr 2007
    Beans
    14,781

    Re: Parsing XML to get one value

    The namespace name of an XML document has no meaning in itself. It is usually a URI of some sort, which makes it more likely to be unique, but the parser never follows that URI.

    Namespaces differ from document to document. The declaration is usually on the root element, and it may bind the namespace to a prefix (a word followed by a colon). On my page, laroza.freehostia.com/home, you'll see this line in the source:

    Code:
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    I declared an XML namespace (xmlns) of "http://www.w3.org/1999/xhtml". This is the default namespace: the html element and all child elements without a prefix are part of it. The xml:lang="en" attribute is prefixed with xml: because that attribute belongs to another namespace. I could have written:

    Code:
    <xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml" xml:lang="en">
    But I would have had to write xhtml: before all elements of this namespace. It doesn't have to be xhtml; it could have been almost any word, but giving the namespace prefix a logical name makes sense.

    I don't know what the XML you are reading looks like, and I haven't used the functions you are using, so I don't know whether they pick the elements of one namespace out of several or only work on the default namespace.

    I hope this helps, but I was not sure exactly what you are asking.
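
    As a rough illustration of the point about prefixes, here is a minimal sketch using the standard library's ElementTree. It assumes a current Python 3, and the prefix "x" and the tiny documents are made up for the example. It parses the same namespace declared once as the default and once with a prefix, and prints the tag name the parser ends up with.

```python
from xml.etree import ElementTree

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Same namespace, declared once as the default and once with a prefix.
# The prefix "x" is arbitrary and chosen only for this illustration.
default_form = '<html xmlns="%s"><body/></html>' % XHTML_NS
prefixed_form = '<x:html xmlns:x="%s"><x:body/></x:html>' % XHTML_NS

default_root = ElementTree.fromstring(default_form)
prefixed_root = ElementTree.fromstring(prefixed_form)

# ElementTree stores the namespace URI in braces in front of the local name,
# so the prefix used in the file no longer matters after parsing.
print(default_root.tag)
print(prefixed_root.tag)
```

    Both roots should come out with the same tag, which is why the choice of prefix is only a matter of readability.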

  4. #24
    Join Date
    Jul 2007
    Location
    The Bavarian Alps
    Beans
    129
    Distro
    Kubuntu 7.10 Gutsy Gibbon

    Re: Parsing XML to get one value

    I am 44 and originally come from England so I think that answers your question.
    But the lumberjack song and the dead parrot are both better than Life of Brian.

    But ...
    How do I find a value for namespace? It is in the root string in
    namespace = "http://www.topografix.com/GPX/1/1"
    tree = ElementTree.parse("test.xml")
    root = tree.getroot()
    but there must be a function to find it.

    Mustn't there?

    I really "wanted to be a lumberjack" but when I told the careers advisor that, he assumed I had overdosed on Monty Python.

  5. #25
    Join Date
    Apr 2006
    Location
    Phoenix, AZ
    Beans
    251
    Distro
    Ubuntu 8.04 Hardy Heron

    Re: Parsing XML to get one value

    Use the DOM example posted, or your own SAX handler. Please don't use a regex, it will be much more difficult to maintain. You may only want one node now, but how about in a month? What if the XML changes?

    If you are parsing unstructured data, use regex. It is very powerful. XML is structured and we have libraries to easily work with it, use them!
    -Skeeterbug

  6. #26
    Join Date
    Apr 2006
    Location
    Slovenia
    Beans
    370
    Distro
    Ubuntu Development Release

    Re: Parsing XML to get one value

    If you look at a part of your XML:
    Code:
    <gpx
      version="1.1"
      creator="Touratech QV 4.0.87 Standard - http://www.ttqv.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:topografix="http://www.topografix.com/GPX/Private/TopoGrafix/0/1"
      xmlns="http://www.topografix.com/GPX/1/1"
      xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
      <metadata>
        <time>2007-09-02T17:30:53Z</time>
        <bounds minlat="47.4481272697449" minlon="10.4949045181274" maxlat="47.6289081573486" maxlon="11.0962772369385"/>
      </metadata>
    ....
    you see that <gpx ..> has an attribute xmlns="http://www.topografix.com/GPX/1/1", which means that gpx is part of the "http://www.topografix.com/GPX/1/1" namespace and all its children (practically all other elements) are part of this namespace as well.

    If you look even further you see that the <gpx ..> element has an attribute xmlns:topografix="http://www.topografix.com/GPX/Private/TopoGrafix/0/1", which means that all elements written as <topografix:someElementName> are part of the "http://www.topografix.com/GPX/Private/TopoGrafix/0/1" namespace.

    Which namespace and which element "fit together" is defined in the spec of the GPX format (the schema of the XML).

    In other words namespaces are just some sort of a discriminator for elements that have the same name.
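
    To make the two declarations concrete, here is a small sketch with the standard library's ElementTree (a current Python 3 is assumed). The sample is a cut-down version of the header above, and the <topografix:example> element is invented purely to have something in the second namespace. It lists the fully qualified tag of every element.

```python
from xml.etree import ElementTree

# Cut-down GPX header. <topografix:example> is a made-up element that only
# serves to show an element living in the prefixed namespace.
GPX_SAMPLE = """<gpx version="1.1"
  xmlns="http://www.topografix.com/GPX/1/1"
  xmlns:topografix="http://www.topografix.com/GPX/Private/TopoGrafix/0/1">
  <metadata>
    <time>2007-09-02T17:30:53Z</time>
  </metadata>
  <extensions>
    <topografix:example>made-up element</topografix:example>
  </extensions>
</gpx>"""

root = ElementTree.fromstring(GPX_SAMPLE)

# Collect the qualified tag of every element in document order.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

    Every unprefixed element should show the GPX 1/1 URI in braces, and only the prefixed one should show the TopoGrafix private URI.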

  7. #27
    Join Date
    Jul 2007
    Location
    The Bavarian Alps
    Beans
    129
    Distro
    Kubuntu 7.10 Gutsy Gibbon

    Re: Parsing XML to get one value

    Quikee is always there when you need him

    Thank you! I understand all that now. After your hints yesterday I read up all about namespaces.

    My problem is that it seems I need to know that the namespace value is "http://www.topografix.com/GPX/1/1" before I can start parsing. However, I cannot know what the value for namespace is until I have parsed the file.
    Like the chicken and the egg.

    Depending on which programme creates a GPX, the namespace variable is different so before starting reading the elevations I need to know what the value for namespace is.

    In your example you have hardcoded the value but that will only work if it agrees with the XML file.

    As I wrote, the namespace is in the string I see by using "print root".
    But I assume that there is some clever way of getting the namespace out of the file. My experiments so far have failed.

    Thanks for all your time and help

    Neill

  8. #28
    Join Date
    May 2007
    Location
    Paris, France
    Beans
    927
    Distro
    Kubuntu 7.04 Feisty Fawn

    Re: Parsing XML to get one value

    Quote Originally Posted by NeillHog
    But I assume that there is some clever way of getting the namespace out of the file. My experiments so far have failed.
    You should be able to parse your XML document into a DOM tree, and from there extract the namespace from the root element.

    Something like:

    Code:
    tree = ElementTree.parse("test.xml")
    root = tree.getroot()
    namespace = root.getnamespace("")  # hypothetical method name, check the API; empty string for the default namespace

    I used to do it using the Xerces parser, so I guess Python's parser can do it too. Better check the API reference.
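
    For the standard library's ElementTree, one way that should work is to read the namespace straight out of the root element's tag, which is an ordinary string of the form "{uri}name". The sketch below assumes a current Python 3. The helper name namespace_of and the one-line sample document are made up for the example and are not part of any library.

```python
from xml.etree import ElementTree


def namespace_of(element):
    """Return the namespace URI of an element, or "" if it has none."""
    tag = element.tag
    if tag.startswith("{"):
        # Tag looks like "{uri}localname": keep what is between the braces.
        return tag[1:].split("}", 1)[0]
    return ""


# Tiny stand-in for a GPX file; a real file would go through ElementTree.parse().
root = ElementTree.fromstring(
    '<gpx xmlns="http://www.topografix.com/GPX/1/1"><trk/></gpx>'
)

namespace = namespace_of(root)
print(namespace)

# The extracted value can then be used to build qualified names for lookups.
track = root.find("{%s}trk" % namespace)
print(track.tag)
```

    Nothing is hardcoded here, so the same lines should work whichever namespace the file declares on its root element.
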
    Not even tinfoil can save us now...

  9. #29
    Join Date
    Apr 2006
    Location
    Slovenia
    Beans
    370
    Distro
    Ubuntu Development Release

    Re: Parsing XML to get one value

    Quote Originally Posted by NeillHog
    Depending on which programme creates a GPX, the namespace variable is different so before starting reading the elevations I need to know what the value for namespace is.

    In your example you have hardcoded the value but that will only work if it agrees with the XML file.

    As I wrote, the namespace is in the string I see by using "print root".
    But I assume that there is some clever way of getting the namespace out of the file. My experiments so far have failed.

    Thanks for all your time and help

    Neill
    This is weird. Usually a format defines a namespace (or several of them) for its elements, and they are always the same as long as you parse the same version of the same format. To have different namespaces for one format is nonsense: it is as if IE, Firefox and Opera each defined their own namespaces for HTML elements. Just imagine the confusion.

    The ElementTree that is built into Python 2.5 handles namespaces in a very strange way. That's why I prefer lxml, which provides the same interface as the built-in ElementTree plus its own add-ons and backend. One of the "add-ons" is a namespace map (nsmap), which is a dictionary of all namespaces in scope for the current element.

    Code:
    from lxml import etree as ElementTree
    
    if __name__ == "__main__":
    	tree = ElementTree.parse("gpxExampleNS.xml")
    	root = tree.getroot()
    	namespace = root.nsmap[None]
    	print root.nsmap
    	trackSegments = root.getiterator("{%s}trkseg" % namespace)
    	for trackSegment in trackSegments:
    		for trackPoint in trackSegment:
    			print trackPoint.attrib
    			print trackPoint.attrib['lat']
    			print trackPoint.attrib['lon']
    			print trackPoint.find('{%s}ele'% namespace).text
    			print trackPoint.find('{%s}time'% namespace).text
    lxml is in the Ubuntu repository.

    I don't know how to do this in normal ElementTree.
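
    A possible standard-library counterpart to nsmap is to ask ElementTree's iterparse for "start-ns" events, which report each namespace declaration as a (prefix, uri) pair. This is only a sketch, assuming a current Python 3. The function name collect_namespaces and the sample bytes are invented for the example. Note one difference from lxml: the default namespace comes back under the empty string "" rather than None.

```python
import io
from xml.etree import ElementTree


def collect_namespaces(source):
    """Return {prefix: uri} for the namespace declarations in a document.

    `source` is a filename or a binary file object. The default namespace
    is stored under the empty-string prefix. If a prefix is declared more
    than once, the first declaration wins.
    """
    namespaces = {}
    for _event, (prefix, uri) in ElementTree.iterparse(source, events=("start-ns",)):
        namespaces.setdefault(prefix, uri)
    return namespaces


# Stand-in for a GPX file on disk.
SAMPLE = b"""<gpx version="1.1"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://www.topografix.com/GPX/1/1">
  <trk/>
</gpx>"""

namespaces = collect_namespaces(io.BytesIO(SAMPLE))
print(namespaces)
```

    With a real file, collect_namespaces("test.xml") should give the same kind of dictionary, and namespaces[""] would play the role of root.nsmap[None].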

  10. #30
    Join Date
    Jul 2007
    Location
    The Bavarian Alps
    Beans
    129
    Distro
    Kubuntu 7.10 Gutsy Gibbon

    Re: Parsing XML to get one value

    Weird it may be but here are the xmlns tags from two GPX files.
    They are only a tiny bit different but different enough.
    xmlns="http://www.topografix.com/GPX/1/0"
    xmlns="http://www.topografix.com/GPX/1/1"
    I think the difference is the version but nonetheless hardcoding isn't going to work.


    Is it possible to use the first code you sent me (non-namespace) to parse for the xmlns part of the gpx tag? If that were possible then I would have the namespace and could use your second (namespace) code to do the rest, using the xmlns part as the namespace.

    Another possibility would be to extract the namespace from the root.
    When I do "print root" I get
    <Element {http://www.topografix.com/GPX/1/1}gpx at b7d506ec>
    This contains the namespace that I am looking for but is not a string and will not let me do any string operations on it.

    One of these solutions would be ideal because they use only standard python.

    Sorry about all these questions but this is slowly sending me mad. Once I have the values the rest will be easy (famous last words!)

    Thanks
    Neill
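
    One standard-Python route that sidesteps the problem is to ignore the namespace altogether and compare only the local part of each tag, since element.tag is a plain string of the form "{uri}name". The sketch below assumes a current Python 3. The helper names local_name and elevations and the one-point sample track are made up for the example.

```python
from xml.etree import ElementTree


def local_name(tag):
    """Strip a leading "{uri}" from a tag, leaving only the element name."""
    return tag.rsplit("}", 1)[-1]


def elevations(gpx_text):
    """Return all <ele> values in a GPX document, whatever its namespace."""
    root = ElementTree.fromstring(gpx_text)
    values = []
    for element in root.iter():
        if local_name(element.tag) == "ele":
            values.append(float(element.text))
    return values


# Made-up one-point track; %s is filled with the version part of the namespace.
GPX_TEMPLATE = (
    '<gpx xmlns="http://www.topografix.com/GPX/%s"><trk><trkseg>'
    '<trkpt lat="47.5" lon="10.5"><ele>812.5</ele></trkpt>'
    "</trkseg></trk></gpx>"
)

print(elevations(GPX_TEMPLATE % "1/0"))
print(elevations(GPX_TEMPLATE % "1/1"))
```

    The trade-off is that elements with the same name from different namespaces are no longer told apart, which is probably acceptable when only the elevations are wanted.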
