Python text file input Arrays and Dicts



beegary
August 1st, 2011, 12:18 PM
I have been doing some basic tasks (calculations, file reads and writes) using Python. I am now a bit stuck and getting confused.
I have a series of files in a format like:


Fruit ={

Name = "apple";
Price = 1;
Weight = 3;
Colour = "Red";
DrinkList ={

Cider = 1;
Cordial = 1;
Slush = 0;
};
};
Fruit ={

Name = "pear";
Price = 1.5;
Weight = 6;
Colour = "Yellow";
DrinkList ={

Cider = 1;
Cordial = 0;
Slush = 0;
};
};


I need to be able to read in the file and provide reports on the data, such as a list of fruit that can make cider (Cider = 1).
Each time I read the file it may have a different number of fruits, and the extended data lists within each fruit may differ (i.e. some may have a drinklist, some may have a piplist, some may have no extended data list).
I am lost trying to work out the basic structure of my Python code to separate out the fruits etc.
Just trying to get some tips and tricks really, before I end up with pages of novice code!

Many thanks

MadCow108
August 1st, 2011, 04:01 PM
Do you have to use this file format?
Is it some kind of standard format? I don't recognize it.

If you can change the format, I recommend using a standard format like JSON, YAML, XML, RDF or, which makes the most sense for Python, Python dictionaries and lists.

These formats all have modules that parse them into native Python data structures, which greatly simplifies the code.

e.g. json:


In [1]: import json

In [2]: json.loads("""[{"Name" : "apple", "Price": 1, "DrinkList" : [ "Slush", "Cider" ]}]""")
Out[2]: [{u'DrinkList': [u'Slush', u'Cider'], u'Name': u'apple', u'Price': 1}]

LemursDontExist
August 1st, 2011, 04:54 PM
For what it's worth, that looks to me like an OpenStep .plist format. Possibly you're working on something iPhone related? I'm with MadCow108 on using JSON if you can. If you can't, but you can get whatever program is generating the files to output them in the XML .plist format, then you can use plistlib (http://docs.python.org/dev/library/plistlib.html) to parse them easily.
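In case the XML route does turn out to be available, here is a minimal sketch of what the plistlib call could look like. It assumes Python 3.4 or later (where plistlib.loads exists), and the XML below is a made-up stand-in for what the exporting program might produce, not real output from it.

```python
import plistlib

# Hypothetical XML plist export; the real file layout is an assumption.
xml_plist = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<array>
  <dict>
    <key>Name</key><string>apple</string>
    <key>Price</key><integer>1</integer>
    <key>DrinkList</key>
    <dict>
      <key>Cider</key><integer>1</integer>
      <key>Slush</key><integer>0</integer>
    </dict>
  </dict>
</array>
</plist>
"""

# loads() turns <array>/<dict> elements into ordinary lists and dicts.
fruits = plistlib.loads(xml_plist)
cider_names = [f["Name"] for f in fruits if f["DrinkList"].get("Cider") == 1]
```

For a file on disk, plistlib.load() on a file opened in binary mode does the same job.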

Writing a reliable parser yourself is a non-trivial undertaking, so I'd avoid it unless you're doing this for the fun of writing a parser!

If you are writing a parser, and super high speed performance isn't critical, some sort of parser generator will make your life much easier. Pyparsing (http://pyparsing.wikispaces.com/) is simple and easy to use, though it doesn't have the sophisticated features of some of the bigger parser generators.

beegary
August 1st, 2011, 08:01 PM
Thank you for your replies, but no, I do not have any choice over the file format.
It is created by an external system which will not export in any other format.

By Parser do you mean just a collection of Python code that will read in all the values for me?
Each 'fruit' in my example is enclosed in {} so I was going to start there?

Bit stumped

LemursDontExist
August 2nd, 2011, 04:44 PM
"By Parser do you mean just a collection of Python code that will read in all the values for me?"

In essence, yes =). There are thousands of computer languages out there, and generally, taking text in one of them and turning it into data in a natively useful format is parsing, and a program that does that is called a parser.

Making parsers is such a common task that people have made libraries that generate parsers so that you don't have to attend to all the details manually.

I would definitely use Pyparsing for this, but mostly because I'm familiar with it. If the files are in a consistent format, parsing them using basic string operations should be pretty easy, so take a look at Pyparsing if you're curious, but don't let me confuse you!
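To give an idea of the "basic string operations" route, here is a rough sketch written for Python 3. It assumes the layout is exactly as in the first post: one setting per line, sections opened by a line ending in "{" and closed by a line containing only "};". The function names (parse_value, parse_block, parse_fruits) are made up for this example.

```python
def parse_value(text):
    """Convert a raw value such as '"Red"', '1' or '1.5' to a Python object."""
    text = text.strip()
    if text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    try:
        return int(text)
    except ValueError:
        try:
            return float(text)
        except ValueError:
            return text  # leave anything unrecognised as a plain string


def parse_block(lines):
    """Read lines from a shared iterator up to the closing '};'."""
    block = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line == '};':
            break
        key, _, rest = line.partition('=')
        key, rest = key.strip(), rest.strip()
        if rest == '{':
            block[key] = parse_block(lines)  # nested section, e.g. DrinkList
        else:
            block[key] = parse_value(rest.rstrip(';'))
    return block


def parse_fruits(text):
    """Return a list with one dict per top-level section."""
    lines = iter(text.splitlines())
    records = []
    for line in lines:
        if line.strip().endswith('{'):  # e.g. "Fruit ={"
            records.append(parse_block(lines))
    return records


sample = '''
Fruit ={

Name = "apple";
Price = 1;
DrinkList ={

Cider = 1;
Slush = 0;
};
};
Fruit ={

Name = "pear";
Price = 1.5;
DrinkList ={

Cider = 1;
Slush = 0;
};
};
'''

fruits = parse_fruits(sample)
# Names of the fruits whose DrinkList has Cider = 1; a missing DrinkList is skipped.
cider = [f['Name'] for f in fruits if f.get('DrinkList', {}).get('Cider') == 1]
```

Because each nested section becomes a nested dict, a fruit with a piplist or with no extended list at all needs no special handling. Anything that strays from the assumed layout (several settings on one line, comments, and so on) would need more work.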

Erdaron
August 2nd, 2011, 10:32 PM
I've had to parse some oddly formatted data files. The structures weren't as complicated, but the files were much longer. So here are the things I've learned.

Regular expressions are not hard to learn, and are super-awesome. The Python module re does everything you need. Here (http://docs.python.org/howto/regex.html) is a tutorial on regex.
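As a small illustration for this particular format, here is a sketch of a pattern that picks out the simple "Key = value;" lines. The pattern and the name PAIR are just my guess at what fits the sample in the first post; section headers like "DrinkList ={" and the closing "};" lines are deliberately not matched.

```python
import re

# One "Key = value;" setting per line; MULTILINE makes ^ and $ work per line.
PAIR = re.compile(r'^[ \t]*(\w+)[ \t]*=[ \t]*(.+?);[ \t]*$', re.MULTILINE)

text = '''Name = "apple";
Price = 1;
Colour = "Red";
DrinkList ={
Cider = 1;
};
'''

pairs = PAIR.findall(text)
# pairs is [('Name', '"apple"'), ('Price', '1'), ('Colour', '"Red"'), ('Cider', '1')]
```

Note that this flattens everything: it does not record that Cider belongs to DrinkList, so it suits quick searches more than building the full structure.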

The Boolean operator "in" is your best friend. It's a quick search operator that looks for one string in another, and it rocks when matching text:

>>> 'blah' in 'blahblahblah'
True

Work through the data by breaking it down into progressively smaller blocks. For example, you know that a line consisting of just '};' marks the end of a section. So you can start with the list generated by open('filename').readlines(), and then break off each individual item.
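Roughly, breaking off each item could look like the sketch below (Python 3). It assumes sections open on a line ending in "{" and close on a line that is just "};", and it counts nesting depth so that the "};" closing a DrinkList does not end the whole fruit. split_sections is a made-up name, and sample.splitlines() stands in for the readlines() result.

```python
def split_sections(lines):
    """Group stripped, non-empty lines into one list per top-level section."""
    sections, current, depth = [], [], 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        current.append(line)
        if line.endswith('{'):
            depth += 1
        elif line == '};':
            depth -= 1
            if depth == 0:  # the outermost section just closed
                sections.append(current)
                current = []
    return sections


sample = '''Fruit ={
Name = "apple";
DrinkList ={
Cider = 1;
};
};
Fruit ={
Name = "pear";
};
'''

sections = split_sections(sample.splitlines())
```

Each entry of sections can then be taken apart line by line with the string methods described next.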

.strip(), .split(), and .join() are similarly very useful, the first two in particular when parsing text. For example:

>>> a = ' Colour = "Red";'
>>> a.strip().split('=')[1].strip()
'"Red";'
>>> '"Red";'[1:-2]
'Red'

Of course these can be conveniently wrapped into a single line.
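For instance, something along these lines (field_value is just a name I made up). It uses rstrip(';') and strip('"') in place of the [1:-2] slice so that unquoted values such as numbers survive too:

```python
def field_value(line):
    """Return the text right of '=' without the trailing ';' or quotes."""
    return line.split('=', 1)[1].strip().rstrip(';').strip('"')


colour = field_value(' Colour = "Red";')  # 'Red'
price = field_value('Price = 1.5;')       # '1.5' (still a string)
```

The result is always a string, so numeric fields still need int() or float() afterwards.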

The try statement may also be useful. It attempts to execute some code, and if that code fails (for example, because a particular field is absent from a given item), the accompanying except clause handles the error instead of letting it crash the program.
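A small sketch of that idea, assuming the fruits have already been read into dictionaries (the "plum" entry is invented here to show a record without a DrinkList):

```python
fruits = [
    {'Name': 'apple', 'DrinkList': {'Cider': 1, 'Cordial': 1, 'Slush': 0}},
    {'Name': 'pear', 'DrinkList': {'Cider': 1, 'Cordial': 0, 'Slush': 0}},
    {'Name': 'plum'},  # hypothetical record with no DrinkList at all
]

cider_fruits = []
for fruit in fruits:
    try:
        if fruit['DrinkList']['Cider'] == 1:
            cider_fruits.append(fruit['Name'])
    except KeyError:
        # No DrinkList (or no Cider entry): skip this fruit instead of crashing.
        continue

# cider_fruits is ['apple', 'pear']
```

Catching only KeyError, and not every possible exception, keeps genuine bugs visible.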

I hope some of this random advice is helpful :D!

beegary
August 3rd, 2011, 11:53 AM
Thank you for the random tips.
OK, I'm going to give it a go!

I will post back here if I succeed. I mean when!

MadCow108
August 3rd, 2011, 12:46 PM
Doesn't the program which created these data files provide a library for parsing them?
If yes, and it's not written in Python but in C, you could use it with Python's ctypes to save you some trouble.
If not, the program really sucks and your best shot is probably pyparsing (or a similar parsing library).

MadCow108
August 4th, 2011, 11:54 PM
By coincidence I stumbled across a file which had a suspiciously similar format to yours. And it turns out it may very well be a pretty standard format: libconfig
http://www.hyperrealm.com/libconfig/test.cfg.txt

Maybe you can find some Python bindings for it, or at least use the native library with ctypes.

beegary
August 5th, 2011, 11:52 AM
Thank you MadCow, that looks very similar. I'll let you know what I find.