python: splitting a file using a word


bala_biophy
April 15th, 2008, 07:24 AM
Dear Friends,

Below is a sample of a file I'm processing:

step = 1000 tps = 3746 tk = 301 ps = 248
E1 = -86920 E2 = 18487 E3 = -105407

------------------------------------------------------------------------------

step = 2000 tps = 3748 tk = 300 ps = -150
E1 = -87032 E2 = 18386 E3 = -105419

------------------------------------------------------------------------------

step = 3000 tps = 3750.000 tk = 302 ps = 50
E1 = -87089 E2 = 18502 E3 = -105591

------------------------------------------------------------------------------

My objective is to extract the values of the variables (tps, tk, etc.) for each step value and arrange them in columns. I would like to know how I can split the file contents by the word "step". I would also appreciate any better way of doing this.

Thanks,
Bala

themusicwave
April 15th, 2008, 07:48 AM
All the string methods you need are here: http://docs.python.org/lib/string-methods.html

Basically it will look something like this:



text = open(path_to_file).read()
text = text.split("step")



You could also split it at spaces or tabs. If the data file is something you are producing, I would recommend using commas to delimit the data.
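
To make the split on "step" a bit more concrete, here is a minimal sketch. The sample text is copied from the first post; the names `sample`, `chunks` and `records` are only illustrative. Note that split() throws the delimiter away, so the keyword has to be put back on each piece.

```python
# Sample copied from the first post; `sample`, `chunks` and `records`
# are illustrative names, not anything from the original file.
sample = """step = 1000 tps = 3746 tk = 301 ps = 248
E1 = -86920 E2 = 18487 E3 = -105407

------------------------------------------------------------------------------

step = 2000 tps = 3748 tk = 300 ps = -150
E1 = -87032 E2 = 18386 E3 = -105419
"""

# The text before the first "step" comes back as an empty first element,
# so blank chunks are skipped. rstrip() removes the trailing dashed
# separator, spaces and newlines from each piece.
chunks = sample.split("step")
records = ["step" + chunk.rstrip(" -\n") for chunk in chunks if chunk.strip()]

for record in records:
    print(record)
    print("")
```

Each element of `records` then holds the two lines belonging to one step, which can be split further at spaces.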

ghostdog74
April 15th, 2008, 07:57 AM
What would you like your final output to look like?

meastp
April 15th, 2008, 07:59 AM
Check http://docs.python.org/lib/string-methods.html

Suggestion:

# thefile holds the text of one record (the lines for a single step)
thelist = thefile.replace('=', ' ').split()

#thelist = [ 'step', '1000', 'tps', '3746', 'tk', '301' etc. ]

# pair every name (even positions) with the value that follows it
thedictionary = dict(zip(thelist[0::2], thelist[1::2]))

#thedictionary = { 'step':'1000', 'tps':'3746' etc..
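
Building on the dictionary idea, here is a rough sketch that handles the whole file and prints the values in columns, which is what the first post asks for. It assumes every record begins with a "step" entry and that every data line consists of "name = value" pairs, as in the sample; `parse_records`, `sample` and `names` are made-up names for illustration.

```python
# Sample copied from the first post; all names here are illustrative.
sample = """step = 1000 tps = 3746 tk = 301 ps = 248
E1 = -86920 E2 = 18487 E3 = -105407

------------------------------------------------------------------------------

step = 2000 tps = 3748 tk = 300 ps = -150
E1 = -87032 E2 = 18386 E3 = -105419

------------------------------------------------------------------------------

step = 3000 tps = 3750.000 tk = 302 ps = 50
E1 = -87089 E2 = 18502 E3 = -105591
"""


def parse_records(text):
    """Return one dict per record; a new dict starts at each 'step' name."""
    records = []
    for line in text.splitlines():
        if "=" not in line:
            # Blank lines and the dashed separators carry no data.
            continue
        tokens = line.replace("=", " ").split()
        # Even positions are names, odd positions are the matching values.
        for name, value in zip(tokens[0::2], tokens[1::2]):
            if name == "step":
                records.append({})
            records[-1][name] = value  # kept as a string; use float(value) for numbers
    return records


records = parse_records(sample)
names = ["step", "tps", "tk", "ps", "E1", "E2", "E3"]

# One header row, then one tab-separated row per step.
print("\t".join(names))
for record in records:
    print("\t".join(record[name] for name in names))
```

The values are left as strings so they print exactly as they appear in the file.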

Martin Witte
April 15th, 2008, 01:54 PM
I would go for regular expressions (http://docs.python.org/lib/module-re.html). For a start, you can parse the first type of line like this:
#!/usr/bin/env python
import re
line = 'step = 1000 tps = 3746 tk = 301 ps = 248'
m = re.match(r"step\s=\s(?P<step>\d+)\stps\s=\s(?P<tps>\d+)\stk\s=\s(?P<tk>\d+)\sps\s=\s(?P<ps>\d+)", line)
if m:
    print(m.groupdict())

The result of this snippet will be (the order of the keys may differ): {'ps': '248', 'step': '1000', 'tps': '3746', 'tk': '301'}
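
One caveat: \d+ only matches unsigned whole numbers, so the pattern above would not match the sample lines containing ps = -150 or tps = 3750.000. A possible variation, offered as a sketch rather than the definitive pattern, is a generic "name = number" expression used with findall. The name `PAIR` and the exact number syntax it accepts (optional minus sign, optional decimal part) are assumptions based on the sample data.

```python
import re

# Matches "name = number"; the number may have a leading minus sign and an
# optional decimal part. This syntax is an assumption based on the sample.
PAIR = re.compile(r"(\w+)\s*=\s*(-?\d+(?:\.\d+)?)")

# Two lines from the first post that the stricter pattern would reject.
lines = [
    "step = 2000 tps = 3748 tk = 300 ps = -150",
    "step = 3000 tps = 3750.000 tk = 302 ps = 50",
]

for line in lines:
    # findall returns a list of (name, number) tuples, one per match.
    values = dict((name, float(number)) for name, number in PAIR.findall(line))
    print(values)
```

The same expression also works on the E1/E2/E3 lines, since it does not depend on the variable names.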