Small python script request
Free Thinker
April 29th, 2009, 02:12 PM
I've only just started learning python and was wondering whether someone could help me with my first program, by writing a short script that can find and remove all punctuation, spaces, indentations, capitalisation etc from a text file and produce a new one with every word printed on an individual line in lowercase.
This could be a nice little (I think it would be a short script) project for someone still learning python (but further ahead than me), while also providing me with a script to learn from and use.
ghostdog74
April 29th, 2009, 03:17 PM
so what have you tried?
grepgav
April 29th, 2009, 03:58 PM
This sounds kinda like a homework assignment.
What kind of functions do you think would be useful for getting this done?
simeon87
April 29th, 2009, 04:14 PM
http://ubuntuforums.org/showpost.php?p=7159208&postcount=16
"I needed ..."? Are you sure it's a goal you've set for yourself? We're not going to do homework.
Free Thinker
April 29th, 2009, 04:47 PM
Computing wasn't even an option at my college. I'm learning this myself using thenewboston's tutorials on youtube...
This thread can be deleted.
grepgav
April 29th, 2009, 06:14 PM
Well, you could attack that a couple of ways. If you were doing it in standard C style, you could just iterate over each character and load the correct chars into a buffer.
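A rough sketch of that character-by-character idea, written in Python rather than C. The function name `split_words_by_char` is made up for illustration, and treating "correct chars" as letters only is an assumption about what the original poster wants.

```python
def split_words_by_char(text):
    """Walk the text one character at a time and collect letters into a buffer."""
    words = []
    buffer = []
    for ch in text:
        if ch.isalpha():
            # Letter: keep it, lowercased, as part of the current word.
            buffer.append(ch.lower())
        elif buffer:
            # Anything else ends the current word, if one is in progress.
            words.append("".join(buffer))
            buffer = []
    if buffer:
        # Flush the last word when the text does not end in a separator.
        words.append("".join(buffer))
    return words


print(split_words_by_char("Hello, World! It's 2009."))
# ['hello', 'world', 'it', 's']
```

Note that the apostrophe splits "It's" in two and the digits disappear entirely, so whether this is the right rule depends on the text being processed.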
If you want to learn about python libraries however, check out
http://www.python.org/doc/2.3/lib/module-string.html
Functions such as strip and lower are a place to start, but experiment with them a little bit to see what result you can get.
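As a small experiment along those lines, here is what strip and lower do to a sample line. This assumes a current Python 3, where they are methods on string objects rather than functions in the `string` module as on the linked 2.3 page. The sample text is invented.

```python
import string

line = "  Hello, World!  \n"

# strip() with no arguments removes leading and trailing whitespace;
# lower() converts every cased character to lowercase.
cleaned = line.strip().lower()

# strip() also accepts a set of characters to trim from both ends,
# which is handy for punctuation stuck to a single word.
word = "world!".strip(string.punctuation)

print(cleaned)  # hello, world!
print(word)     # world
```

strip only touches the ends of a string, so the comma in the middle of the line survives. Splitting the text into words still has to happen separately.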
raydeen
April 29th, 2009, 06:29 PM
I'd bet use of the .split function/method along with .lower would make it criminally easy. And then just checking to see if the ASCII values are between a and z would strip out the punctuation. I'm just starting to get my feet wet in Python.
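Something like the following is one way to read that suggestion. It is only a sketch: the function name `ascii_words` and the sample sentence are invented, and it assumes Python 3.

```python
def ascii_words(text):
    """Lowercase, split on whitespace, and keep only the characters a-z."""
    words = []
    for token in text.lower().split():
        # Throw away every character outside the range 'a'..'z'.
        kept = "".join(ch for ch in token if "a" <= ch <= "z")
        if kept:
            words.append(kept)
    return words


print(ascii_words("Hello, World! It's a naïve test."))
# ['hello', 'world', 'its', 'a', 'nave', 'test']
```

The range check removes punctuation, but it also removes any letter outside a-z, which is why "naïve" comes out as "nave".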
Edit: Shoot. grepgav beat me to it.
Reiger
April 29th, 2009, 06:40 PM
Well, last time I looked even English contains non-ASCII characters. Matching punctuation is probably best done by actually matching punctuation, not by using a blanket match that is bound to fail in just about every language. It is much easier and more reliable to just split around punctuation/whitespace.
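One possible way to "actually match punctuation" in Python 3 is to ask the standard `unicodedata` module for each character's Unicode category and treat anything in a P* category as a separator. This is a sketch under that assumption, not the only way to do it, and the function name `split_on_punctuation` is made up.

```python
import unicodedata


def split_on_punctuation(text):
    """Split around punctuation and whitespace, keeping non-ASCII letters."""
    # Unicode general categories starting with "P" are punctuation
    # (Po, Pd, Ps, Pe, ...). Replace those characters with spaces.
    spaced = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in text
    )
    # split() with no arguments splits on any run of whitespace.
    return spaced.lower().split()


print(split_on_punctuation("Café, naïve; résumé... done!"))
# ['café', 'naïve', 'résumé', 'done']
```

Symbols such as "$" or "+" are not in a punctuation category, so they would stay attached to words. Handling them would need an extra rule.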
raydeen
April 29th, 2009, 07:03 PM
Right. I just meant checking to see if the character in question was in the value range between lowercase 'a' and lowercase 'z' and then throwing it out if it wasn't. I think the OP said something about stripping out punctuation.
I'm a noob so I might still be missing your point.
ibuclaw
April 29th, 2009, 08:32 PM
Have a look at the regular expression page. http://docs.python.org/library/re.html
The function you should be looking at is re.split()
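e.g.:
import re
mystring = "Hello, World"
# \W+ matches runs of non-word characters; the raw string keeps the backslash literal
print(re.split(r'\W+', mystring))  # ['Hello', 'World']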
That should push you in the right direction.
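For the original request (one lowercase word per line in a new file), the pieces might fit together roughly as below. This is a sketch assuming Python 3 and UTF-8 text files. The function names `words_from_text` and `write_word_list` are hypothetical, as are the file names in the usage note.

```python
import re


def words_from_text(text):
    """Lowercase the text and split it on runs of non-word characters."""
    # re.split leaves empty strings when the text starts or ends with a
    # separator, so those are filtered out.
    return [w for w in re.split(r"\W+", text.lower()) if w]


def write_word_list(in_path, out_path):
    """Read in_path and write one word per line to out_path."""
    with open(in_path, encoding="utf-8") as src:
        words = words_from_text(src.read())
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(word + "\n" for word in words)
    return len(words)


print(words_from_text("Hello, World!"))
# ['hello', 'world']
```

Calling `write_word_list("input.txt", "words.txt")` would then produce the new file. Keep in mind that `\W` treats digits and underscores as word characters, so "2009" or "my_var" would be kept as words.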
Regards
Iain