Python: Delete everything after the "<" (Advanced string formatting)


xelapond
February 19th, 2008, 02:38 AM
I am writing a really simple applet that takes any part of a web page, parses it into plain text that a human can read, and delivers it to their desktop. This is more of a learning experience, because I am kind of new to Python, although I did follow the entire tutorial. This one piece has me stumped though.

How would I convert a string like

Hello World<Goodbye World

and make Python delete the "<" and everything after it?

Thanks,

Alex

LaRoza
February 19th, 2008, 02:40 AM
http://docs.python.org/lib/string-methods.html

You will find several methods there that you can use.

Depending on what you want, you can use split() to break the string into several pieces, with whatever character you wish as the delimiter. It returns a list.


>>> s = "Hello world<Goodbye World"   # avoid naming a variable "str"; that shadows the built-in type
>>> print(s.split("<"))
['Hello world', 'Goodbye World']

ghostdog74
February 19th, 2008, 02:45 AM
xelapond wrote: "I did follow the entire tutorial though."

Looks like that was not enough. You should also read the library docs.



>>> s = "Hello World<Goodbye World"
>>> print(s.split("<")[0])
Hello World
>>> s[:s.index("<")]
'Hello World'
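
One caveat worth spelling out: s.index("<") raises ValueError when the string contains no "<", while split("<")[0] just returns the whole string. Below is a minimal sketch of a helper that behaves the same way in both cases, using str.partition. The name cut_at is made up for illustration, and the print calls assume Python 3.

```python
def cut_at(text, marker="<"):
    """Return the part of text before the first marker.

    cut_at is a hypothetical helper name, not a library function.
    If marker does not occur, text is returned unchanged.
    """
    # partition() returns (head, separator, tail); head is the whole
    # string when the separator is not found, so no exception is raised.
    head, _sep, _tail = text.partition(marker)
    return head


print(cut_at("Hello World<Goodbye World"))  # Hello World
print(cut_at("no angle bracket here"))      # no angle bracket here
```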

Erdaron
February 19th, 2008, 04:21 AM
A brute force and unrefined way:

blah = 'hello world<goodbye world'
blah = blah[0:blah.index('<')]

xelapond
February 19th, 2008, 11:37 AM
Thanks everyone, I got it working.

I'm reading the library docs now. Hopefully I can figure these out on my own in the future.

pmasiar
February 19th, 2008, 01:23 PM
Parsing HTML/XML by hand is a sucker's game: too many exceptions.

Instead, use an HTML/XML parser. ElementTree is good for well-formed HTML, and I am told that BeautifulSoup can also handle malformed HTML (as in a random web page). All of these parsers have a function to get the text, excluding tags.
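
As a rough illustration of the parser route, here is a sketch using ElementTree from the standard library. It assumes Python 3 and well-formed markup, since ElementTree raises a ParseError on typical tag-soup pages. The function name text_of and the sample markup are made up for the example.

```python
import xml.etree.ElementTree as ET


def text_of(markup):
    """Return the text of a well-formed fragment with tags removed.

    text_of is a hypothetical helper name used only for this example.
    """
    root = ET.fromstring(markup)
    # itertext() walks the element and its descendants in document
    # order, yielding only the text between the tags.
    return "".join(root.itertext())


print(text_of("<p>Hello <b>World</b>!</p>"))  # Hello World!
```

For real-world pages that are not well-formed, this will fail at the fromstring() call, which is where a more forgiving parser such as BeautifulSoup would come in.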

slavik
February 19th, 2008, 01:44 PM
If you want to delete everything starting at some character, I would advise against using split ... the Perl hacker in me says to use substitution: s/<.*// :) Or you can use the slice that someone suggested to keep only the part you want :)
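
For reference, a rough Python counterpart of that Perl substitution, using re.sub from the standard library. This is only a sketch for the sample string from the first post; the variable names are illustrative and the print call assumes Python 3.

```python
import re

s = "Hello World<Goodbye World"

# count=1 mirrors Perl's s/// without the /g flag (first match only).
# By default "." does not match a newline, so on multi-line input this
# removes text only up to the end of the line that contains the "<".
result = re.sub(r"<.*", "", s, count=1)
print(result)  # Hello World
```

If the text after the "<" can span several lines and all of it should go, passing flags=re.DOTALL makes "." match newlines as well.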