View Full Version : python - checking if any of the values in the list in a string
pedrotuga
January 4th, 2007, 07:35 PM
I have a list of stopwords.
is there an imediate way to check if a string contains any of those or do i have to loop it and test one by one?
also, how do i lowercase a a string?
Steveire
January 4th, 2007, 08:22 PM
stopwords = ["stop", "halt", "freeze"]
for word in stopwords:
if stopword in string:
return
string = "erDRTGrfg"
string.lower()
duff
January 4th, 2007, 10:10 PM
or
stopwords=set('stop','halt','freeze')
stopwords.intersection(set(word.split()))
the resulting set will be empty if word doesn't contain any stopwords.
ghostdog74
January 4th, 2007, 10:16 PM
I have a list of stopwords.
is there an imediate way to check if a string contains any of those or do i have to loop it and test one by one?
also, how do i lowercase a a string?
you can just use "in" keyword
stopwords = ["stop", "halt", "freeze"]
if not word in stopwords:
print "Word is not in stopwords"
to change string to lowercase
astring = "THIS STRING IS UPPER" #note: don't use "string" as a variable name.
print astring.lowercase()
pedrotuga
January 5th, 2007, 08:10 AM
thanks everybody.
python cycle syntax is kind of elegant :) so the solution i said in the beggining is actualy very simpleas steve showed. Though, i think the intersection might be faster
pmasiar
January 5th, 2007, 09:45 AM
Why guess, it's not hard to time it yourself:
Program to time:
import time
loopTimes = 10000000
stoplist = ["stop", "halt", "freeze"]
stopSet = set(["stop", "halt", "freeze"])
string = 'many differnet words which may containn stop or may not'
wordsSet = set(string.split())
def using_in(text, stopwords):
"return 1 (true) if any of the stopwords are in text -- using in"
for word in stopwords:
if word in text:
return 1
return 0
def using_set(wordsSet, stopSet):
"return 1 (true) if any of the stopwords are in text -- using sets"
return len( wordsSet & stopSet)
tStart = time.time()
for ii in range(loopTimes):
pass
t_empty = time.time() - tStart # empty loop
tStart = time.time()
for ii in range(loopTimes):
res = using_in(string, stoplist)
t_in = time.time() - tStart # using in
tStart = time.time()
for ii in range(loopTimes):
res = using_set(wordsSet, stopSet)
t_set = time.time() - tStart # using set
print loopTimes , 'using in:', t_in - t_empty
print loopTimes , 'using set:', t_set - t_empty
results of 3 runs (increasing loop times)
>>> ================================ RESTART
100 using in: 0.0
100 using set: 0.0
>>> ================================ RESTART
10000 using in: 0.0160000324249
10000 using set: 0.0149998664856
>>> ================================ RESTART
1000000 using in: 1.78200006485
1000000 using set: 0.983999967575
>>> ================================ RESTART
10000000 using in: 17.8279998302
10000000 using set: 9.82899999619
Result: Return of invested time (learning sets) will pay off if you run the code more than 1 billion times. Less than that, and plain loop is **faster** :-)
pmasiar
January 5th, 2007, 11:12 AM
I was not happy with my previous solution: python mantra is that most obvious solution is the best. So i looked deeper. Sets approach has a little help: target string is preparsed to words. What I will preparse it for "in" approach too?
I added into obvious places these snippets:
wordlist = string.split()
def using_inlist(wordlist, stopwords):
"using in from list"
for word in stopwords:
if word in wordlist:
return 1
return 0
tStart = time.time()
for ii in range(loopTimes):
res = using_inlist(wordlist, stoplist)
t_inlist = time.time() - tStart # using in list
print loopTimes , 'using in list:', t_inlist - t_empty
result: Obvious apprach IS the fastest!
1000000 using in: 1.78099989891
1000000 using in list: 0.922000169754
1000000 using set: 0.952999830246
slavik
January 5th, 2007, 01:00 PM
this problem is O(nm) complexity. you are matching all elements in one array to all elements in another array.
RSL
April 10th, 2007, 05:48 PM
Aw... Not to be a jerk but this is one of those moments where Ruby's Enumerable#include? would be sweet, right?
%w{list of stop words}.include?(word)
Oh, crap. I just became _that_ guy. :(
pmasiar
April 10th, 2007, 08:47 PM
>>> stop_words = ['a', 'b', 'c', 'd']
>>> word = 'd'
>>> word in stop_words
True
It is python, after all. Simple things are obvious, hard are possible. :-)
WW
April 10th, 2007, 11:27 PM
Something like this, maybe? (If you are using Python 2.5, the 'any' function is built-in.)
>>> def any(seq):
... for s in seq:
... if s:
... return True
... return False
...
>>> stopwords = ['stop','end','halt','quit']
>>> sentence1 = 'Four score and seven years ago'
>>> sentence2 = 'Should I stop using python?'
>>> any([sentence1.find(word) != -1 for word in stopwords])
False
>>> any([sentence2.find(word) != -1 for word in stopwords])
True
>>>
ghostdog74
April 11th, 2007, 12:35 AM
>>> stopwords = ['stop','end','halt','quit']
>>> sentence = 'Should I stop using python?'
>>> for x in stopwords:
... sentence.__contains__(x)
...
True
False
False
False
vBulletin® v3.7.2, Copyright ©2000-2008, Jelsoft Enterprises Ltd.