syssyphus
October 17th, 2007, 12:08 AM
Basically, I want to do the equivalent of `sort | uniq -c | sort -n | tail -n 1` in Python.
I need to find the most common element in a large list (more than a million elements).
Here is what I am using (it is way too slow):
    nameDict = {}
    max = 0
    for uniqWord in uniqWordList:
        wordCount = wordList.count(uniqWord)
        nameDict[wordCount] = uniqWord
        if (wordCount > max):
            max = wordCount
            print "max: ", max
    print "common word found, returning"
    return max, nameDict[max]
Basically, I take wordList, which contains many repeated elements, copy it into a set, and then copy that back into a list (uniqWordList) to remove the duplicates. I then iterate over uniqWordList and build a dictionary that maps each count to the element that has that count. I keep the largest known count (max) and, when I am done, return it along with the element it corresponds to.
This is painfully slow. What am I doing wrong?
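A likely cause of the slowness: wordList.count(uniqWord) rescans the entire list on every call, so the loop does roughly (number of unique words) x (length of wordList) comparisons. A minimal sketch of a single-pass alternative follows, assuming the elements are hashable (strings are). The function name most_common_word and its variable names are hypothetical and not from the code above; the sketch also avoids naming a variable max, since that shadows the built-in.

```python
def most_common_word(word_list):
    """Return (count, word) for the most frequent item in word_list.

    Hypothetical helper: counts every element in one pass instead of
    calling list.count() once per unique word.
    """
    counts = {}
    for word in word_list:
        # One dictionary update per element, so the list is scanned once.
        counts[word] = counts.get(word, 0) + 1

    best_word = None
    best_count = 0
    for word, count in counts.items():
        # Keep the highest count seen so far; ties keep the earlier entry.
        if count > best_count:
            best_word, best_count = word, count
    return best_count, best_word
```

For example, most_common_word(["a", "b", "a", "c", "a", "b"]) returns (3, "a"). If several words share the top count, which one comes back depends on dictionary iteration order. On Python 2.7 or later, collections.Counter(wordList).most_common(1) does the same counting with the standard library.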