Choose (a) word from a wordlist



hakermania
March 13th, 2010, 12:36 AM
Hi, I have a text file with 1500 words. Is there a script that will keep only the words that have all of these letters:
n i o m s c t a
If you could show me the way I would be grateful!

n0dix
March 13th, 2010, 12:53 AM
Is this for homework?

hakermania
March 13th, 2010, 06:08 AM
No, it's a mission for a mission-like site. :)

Some Penguin
March 13th, 2010, 11:00 AM
It's a trivial problem and shouldn't be remotely hard for even a beginner -- a solution in Perl can easily be written which is no longer than the original post!

superarthur
March 13th, 2010, 01:02 PM
You took my line. lol
Perl is my favourite language so far.

(to the author of this post) Just use a regular expression. ;)

hakermania
March 13th, 2010, 05:19 PM
Somebody suggested perl -lane 'print if (grep {/\b[niomscta]+\b/} $_); ' /home/alex/Desktop/wordlist.txt
But the output was
montana
monica
macintos
action
tomcat
cannon
tinman
nissan
station
samson
tattoo
cccccc
sonics
cosmos
mission
tintin
moomoo
Something like that doesn't suit me. I want to find a word from the wordlist that is made up of the letters "n i o m s c t a"
(Something like unscrambling the word, choosing words only from the wordlist, not globally)

superarthur
March 13th, 2010, 06:22 PM
Can you give an example of the text you are searching from, and the output you expect?

Some Penguin
March 13th, 2010, 10:45 PM
*shrug*

There's the regex approach, but that really needs sorting first, unless you either want to use one regex per letter, or one monster regex that has every permutation essentially.
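The sort-first idea could look roughly like the sketch below. It assumes the goal is an exact anagram (each of the given letters used once) and one word per line in the wordlist. The function names `sort_letters` and `find_anagrams` are made up for illustration and do not come from anyone in this thread.

```shell
#!/bin/bash
# Sketch: sort the letters of each word and compare the result with
# the sorted target letters. Assumes one word per line in the wordlist.

# Print the characters of $1 in sorted order, e.g. "cab" -> "abc".
sort_letters() {
    printf '%s\n' "$1" | fold -w1 | LC_ALL=C sort | tr -d '\n'
}

# Print every word in file $2 that is an anagram of the letters in $1.
find_anagrams() {
    target=$(sort_letters "$1")
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r word || [ -n "$word" ]; do
        if [ "$(sort_letters "$word")" = "$target" ]; then
            printf '%s\n' "$word"
        fi
    done < "$2"
}
```

It would be called as `find_anagrams niomscta /home/alex/Desktop/wordlist.txt`. Once both sides are sorted, a plain string comparison is enough and no regex is needed.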

Using a table to track which letters you want to see but haven't yet is faster for longer strings. The worst case will be linear in the string size, which scales at least as well as any sort and much better than any comparison-based sort. It also lets you 'abort' easily if the condition is not only that the term *contains* such letters but that it contains no others.
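A rough sketch of the table idea in bash follows, again assuming the word must use exactly the given letters, each as often as it is listed. It needs bash 4 or later for associative arrays and assumes plain letters with no punctuation. The name `uses_exactly` is invented for this example.

```shell
#!/bin/bash
# Sketch of the table approach: count the letters still needed, then
# walk the word once and give up at the first unexpected letter.
# Requires bash 4+ (associative arrays); assumes plain letters only.

# Succeed if word $1 uses each letter of $2 exactly as often as $2 lists it.
uses_exactly() {
    local word=$1 letters=$2 i ch
    local -A need=()
    # Words of a different length can never be an exact match.
    (( ${#word} == ${#letters} )) || return 1
    # Build the table: how many of each letter are still wanted.
    for (( i = 0; i < ${#letters}; i++ )); do
        ch=${letters:i:1}
        need[$ch]=$(( ${need[$ch]:-0} + 1 ))
    done
    for (( i = 0; i < ${#word}; i++ )); do
        ch=${word:i:1}
        # Abort early: letter not wanted, or already used up.
        (( ${need[$ch]:-0} > 0 )) || return 1
        need[$ch]=$(( ${need[$ch]} - 1 ))
    done
    return 0
}
```

For example, `uses_exactly "$word" niomscta && echo "$word"` inside a loop over the wordlist. Because the lengths are equal and every letter of the word uses up one wanted letter, nothing is left over in the table when the loop finishes.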

Lux Perpetua
March 14th, 2010, 12:18 AM
You know, I think this would have made a decent "beginner's programming challenge" for this forum. :-D It seems like a fun little throwaway problem that might be useful in learning the ins & outs of a programming language.

kaibob
March 14th, 2010, 07:23 AM
I'm learning and wanted to give this a try. I suspect there is an easier or better way to do this and would appreciate any suggestions.

The thought occurred to me that the OP may want to match words that contain all of the specified letters but no others. If that's the case, my script doesn't work.


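#!/bin/bash

# Read each line into an array of words, then print every word that
# contains e, n and o in any case (IGNORECASE requires gawk).
while read -r -a word ; do
    for i in "${!word[@]}" ; do
        match=$(awk '{IGNORECASE=1} /e/ && /n/ && /o/' <<<"${word[$i]}")
        [[ $match ]] && echo "${word[$i]}"
    done
done < file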


$ cat file
One red fox
is done with dinner
but was alone and did not have enough to eat.
$
$ wordscript
One
done
alone
enough

After completing the above, I decided to see if I could use awk without bash. My knowledge of awk is extremely limited, but I cobbled together the following, which does appear to work.


awk '{IGNORECASE=1} { for (i=1;i<=NF;i++) { if ($(i) ~ /o/ && $(i) ~ /n/ && $(i) ~ /e/) print $i }}' file


$ cat file
One red fox
is done with dinner
but was alone and did not have enough to eat.
$
$ awk '{IGNORECASE=1} { for (i=1;i<=NF;i++) { if ($(i) ~ /o/ && $(i) ~ /n/ && $(i) ~ /e/) print $i }}' file
One
done
alone
enough
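
For the stricter reading mentioned above (all of the specified letters but no others), one possible variation is sketched below. It is an assumption about what the OP may want, not a tested answer to the mission. The wrapper name `only_these_letters` is made up, repeated letters are still allowed, and `tolower()` is used instead of IGNORECASE so it should not depend on gawk.

```shell
#!/bin/bash
# Sketch: print words that contain every one of the letters o, n, e
# and no other characters. Repeated letters are still accepted.
only_these_letters() {
    awk '{
        for (i = 1; i <= NF; i++) {
            w = tolower($i)
            # First test: nothing but o, n, e. Remaining tests: each one present.
            if (w ~ /^[one]+$/ && w ~ /o/ && w ~ /n/ && w ~ /e/) print $i
        }
    }' "$@"
}
```

Run against the sample file above with `only_these_letters file`, this prints only "One".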