- 01-30-2009 #1 Just Joined!
- Join Date
- Jan 2009
- Posts
- 4
Construct a one-line command which turns a file into a rhyming dictionary
The file "words" is an alphabetically sorted dictionary with nearly 400,000 lines, one word per line. How can I construct and execute a one-line command that turns this file into a rhyming dictionary, in which words with similar endings are grouped together? The rhyming dictionary should be written to a new file called rhyming.txt.
- 01-30-2009 #2
I have looked into this in the recent past. To date I have never found a one-liner that accomplishes this task. However, there are a number of ready-made Linux programs in C, Perl, and Python that will create such a file. ( The Rhyming Dictionary Homepage ) It may be possible to import your list into one of those applications, but I have not investigated it.
I am curious about what you would need a sorted and rhyming list for. My use of word lists usually revolves around system password checking or cracking during pen tests. Not sure how a rhyme list would fit into this scheme.
The sort command in Linux, as I recall, orders lines alphabetically by default.
- 01-30-2009 #3 Just Joined!
Thank you pmcoleman!
It's just one of the Linux bash command exercises at university, and it's the only one that I find really tricky.
I tried "rev inputfile|sort|rev >outputfile", but it didn't work. I suspect the dictionary file is too large for the command to handle.
- 01-30-2009 #4
If a one-liner is mandatory, consider this. In the bash scripts that I write, the ">" redirector overwrites any existing data in the target file. Obviously, this is unacceptable for a log file. My solution was to use ">>", which, from all indications and tests, simply appends the new data to the end of the file.
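A minimal sketch of the difference between the two redirectors, using a throwaway temp file (the `log` variable and its contents are made up for illustration, not taken from the thread):

```shell
# Create a scratch file to write to.
log=$(mktemp)

echo first  > "$log"    # '>' truncates the file, then writes
echo second > "$log"    # truncates again, so "first" is gone
echo third >> "$log"    # '>>' appends to what is already there

cat "$log"
```

This should print "second" and then "third". Either operator only controls where the output goes, so switching between them would not be expected to change an error that rev reports while reading its input.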
Have you tried to use >> in your one liner and if so did you get the same results?
paul
edit: In my particular situation I sorted a 64MB word list without any buffer problems. Just ran wc -l on the file and found it contains 4496193 words. I sorted this file using the method(s) discussed thus far. ">>" did not return any errors.
- 01-30-2009 #5 Just Joined!
$ rev words|sort|rev >>rhyming.txt
rev: words: Invalid or incomplete multibyte or wide character
Yes, I get the same result. I think the problem is the file size: the command seems to work only when the input file has fewer than 1000 lines.
- 01-30-2009 #6
Results
I ran the command you provided, with the redirect changed to ">>", against my really big word list. Here is the command, followed by a truncated sample of its output:
rev bigword.lst | sort | sort -u | rev >> reversedBigwordUniqe.lst
demo1850
1950
pc1950
demo1950
pc50
download50
hp50
as50
cs50
perddims50
macws50
client50
rmax50
60
pc1060
pc2060
Words with similar endings are now grouped together.
The old list order, starting with demo1850, was:
demo1850
demo1851
demo1856
demo1857
demo1858
demo1859
demo1860
demo1861
As you can see, the old list was sorted normally, while the output of rev | sort | rev (written with >>) groups similar endings together. Note: the sort -u option was used because I noticed that I had apparently cat'ed a couple of files into my original big word list more than once. You may not need the sort -u pipe.
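To see why this groups endings, here is a small self-contained sketch of the same rev | sort | rev idea. The six sample words and the temp directory are made up for illustration, and LC_ALL=C on sort is an added assumption so that the ordering is plain byte order regardless of locale:

```shell
# Build a tiny sample word list (hypothetical data, not from the thread).
tmp=$(mktemp -d)
printf '%s\n' action bat cat nation station that > "$tmp/words"

# Reverse each line, sort the reversed lines, then reverse them back,
# so that words sharing a suffix end up next to each other.
rev "$tmp/words" | LC_ALL=C sort | rev > "$tmp/rhyming.txt"

cat "$tmp/rhyming.txt"
```

On this sample the output order is nation, station, action, bat, cat, that: the "-ation" and "-at" words sit together because their reversed forms share a prefix.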
Hope this helped.
Paul
- 01-30-2009 #7
Last edited by pmcoleman; 01-30-2009 at 08:19 PM. Reason: add information
- 01-30-2009 #8 Just Joined!
It works now. The problem was the encoding of the file, not its size.
Thank you Paul!
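The thread does not record how the encoding was fixed. One possible approach, sketched here under the assumptions that the word list was stored as ISO-8859-1 and that the shell runs in a UTF-8 locale, is to convert the file with iconv before handing it to rev. The sample words and file names below are hypothetical:

```shell
tmp=$(mktemp -d)

# Sample input in ISO-8859-1: the accented e is the single byte 0xE9 (octal 351),
# which is not a valid UTF-8 sequence on its own.
printf 'caf\351\nth\351\nbat\ncat\n' > "$tmp/words"

# Convert to UTF-8. The source encoding is an assumption; adjust -f to match the real file.
iconv -f ISO-8859-1 -t UTF-8 "$tmp/words" > "$tmp/words.utf8"

# Re-decoding as UTF-8 succeeds only if every byte sequence is valid.
iconv -f UTF-8 -t UTF-8 "$tmp/words.utf8" > /dev/null && echo "valid UTF-8"

# With valid input for the locale, the original one-liner can run unchanged.
rev "$tmp/words.utf8" | sort | rev > "$tmp/rhyming.txt"
```

The "Invalid or incomplete multibyte or wide character" message is what rev reports when it meets bytes that are not valid in the current locale's encoding, which fits the encoding explanation better than a file-size limit.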
- 01-30-2009 #9
You are welcome. Don't you just love it when the solution is found?
Paul


