  1. #1
    Just Joined!
    Join Date
    Jan 2009
    Posts
    4

    Construct a one-line command which turns a file into a rhyming dictionary

    The file "words" is an alphabetically sorted dictionary with nearly 400,000 lines, one word per line. How can I construct and execute a one-line command that turns this file into a rhyming dictionary, in which words with similar endings are grouped together? The rhyming dictionary should be written to a new file called rhyming.txt.

  2. #2
    Just Joined! pmcoleman
    Join Date
    Jan 2009
    Location
    Colorado Springs, CO USA
    Posts
    30
    I have looked into this in the recent past. To date I have never found a one-liner that will accomplish this task. However, there are a number of ready-made Linux programs in C, Perl, and Python that will create such a file (see The Rhyming Dictionary Homepage). It may be possible to import your list into one of these applications, but I have not investigated it.

    I am curious about what you would need a sorted and rhyming list for. My use of word lists usually revolves around system password checking or cracking during pen tests. Not sure how a rhyme list would fit into this scheme.

    The sort command in Linux, as I recall, sorts lines in alphabetical order.

  3. #3
    Just Joined!
    Join Date
    Jan 2009
    Posts
    4
    Thank you pmcoleman!

    It's just one of the Linux bash command exercises at university, and it's the only one that I find really tricky.

    I tried "rev inputfile|sort|rev >outputfile", but it didn't work. I think the dictionary file is so large that it exceeds some limit.

  4. #4
    Just Joined! pmcoleman
    Join Date
    Jan 2009
    Location
    Colorado Springs, CO USA
    Posts
    30
    If a one-liner is mandatory, consider this. In the bash scripts that I write, the ">" redirector overwrites existing data in my files. Obviously, this is unacceptable for a log file. My solution was to use ">>", which, from all indications and tests, simply appends the new data to the end of the file.

    Have you tried using >> in your one-liner, and if so, did you get the same results?

    paul

    edit: In my particular situation I sorted a 64MB word list without any buffer problems. I just ran wc -l on the file and found it contains 4496193 words. I sorted this file using the method(s) discussed thus far. ">>" did not return any errors.
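
    To make the difference between the two redirectors concrete, here is a minimal sketch. The file name demo.log is only a placeholder for illustration and is not taken from the scripts mentioned above.

```shell
# demo.log is a hypothetical file name used only for this illustration.
log=demo.log
rm -f "$log"

echo "first"  > "$log"    # ">" creates the file and writes one line
echo "second" > "$log"    # ">" again truncates the file, so "first" is lost
echo "third" >> "$log"    # ">>" appends after the existing content

cat "$log"                # shows "second" followed by "third"
```

    Neither redirector changes what rev or sort do with the input. With ">>", running the one-liner twice would leave two copies of the output in the target file.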

  5. #5
    Just Joined!
    Join Date
    Jan 2009
    Posts
    4
    $ rev words|sort|rev >>rhyming.txt
    rev: words: Invalid or incomplete multibyte or wide character

    Yes, it is the same result. I think the problem is the file size; the command seems to work only if the input file has fewer than 1000 lines.

  6. #6
    Just Joined! pmcoleman
    Join Date
    Jan 2009
    Location
    Colorado Springs, CO USA
    Posts
    30

    Results

    I ran the command you provided on my really big word list, changing the redirect to ">>":

    rev bigword.lst | sort | sort -u | rev >> reversedBigwordUniqe.lst

    It returned the following (truncated) output:

    demo1850
    1950
    pc1950
    demo1950
    pc50
    download50
    hp50
    as50
    cs50
    perddims50
    macws50
    client50
    rmax50
    60
    pc1060
    pc2060

    Where similar ending words are now grouped together.

    Old list order starting with demo1850 was:

    demo1850
    demo1851
    demo1856
    demo1857
    demo1858
    demo1859
    demo1860
    demo1861

    As you can see, the old list was sorted normally, and the reversed list (written with >>) groups similar endings together. Note: the sort -u option was used because I noticed that I had apparently cat'ed a couple of files into my original big word list more than once. You may not need the sort -u pipe.
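
    The same effect can be reproduced on a tiny list. This is only a sketch: sample_words.txt and sample_rhyming.txt are made-up names, the six words stand in for a real word list, and LC_ALL=C is an added assumption so that sort compares plain bytes and the order does not depend on the locale.

```shell
# Build a small stand-in for the real word list (hypothetical sample data).
printf '%s\n' cat dog hat log mat fog > sample_words.txt

# 1. rev reverses the characters of each line (cat -> tac).
# 2. sort orders the reversed lines, so equal endings become equal prefixes.
# 3. the second rev restores the original spelling.
rev sample_words.txt | LC_ALL=C sort | rev > sample_rhyming.txt

cat sample_rhyming.txt
```

    For this sample the words ending in "og" (dog, fog, log) come out first, followed by the words ending in "at" (cat, hat, mat).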

    Hope this helped.

    Paul

  7. #7
    Just Joined! pmcoleman
    Join Date
    Jan 2009
    Location
    Colorado Springs, CO USA
    Posts
    30
    Quote: Originally posted by dasidongxi
    $ rev words|sort|rev >>rhyming.txt
    rev: words: Invalid or incomplete multibyte or wide character

    Yes, it is the same result. I think the problem is the file size; the command seems to work only if the input file has fewer than 1000 lines.

    Interesting, as my list has over 4 million lines, one word per line. Maybe you do not have enough RAM or swap space enabled.

    I googled your error. Looks like there may be an invalid character in your word list.
    Something to check into....
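
    One way to look for such a character, sketched under assumptions: the sample file below is invented, and the check simply flags every line that contains a byte outside plain ASCII, which is where an encoding problem in a word list would have to sit. It does not prove that the byte is invalid in your locale.

```shell
# Invented sample: line 2 contains the single byte 0xE9 (octal 351),
# which is "é" in ISO-8859-1 but not a complete character in UTF-8.
printf 'apple\ncaf\351\nzebra\n' > sample_bad.txt

# In the C locale every byte counts as one character, so this pattern
# matches any byte that is neither printable ASCII nor whitespace.
# -a treats the file as text, -n prefixes each hit with its line number.
LC_ALL=C grep -an '[^[:print:][:space:]]' sample_bad.txt | cut -d: -f1 > bad_lines.txt

cat bad_lines.txt   # line numbers worth inspecting
```

    On the sample this reports line 2 only.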
    Last edited by pmcoleman; 01-30-2009 at 08:19 PM. Reason: add information

  8. #8
    Just Joined!
    Join Date
    Jan 2009
    Posts
    4
    It works now. The problem is the encoding of the file.

    Thank you Paul!
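
    The thread does not say which encoding the file had or how it was fixed. As a sketch only, assuming the list was ISO-8859-1 (Latin-1) and the locale expects UTF-8, a conversion with iconv could look like this. The file names are placeholders.

```shell
# Invented sample: "café" with é stored as the single Latin-1 byte 0xE9.
printf 'caf\351\nmat\ncat\n' > sample_latin1.txt

# Convert from the ASSUMED source encoding (ISO-8859-1) to UTF-8.
iconv -f ISO-8859-1 -t UTF-8 sample_latin1.txt > sample_utf8.txt

# Sanity check: a UTF-8 to UTF-8 conversion fails on malformed input.
if iconv -f UTF-8 -t UTF-8 sample_utf8.txt > /dev/null; then
    echo "sample_utf8.txt is valid UTF-8"
fi
```

    After a conversion like this, the rev | sort | rev one-liner can be pointed at the converted file instead of the original.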

  9. #9
    Just Joined! pmcoleman
    Join Date
    Jan 2009
    Location
    Colorado Springs, CO USA
    Posts
    30
    You are welcome. Don't you just love it when the solution is found?

    Paul
