- 07-04-2007 #1
Filtering a list of certain words.
I have two lists: one contains a list of words, the other contains a list of words that I would like to exclude. An example of the lists:
word list
Code:
    train car boat apple carrot drive ram cell
exclude list
Code:
    apple ram
So what I want to do is re-create the list without the words apple or ram. So I wrote this script (which is by far the most advanced thing I've ever written; I'm new to this):
Code:
    #!/bin/bash
    for x in `cat wordlist`; do
        for y in `cat excludelist`; do
            if [ "$x" != "$y" ]; then
                echo "$x" >> finalwordlist
            else
                echo "word excluded"
            fi
        done
    done
So, after I execute the script, it prints "word excluded" twice to my terminal, but when I check the finalwordlist file I see this:
Code:
    train
    train
    car
    car
    boat
    boat
    apple
    carrot
    carrot
    drive
    drive
    ram
    cell
    cell
Every word is doubled, except apple and ram. I'm not sure how to go about this anymore. My bash experience ends right about here (actually it really ended at if).
Can anyone help me out?
- 07-04-2007 #2
Your inner loop executes once for each word in the exclude list. That means each $x is compared against $y twice, once for "apple" and once for "ram", and an echo command is issued each time. So when neither one matches, the echo to finalwordlist gets called twice. When one of them matches, it gets called once, and the "word excluded" message is echoed once.
One way around this is to declare a flag that keeps track of whether the inner loop found an $x == $y condition. Then after the inner loop finishes, check the flag and call the appropriate echo. I'd give you code for that, but apparently my own bash skills aren't up to it.
Someone else may have a more elegant solution, so I'd welcome any more input you can get.
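To make the doubling visible, here is a minimal sketch that prints every comparison the nested loops make. The word lists are passed as strings rather than read from files, and the function name trace_comparisons is made up for this illustration.

```shell
#!/bin/sh
# Illustrative only: trace_comparisons is a hypothetical helper, and the
# word lists are passed as strings instead of being read from files.
trace_comparisons() {
    words=$1
    excludes=$2
    for x in $words; do
        for y in $excludes; do
            # The body runs once per (word, exclude word) pair.
            if [ "$x" != "$y" ]; then
                echo "$x vs $y: echo to finalwordlist"
            else
                echo "$x vs $y: word excluded"
            fi
        done
    done
}

trace_comparisons "train apple" "apple ram"
```

Here "train" is written twice (once per exclude word), while "apple" is written once and reported as excluded once, which matches the finalwordlist shown in the first post.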
- 07-04-2007 #3
Hi.
As Zelmo said, a flag is useful. If you get a match, set the flag and exit -- no use continuing through the loop any farther. However, you need to loop until either you get a match or you're at the end of the exclude list. Here's a solution based on your script:
Code:
    #!/bin/sh

    # @(#) s0       Demonstrate break.

    echo " sh version: $BASH_VERSION" >&2

    for x in `cat data1`
    do
            match=false
            for y in `cat data2`
            do
                    if [ "$x" = "$y" ]
                    then
                            echo " ($x excluded)"
                            match=true
                            break
                    fi
            done
            if [ "$match" != true ]
            then
                    # echo "$x" >> finalwordlist
                    echo "$x"
            fi
    done

    exit 0
which produces:
Code:
    % ./s0
     sh version: 2.05b.0(1)-release
    train
    car
    boat
     (apple excluded)
    carrot
    drive
     (ram excluded)
    cell
Also, unless necessary I would write to STDOUT, and let the caller decide to append or not, but that's a personal preference.
See man bash for details ... cheers, drl
Welcome - get the most out of the forum by reading forum basics and guidelines: click here.
90% of questions can be answered by using man pages, Quick Search, Advanced Search, Google search, Wikipedia.
We look forward to helping you with the challenge of the other 10%.
( Mn, 2.6.n, AMD-64 3000+, ASUS A8V Deluxe, 1 GB, SATA + IDE, Matrox G400 AGP )
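The point about writing to STDOUT can be sketched as follows. The function name filter_words and the temporary sample files are assumptions for illustration; the loop is the same flag-and-break idea, with the output left for the caller to redirect.

```shell
#!/bin/sh
# Illustrative sketch: filter_words is a hypothetical name.
# $1 = word file, $2 = exclude file; kept words go to stdout only.
filter_words() {
    for x in $(cat "$1"); do
        match=false
        for y in $(cat "$2"); do
            if [ "$x" = "$y" ]; then
                match=true
                break
            fi
        done
        if [ "$match" != true ]; then
            echo "$x"
        fi
    done
}

# Build throwaway sample files so the sketch runs on its own.
tmpdir=$(mktemp -d)
printf '%s\n' train car boat apple carrot drive ram cell > "$tmpdir/data1"
printf '%s\n' apple ram > "$tmpdir/data2"

# The caller decides: overwrite with >, append with >>, or pipe onward.
filter_words "$tmpdir/data1" "$tmpdir/data2" > "$tmpdir/finalwordlist"
filter_words "$tmpdir/data1" "$tmpdir/data2" | wc -l
```

Because the function never names an output file, the same code serves for overwriting, appending, or feeding another command.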
- 07-04-2007 #4
Hi.
Using utilities will be faster than reading lines in a script, especially for long files. If you can allow the files to be re-ordered, then command comm is useful:
Code:
    #!/bin/sh

    # @(#) s2       Demonstrate use of comm to exclude strings.

    set -o nounset

    echo " sh version: $BASH_VERSION" >&2

    sort data1 >t1
    sort data2 >t2

    nl t1
    echo
    nl t2

    echo
    comm -23 t1 t2

    exit 0
producing:
Code:
    % ./s2
     sh version: 2.05b.0(1)-release
         1  apple
         2  boat
         3  car
         4  carrot
         5  cell
         6  drive
         7  ram
         8  train

         1  apple
         2  ram

    boat
    car
    carrot
    cell
    drive
    train
cheers, drl
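For readers unfamiliar with comm: it compares two sorted files and prints three columns (lines only in the first file, lines only in the second, lines in both). The options -1, -2 and -3 suppress the corresponding column, so -23 leaves only the lines unique to the first file. A small sketch with throwaway files (the temporary directory and the extra word "zebra" are assumptions for illustration):

```shell
#!/bin/sh
# Illustrative sketch with temporary sample files.
tmpdir=$(mktemp -d)
printf '%s\n' train car boat apple carrot drive ram cell | sort > "$tmpdir/t1"
printf '%s\n' apple ram zebra | sort > "$tmpdir/t2"

# Column 1 only: words in t1 that are not in t2 (the filtered list).
comm -23 "$tmpdir/t1" "$tmpdir/t2"

# Column 2 only: exclude words that never occurred in t1.
comm -13 "$tmpdir/t1" "$tmpdir/t2"

# Column 3 only: words that were actually excluded.
comm -12 "$tmpdir/t1" "$tmpdir/t2"
```

Both inputs have to be sorted the same way, which is why the result comes out in sorted order rather than in the original order.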
- 07-04-2007 #5
Hi.
I wrote something similar last week. Here's a version of that adapted to your problem. It has the advantage of not re-ordering your data and is fast, but at the cost of some complexity.
It uses sed, the stream editor. We use sed twice: once to turn the exclude list into editing commands that delete matching lines, and once to run those commands against the main file. When a line matches an excluded word, sed drops it from the output stream (sed does not modify the input data file):
Code:
    #!/bin/sh

    # @(#) s3       Demonstrate creation of sed script to process deletes.

    set -o nounset

    echo " sh version: $BASH_VERSION" >&2

    nl data1
    echo
    nl data2

    # Create the sed script file with sed itself, for example:
    # make
    # apple
    # into
    # /^apple$/d
    # The ^ and $ anchors keep "apple" from also deleting "pineapple".

    sed 's|\(.*\)|/^\1$/d|' data2 >script

    # Run the script against the main data file.

    echo
    sed -f script data1

    exit 0
producing:
Code:
    % ./s3
     sh version: 2.05b.0(1)-release
         1  train
         2  car
         3  boat
         4  apple
         5  carrot
         6  drive
         7  ram
         8  cell

         1  apple
         2  ram

    train
    car
    boat
    carrot
    drive
    cell
cheers, drl
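It can help to look at the generated script itself, and at what happens when an excluded word is a substring of another word. The sketch below is an illustration under assumptions: the function name make_delete_script and the exclude word "car" are not from the thread. It generates patterns anchored with ^ and $, so only lines that consist of exactly the excluded word are deleted.

```shell
#!/bin/sh
# Illustrative sketch: make_delete_script is a hypothetical helper.
# It turns each line "word" of the exclude file into "/^word$/d".
make_delete_script() {
    sed 's|\(.*\)|/^\1$/d|' "$1"
}

tmpdir=$(mktemp -d)
printf '%s\n' train car boat apple carrot drive ram cell > "$tmpdir/data1"
printf '%s\n' apple car > "$tmpdir/data2"

# Show the generated editing commands.
make_delete_script "$tmpdir/data2" > "$tmpdir/script"
cat "$tmpdir/script"

# "car" is removed, but "carrot" survives because of the anchors.
sed -f "$tmpdir/script" "$tmpdir/data1"
```

Words that contain regular expression metacharacters or a slash would still need escaping; for plain words like the ones in this thread that is not an issue.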
- 07-05-2007 #6
It seems to me that the simplest solution would be to use grep. After all, the purpose of grep is to look for a word or regex.
Code:
    #!/bin/bash
    # Read wordlist on file descriptor 3, one word per line.
    exec 3< wordlist
    while read -r line <&3; do
        # -x matches whole lines only and -F treats the word as a fixed string,
        # so "car" is not excluded merely because excludelist contains "carrot".
        if grep -qxF -- "$line" excludelist; then
            echo "word excluded" >&2
        else
            echo "$line" >> finalwordlist
        fi
    done
Basically, for each line in the wordlist, we check if it is listed in excludelist. If it is, then say "word excluded" on stderr, otherwise print the word to finalwordlist.
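If the "word excluded" messages are not needed, the loop can be replaced by a single grep call. This is a sketch rather than the poster's method: -f reads the patterns from a file, -F treats them as fixed strings, -x requires a whole-line match, and -v inverts the match so only non-excluded words are printed. The temporary files and the extra exclude word "car" are assumptions for illustration.

```shell
#!/bin/sh
# Illustrative sketch with temporary sample files.
tmpdir=$(mktemp -d)
printf '%s\n' train car boat apple carrot drive ram cell > "$tmpdir/wordlist"
printf '%s\n' apple ram car > "$tmpdir/excludelist"

# Keep every line of wordlist that is not exactly equal to a line of excludelist.
grep -vxFf "$tmpdir/excludelist" "$tmpdir/wordlist" > "$tmpdir/finalwordlist"
cat "$tmpdir/finalwordlist"
```

The original order of wordlist is preserved, and excludelist is read once instead of once per word.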
- 07-05-2007 #7
Thank you all for the excellent solutions.
- 07-05-2007 #8
Code:
    awk 'FNR==NR { arr[$0]; next } { if ($0 in arr) { next } else { print } }' "exclusion_list" "file"
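For anyone wondering how this one-liner works, here is an annotated sketch. awk reads the exclusion list first; FNR==NR is only true while the first file is being read, so those lines become keys of the array arr. Lines of the second file are printed only if they are not keys of arr. The function name filter_awk and the temporary files are assumptions for illustration.

```shell
#!/bin/sh
# Illustrative sketch: filter_awk is a hypothetical wrapper.
# $1 = exclusion list, $2 = word file.
filter_awk() {
    awk '
        FNR == NR { arr[$0]; next }   # first file: remember each line as a key
        !($0 in arr)                  # second file: print lines that are not keys
    ' "$1" "$2"
}

tmpdir=$(mktemp -d)
printf '%s\n' apple ram > "$tmpdir/exclusion_list"
printf '%s\n' train car boat apple carrot drive ram cell > "$tmpdir/file"

filter_awk "$tmpdir/exclusion_list" "$tmpdir/file"
```

One thing to watch for: with an empty exclusion list, FNR==NR is also true for the second file, so nothing is printed.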
- 07-05-2007 #9