- 03-10-2010 #1 Just Joined!
- Join Date
- Mar 2010
- Posts
- 2
Removing Duplicate Entries in a File
I want to remove duplicate lines in a file, based on a partial match - namely, the second field, space delimited.
uniq will work for the entire row. What will do what I want?
File format is:
Line number-spaces-file name-space-primary key
Basically, if a file name appears two or more times in the file, I want to delete ALL rows containing the file name.
I also have the option of doing this on an unnumbered file - thus, the partial match would be on the first field. How would this work?
- 03-10-2010 #2 Linux User
- Join Date
- Nov 2009
- Location
- France
- Posts
- 292
If you process each line with grep and awk or cut, you could count the number of occurrences of each file name. If the count is greater than one, append the file name to a temp file. Then process each line of the temp file and delete the corresponding lines from your data file with sed. It is quite cumbersome, though! File names must not contain white space, however. White space in file names is always evil!
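A rough sketch of that idea, assuming the numbered layout (file name in the second field) and file names without white space. The function name drop_repeated_names is made up for illustration, and the final filtering is done with awk rather than sed so the data file itself is left untouched:

```shell
#!/bin/sh
# Hypothetical helper: print the rows of "$1" whose file name (field 2)
# occurs exactly once. The input file is not modified.
drop_repeated_names() {
    data=$1
    dups=$(mktemp) || return 1

    # Step 1: count every file name and save the ones seen more than once.
    awk '{ n[$2]++ } END { for (k in n) if (n[k] > 1) print k }' "$data" > "$dups"

    # Step 2: load the saved names, then print only rows whose name is not listed.
    awk -v dups="$dups" '
        BEGIN { while ((getline name < dups) > 0) bad[name] }
        !($2 in bad)
    ' "$data"

    rm -f "$dups"
}
```

Calling drop_repeated_names /tmp/data > /tmp/data.new would write the surviving rows to a new file, which can then be checked before replacing the original.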
A sample data file would help.
0 + 1 = 1 != 2 <> 3 != 4 ...
Until the camel can pass through the eye of the needle.
- 03-10-2010 #3 Just Joined!
- Join Date
- Mar 2010
- Posts
- 2
nmset,
Here is the general format:
1 f1 a1
2 f1 a2
3 f1 a3
4 f1 a4
5 f2 a5
6 f3 a6
7 f3 a7
8 f3 a8
9 f4 a9
10 f5 a10
11 f6 a11
12 f6 a12
13 f7 a13
14 f8 a14
15 f8 a15
16 f8 a16
17 f8 a17
18 f8 a18
19 f9 a19
20 f10 a20
21 f10 a21
22 f11 a22
I want to end up with a file containing:
5 f2 a5
9 f4 a9
10 f5 a10
13 f7 a13
19 f9 a19
22 f11 a22
(or, actually,
f2 a5
f4 a9
f5 a10
f7 a13
f9 a19
f11 a22
-- that is, without the line numbers)
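For reference, one possible way to get that second form is to let awk read the file twice: the first pass counts each file name, the second pass prints only the rows whose name was counted once, minus the line number. This is only a sketch; the function name unique_names_only is invented here, and it assumes neither the file name nor the primary key contains spaces:

```shell
#!/bin/sh
# Hypothetical helper: read "$1" twice with awk.
# Pass 1 (NR == FNR) counts each file name in field 2.
# Pass 2 prints "name key" for names that were counted exactly once.
unique_names_only() {
    awk 'NR == FNR { seen[$2]++; next }
         seen[$2] == 1 { print $2, $3 }' "$1" "$1"
}
```

For the unnumbered file, the same approach should work with $1 in place of $2 and a plain print of the whole row.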
- 03-10-2010 #4 Linux User
- Join Date
- Nov 2009
- Location
- France
- Posts
- 292
This might be what you want.
It uses uniq -d to find the names that are duplicated in the second column. uniq -d only compares adjacent lines, so the names are sorted first.
Code:
for unwanted in $(awk '{print $2}' /tmp/data | sort | uniq -d)
do
    echo "$unwanted"
    # Delete every row that has this name as a space-delimited field (edits /tmp/data in place).
    sed -i "/ ${unwanted} /d" /tmp/data
done
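A small check of how uniq -d behaves may be useful here: it only reports a line when the copies sit next to each other, so on data where the same file name shows up in separate places it finds nothing unless the input is sorted first. The variable names below are just for the demonstration:

```shell
#!/bin/sh
# The two f1 lines are separated by f2, so they are not adjacent.
unsorted=$(printf 'f1\nf2\nf1\n' | uniq -d)
# After sorting, the two f1 lines are adjacent and uniq -d reports them.
sorted=$(printf 'f1\nf2\nf1\n' | sort | uniq -d)

echo "without sort: '$unsorted'"
echo "with sort:    '$sorted'"
```

In the sample data above the repeated names happen to be grouped together already, so the difference only matters for files that are not ordered by file name.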

