- 10-09-2007 #1 Just Joined!
- Join Date
- Oct 2007
- Location
- Houston
- Posts
- 75
simple bash script help.
Good Evening Everyone!
I am trying to get a little help with a few different scripts that I am making. The first one I am trying to get working is a script that reads one file and, if more than 2 lines are the same, writes 1 line to another file. Here is what I mean:
I have a file whose lines contain fields separated by tabs.
data1 data2 data3 data4 data5
Now the lines repeat in some cases. I may see data1 up to about 21 times in a file. The fields data1 and data2 may remain the same while data4 and data5 change (data4 is a path and data5 is a date). What I want is this: if the entire line appears 2 or more times (if I see the line only 1 time I don't care), I want to write all of its data to a file.
My first thought was to try a regular expression with grep, but I gave up when I saw that it was not really what I wanted. Any thoughts on this would be greatly appreciated, and I would be willing to spring for the first few rounds at your local pub/bar!
- 10-10-2007 #2 Linux Enthusiast
- Join Date
- Aug 2006
- Posts
- 631
Hi,
To write the double lines to another file you can do something like:
Code:
sort file|awk '{if($0 == line){print}else{line = $0}}' >> anotherfile
Regards
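Note that the awk pipeline above prints a line once for every extra copy, so a line that appears three times is written twice. A minimal sketch of a variant that writes each repeated line only once, without needing sort, is below. The sample data and the file names `status.txt` and `dups.txt` are made up for illustration.

```shell
# Build a small sample file (hypothetical data and file names).
workdir=$(mktemp -d)
printf '%s\n' 'line 1' 'line 2' 'line 1' 'line 3' 'line 1' 'line 2' > "$workdir/status.txt"

# count[$0] tracks how often each whole line has been seen so far.
# The pattern is true only when the count reaches exactly 2, so every
# repeated line is printed once, at its second occurrence.
awk '++count[$0] == 2' "$workdir/status.txt" > "$workdir/dups.txt"

cat "$workdir/dups.txt"
```

Because awk keeps the counts in memory, the input does not have to be sorted, and the output follows the order in which the repeats show up in the file.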
- 10-10-2007 #3 Just Joined!
- Join Date
- Oct 2007
- Location
- Houston
- Posts
- 75
For some reason that command is giving me only part of the file, about 10 lines or so. The file has over 5,000 lines in it.
- 10-10-2007 #4 Linux Enthusiast
- Join Date
- Aug 2006
- Posts
- 631
Try the uniq command. To output only unique lines:
Code:
sort file|uniq
To show the double lines (1 time):
Code:
sort file|uniq -d
Check the man page of uniq.
Regards
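If it helps to see how often each line repeats before filtering, `uniq -c` prefixes every distinct line with its count. The sketch below uses a made-up six-line file; only `sort`, `uniq -c` and `uniq -d` are taken from the commands above.

```shell
# Hypothetical sample file with a, b repeated and c appearing once.
workdir=$(mktemp -d)
printf '%s\n' 'b' 'a' 'b' 'c' 'a' 'b' > "$workdir/file"

# sort groups identical lines so that uniq can compare neighbours;
# -c prefixes each distinct line with its number of occurrences.
counts=$(sort "$workdir/file" | uniq -c)

# -d prints one copy of every line that occurs more than once.
repeated=$(sort "$workdir/file" | uniq -d)

printf '%s\n' "$counts"
printf '%s\n' "$repeated"
```

The width of the count column differs between uniq implementations, so it is safer to parse it with awk than to rely on fixed spacing.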
- 10-10-2007 #5 Just Joined!
- Join Date
- Oct 2007
- Location
- Houston
- Posts
- 75
Thank you for your response. I am noticing that the output is only 16 lines. Is this a limitation of sort? I tried to use cat in place of sort and I don't get anything displayed.
I do appreciate the help very much!
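One likely explanation, assuming the command tried was `cat file | uniq -d`: uniq only compares each line with the one directly before it, so repeats that are not adjacent go unnoticed unless the input is sorted first. sort itself has no 16-line limit; `uniq -d` simply prints one copy of each line that is repeated exactly. A small sketch with made-up data:

```shell
# Hypothetical three-line file where the repeated line is not adjacent.
workdir=$(mktemp -d)
printf '%s\n' 'x' 'y' 'x' > "$workdir/file"

# Unsorted: the two x lines are separated by y, so uniq -d reports nothing.
unsorted=$(cat "$workdir/file" | uniq -d)

# Sorted: the x lines become neighbours and are reported once.
sorted=$(sort "$workdir/file" | uniq -d)

printf 'unsorted: [%s]\nsorted: [%s]\n' "$unsorted" "$sorted"
```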
- 10-10-2007 #6 Linux Enthusiast
- Join Date
- Aug 2006
- Posts
- 631
It's not really clear what your file looks like and what output you're expecting.
Suppose you have a file like this:
Code:
line 5 /data/path name5
line 1 /data/path name1
line 2 /data/path name2
line 4 /data/path name4
line 3 /data/path name3
line 1 /data/path name1
line 2 /data/path name2
line 6 /data/path name6
line 1 /data/path name1
Sorted:
Code:
line 1 /data/path name1
line 1 /data/path name1
line 1 /data/path name1
line 2 /data/path name2
line 2 /data/path name2
line 3 /data/path name3
line 4 /data/path name4
line 5 /data/path name5
line 6 /data/path name6
With the awk command you'll get this output:
Code:
line 1 /data/path name1
line 1 /data/path name1
line 2 /data/path name2
with the first uniq command:
Code:
line 1 /data/path name1
line 2 /data/path name2
line 3 /data/path name3
line 4 /data/path name4
line 5 /data/path name5
line 6 /data/path name6
and with the second uniq command:
Code:
line 1 /data/path name1
line 2 /data/path name2
Regards
- 10-10-2007 #7 Just Joined!
- Join Date
- Oct 2007
- Location
- Houston
- Posts
- 75
The file is a copy of the backup schedules I have running. It's an activity log, really. The file is laid out in 4 columns and has about 5,000 lines.
What I need is to take this file and, keeping the lines the way they are, write any line that is duplicated more than 2 times only 1 time to another file. Here is a sample of modified information showing what the file looks like:
Status file shows:
Code:
BKP1000 AID0002 Backup\Path Sunday 10:56:02 PM
BKP1000 AID0002 Backup\Path Saturday 10:56:02 PM
BKP1000 AID0002 Backup\Path Friday 10:56:02 PM
BKP1000 AID0002 Backup\Path Thursday 10:56:02 PM
BKP1000 AID0002 Backup\Path Sunday 10:56:02 PM
BKP1000 AID0002 Backup\Path Saturday 10:56:02 PM
BKP1000 AID0002 Backup\Path Friday 10:56:02 PM
BKP1000 AID0002 Backup\Path Thursday 10:56:02 PM
BKP3003 AID0005 Backup\Path Monday 10:56:02 PM
BKP3003 AID0005 Backup\Path Sunday 10:56:02 PM
BKP3003 AID0005 Backup\Path Monday 10:56:02 PM
BKP3003 AID0005 Backup\Path Monday 10:56:02 PM
ETC
Now some of these lines repeat themselves and some do not. The ones that do, I want written into another file. I will call the file that the duplicate lines are written to test.
Test should look like:
Code:
BKP1000 AID0002 Backup\Path Sunday 10:56:02 PM
BKP1000 AID0002 Backup\Path Saturday 10:56:02 PM
BKP1000 AID0002 Backup\Path Friday 10:56:02 PM
BKP1000 AID0002 Backup\Path Thursday 10:56:02 PM
BKP3003 AID0005 Backup\Path Sunday 10:56:02 PM
BKP3003 AID0005 Backup\Path Monday 10:56:02 PM
The status file (first file) has over 5,000 lines of data. Some of them repeat more than 2 times (sometimes 3) and others won't repeat. I need all of the ones in the file that repeat. Also, the Backup\Path may change while the BKP#### and AID#### stay the same.
Franklin, I can't thank you enough for looking into this. I did not think it would be as hard as this. I have been messing with this file for a while now (stripping useless information and removing characters the shell doesn't like).
Last edited by Korelis; 10-10-2007 at 05:49 PM. Reason: Cleaning up grammar.
- 10-10-2007 #8 Just Joined!
- Join Date
- Oct 2007
- Location
- Houston
- Posts
- 75
I think I figured out why it's not working. The lines are not in fact identical because of the time stamp. I think there is a way with awk to remove the characters after the day. I am having trouble finding that command, though.
- 10-10-2007 #9 Linux Enthusiast
- Join Date
- Aug 2006
- Posts
- 631
If the repeated lines are in the file test, why do you have the following line in it?
Code:
BKP3003 AID0005 Backup\Path Sunday 10:56:02 PM
It is not repeated in the status file.
Must the numbers be ignored in the first 2 columns?
You can print the first 4 columns without the time with awk as follows:
Code:
awk '{print $1, $2, $3, $4}' file
Regards
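Printing only the first 4 columns can be combined with the earlier `sort | uniq -d` step, so that rows differing only in their time are treated as repeats. This is a sketch under assumptions: the sample rows and times are invented, and the columns are assumed to be separated by whitespace with no spaces inside the path (for strictly tab-separated data, `awk -F'\t'` would be the safer choice).

```shell
workdir=$(mktemp -d)
# Hypothetical sample in the same shape as the status file.
printf '%s\n' \
  'BKP1000 AID0002 Backup\Path Sunday 10:56:02 PM' \
  'BKP1000 AID0002 Backup\Path Sunday 10:57:41 PM' \
  'BKP3003 AID0005 Backup\Path Monday 10:56:02 PM' \
  'BKP3003 AID0005 Backup\Path Sunday 10:56:02 PM' \
  'BKP3003 AID0005 Backup\Path Monday 11:02:10 PM' \
  > "$workdir/status"

# Keep only the first four columns (dropping the time),
# then report each repeated row once.
awk '{print $1, $2, $3, $4}' "$workdir/status" | sort | uniq -d > "$workdir/test"

cat "$workdir/test"
```

The single BKP3003 Sunday row is left out because it occurs only once after the time has been dropped.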
- 10-10-2007 #10 Just Joined!
- Join Date
- Oct 2007
- Location
- Houston
- Posts
- 75
Hello Franklin,
I think what happened is that the script worked like it was supposed to, but since the time field contains seconds and the seconds differ on some of the lines, it refused to print them.
You are correct about the line. My copy and paste didn't work the way I wanted it to. It should be just like you said: if a line repeats 2 times in the status file, then write 1 line of it to the test file. Each BKP and AID entry on a line is going to contain different paths. Since the path changes, it may yield a different day and time at which those lines will be seen.
I hope I haven't confused you too much.
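For what is described here (ignore the time, write 1 line for anything that repeats, keep the lines the way they are), a single awk pass may be enough. The following is only a sketch: the sample rows are invented, the first 4 whitespace-separated columns are assumed to identify a line, and the array names `first` and `count` are arbitrary.

```shell
workdir=$(mktemp -d)
# Hypothetical status file; seconds differ between otherwise equal rows.
printf '%s\n' \
  'BKP1000 AID0002 Backup\Path Sunday 10:56:02 PM' \
  'BKP3003 AID0005 Backup\Path Monday 10:56:02 PM' \
  'BKP1000 AID0002 Backup\Path Sunday 10:56:09 PM' \
  'BKP3003 AID0005 Backup\Path Sunday 10:56:02 PM' \
  'BKP3003 AID0005 Backup\Path Monday 10:58:30 PM' \
  > "$workdir/status"

awk '{
    key = $1 FS $2 FS $3 FS $4               # compare without the time columns
    if (!(key in first)) first[key] = $0     # remember the first full line
    if (++count[key] == 2) print first[key]  # write it once, on the first repeat
}' "$workdir/status" > "$workdir/test"

cat "$workdir/test"
```

Unlike the sort-based pipelines, this leaves the input order alone and writes the original line, time included, as it first appeared.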


