- 06-16-2010 #1 Just Joined!
- Join Date
- Jun 2010
- Posts
- 5
Field Day
I need to combine several files. The files all have different fields of information, but they share one common field. How can I create a script that will combine them? I've tried awk, grep, and sed and can't seem to figure it out.
File 1: Field1 Field2
File 2: Field1 Field3 Field4
File 3: Field1 Field5
Final File: Field1 Field2 Field3 Field4 Field5 separated by Tabs.
Files 1 and 2 have the same layout: their lines are in the same order and the common field is in the first column. File 3 is laid out differently. Its lines are not in the same order as in Files 1 and 2, but its first field holds the same values, and it also contains information that is in neither of the other files.
For example:
File 1
abcd 1234
efgh 5678
ijkl 7890
File 2
abcd zyxw lkjh
efgh vuts poiu
ijkl rqpo mnbv
File 3
Title
ijkl zxcv
efgh qwer
abcd asdf
Final File
abcd 1234 zyxw lkjh asdf
efgh 5678 vuts poiu qwer
ijkl 7890 rqpo mnbv zxcv
NOTE: This is NOT homework related!
Can anyone help me with this?
- 06-16-2010 #2
Perl is your friend! It has a command, <cat>, that I find very useful in situations like combining HTML pages in sequence into one document. /www(dot)devdaily(dot)com/perl/edu/articles/pl010010/ could prove useful to you as well.
(I cannot post the url, per se, so please bear with me until I am able to do so)
Best wishes!
- 06-17-2010 #3
I agree that Perl would be better suited, but it's possible to do in Bash as well.
The way to go with this is building arrays.
The code below is untested, but structurally, you first build an array of the one field that is common to all files. That assumes it is a unique string.
Then you grep through each file for that string, which gets rid of the problem that file3 has a different order. After grepping, awk out only the fields you need.
As you can see, the output is stored in a new array. When the output is produced, it is filtered through sed to change spaces into tabs.
Aw, now I almost finished it for you. That's not really helping, is it?
Code:
    #!/bin/bash

    # Variables:
    OUTPUTFILE="file4"
    # The common field: column 1 of every line in file1 (assumed unique)
    ARRAY=( $(awk '{print $1}' file1) )

    # Start with an empty output file, so reruns do not append to old results
    : > "$OUTPUTFILE"

    # Build fields:
    for i in "${ARRAY[@]}" ; do
        # Get fields 1 and 2 from file1
        # (-w matches whole words only, -F treats the key as a plain string)
        OUTPUT[0]=$(grep -wF -- "$i" file1 | awk '{print $1,$2}')
        # Get fields 2 and 3 from file2
        OUTPUT[1]=$(grep -wF -- "$i" file2 | awk '{print $2,$3}')
        # Get field 2 from file3
        OUTPUT[2]=$(grep -wF -- "$i" file3 | awk '{print $2}')
        # Produce output, changing spaces into tabs (\t here needs GNU sed):
        echo "${OUTPUT[*]}" | sed 's/ /\t/g' >> "$OUTPUTFILE"
    done
I leave the testing and tuning to you.
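For comparison, the same "build an array keyed on the common field" idea can be sketched as a single awk pass instead of a grep loop. This is only a sketch under a few assumptions: the files are named file1, file2 and file3 as in the script above, the title is the first line of file3, and the names nfile, keys and row are made up for the example. The sample data is copied from the first post.

```shell
#!/bin/sh
# Sample data from the first post (skip these lines if the files already exist).
printf 'abcd 1234\nefgh 5678\nijkl 7890\n' > file1
printf 'abcd zyxw lkjh\nefgh vuts poiu\nijkl rqpo mnbv\n' > file2
printf 'Title\nijkl zxcv\nefgh qwer\nabcd asdf\n' > file3

awk '
    FNR == 1 { nfile++ }                # a new input file starts here
    nfile == 3 && FNR == 1 { next }     # assumption: line 1 of file3 is the title
    # file1: remember the key order and start each output row
    nfile == 1 { keys[++n] = $1; row[$1] = $1 "\t" $2; next }
    # file2: append fields 2 and 3 to the row with the same key
    nfile == 2 { row[$1] = row[$1] "\t" $2 "\t" $3; next }
    # file3: append field 2, whatever order the lines come in
    nfile == 3 { row[$1] = row[$1] "\t" $2 }
    # print the rows in the order the keys appeared in file1
    END { for (i = 1; i <= n; i++) print row[keys[i]] }
' file1 file2 file3 > file4
```

Each file is read only once, and because the rows are looked up by key, the different line order in file3 does not matter. Keys that appear in file2 or file3 but not in file1 are silently left out of file4.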
Last edited by Freston; 06-17-2010 at 09:36 AM.
Can't tell an OS by its GUI
- 06-17-2010 #4 Linux Engineer
- Join Date
- Apr 2006
- Location
- Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
- Posts
- 1,117
Hi.
I would use sort and join, possibly aided by sed to get rid of the title and other such "non-relevant" data ...
cheers, drl
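A minimal sketch of the sort-and-join idea, for anyone who wants to try it. It assumes the files are named file1, file2 and file3 as in the script in post #3, that the title is the first line of file3, and that the common field is column 1 everywhere. The sample data is copied from the first post; file4 and the .sorted temporary files are names chosen just for this example.

```shell
#!/bin/sh
# Sample data from the first post (skip these lines if the files already exist).
printf 'abcd 1234\nefgh 5678\nijkl 7890\n' > file1
printf 'abcd zyxw lkjh\nefgh vuts poiu\nijkl rqpo mnbv\n' > file2
printf 'Title\nijkl zxcv\nefgh qwer\nabcd asdf\n' > file3

# Use one collation order for both sort and join, so they agree on ordering.
LC_ALL=C
export LC_ALL

# join needs its inputs sorted on the join field (field 1 here).
sort -k1,1 file1 > file1.sorted
sort -k1,1 file2 > file2.sorted
# Drop the title line from file3 before sorting (assumption: it is line 1).
sed '1d' file3 | sort -k1,1 > file3.sorted

# Join file1 with file2, join that result (read from stdin via "-") with file3,
# then turn the single spaces that join prints into tabs.
join file1.sorted file2.sorted | join - file3.sorted | tr ' ' '\t' > file4

# Clean up the temporary files.
rm -f file1.sorted file2.sorted file3.sorted
```

By default join only prints keys that are present in both of its inputs, so a key missing from any one file drops out of file4.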