  1. #1
    Just Joined!
    Join Date
    Jun 2010
    Posts
    5

    Field Day

    I need to combine several files together. All the files have different fields of information, but they all share a common field. How can I create a script that will combine them? I've tried awk, grep, and sed but can't figure it out.

    File 1: Field1 Field2
    File 2: Field1 Field3 Field4
    File 3: Field1 Field5

    Final File: Field1 Field2 Field3 Field4 Field5 separated by Tabs.

    Files 1 and 2 line up with each other both vertically and horizontally; however, the layout of File 3 is different. Its lines are not in the same order as those in Files 1 and 2, but its fields are similar, and it also contains information that is in neither of the other files.

    For example:
    File 1

    abcd 1234
    efgh 5678
    ijkl 7890

    File 2

    abcd zyxw lkjh
    efgh vuts poiu
    ijkl rqpo mnbv

    File 3

    Title

    ijkl zxcv
    efgh qwer
    abcd asdf

    Final File

    abcd 1234 zyxw lkjh asdf
    efgh 5678 vuts poiu qwer
    ijkl 7890 rqpo mnbv zxcv


    NOTE: This is NOT homework related!

    Can anyone help me with this?

  2. #2
    Linux Newbie zenwalker
    Join Date
    Feb 2010
    Location
    Inland Pacific NW
    Posts
    175
    Perl is your friend! It can concatenate files the way the cat command does, which I find very useful in situations like combining HTML pages in sequence into one document. /www(dot)devdaily(dot)com/perl/edu/articles/pl010010/
    could prove useful to you as well.

    (I cannot post the url, per se, so please bear with me until I am able to do so)

    Best wishes!

  3. #3
    Linux Engineer Freston
    Join Date
    Mar 2007
    Location
    The Netherlands
    Posts
    1,047
    I agree that Perl would be better suited, but it's possible to do in Bash as well.

    The way to go with this is building arrays.

    The code below is untested, but structurally you first build an array from the one field that is common to all files. That assumes each value of that field is a unique string.
    Then you grep through each file for that value, which gets rid of the problem that file3 is in a different order. After grepping, awk prints only the fields you need.

    As you can see, the output is stored in a new array. When the output is produced, it is filtered through sed to change spaces into tabs.

    Code:
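    #!/bin/bash

    # Variables:
    OUTPUTFILE="file4"
    # Keys: field 1 of every line in file1 (assumed unique, with no regex characters)
    ARRAY=( $(awk '{print $1}' file1) )

    # Build fields:
    for i in "${ARRAY[@]}" ; do

        # Get fields 1 and 2 from file1; the key is anchored so "abcd" cannot match "abcde"
        OUTPUT[0]=$(grep "^${i}[[:space:]]" file1 | awk '{print $1,$2}')

        # Get fields 2 and 3 from file2
        OUTPUT[1]=$(grep "^${i}[[:space:]]" file2 | awk '{print $2,$3}')

        # Get field 2 from file3 (lines that do not start with a key are ignored)
        OUTPUT[2]=$(grep "^${i}[[:space:]]" file3 | awk '{print $2}')

        # Produce output, changing spaces into tabs (\t in the replacement needs GNU sed):
        echo "${OUTPUT[*]}" | sed 's/ /\t/g' >> "$OUTPUTFILE"
    done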
    Aw, now I almost finished it for you. That's not really helping, is it? I leave the testing and tuning to you.
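    For comparison, the same array idea can be done in a single pass with awk's associative arrays instead of one grep per key. This is only a sketch under assumptions: the file names and sample data come from post #1, the scratch directory is hypothetical, the key is assumed to be unique within each file, and any line with fewer than two fields (such as the "Title" line in file3) is treated as noise.

```shell
#!/bin/bash
set -eu

# Hypothetical scratch directory holding the sample data from post #1.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 'abcd 1234' 'efgh 5678' 'ijkl 7890' > file1
printf '%s\n' 'abcd zyxw lkjh' 'efgh vuts poiu' 'ijkl rqpo mnbv' > file2
printf '%s\n' 'Title' '' 'ijkl zxcv' 'efgh qwer' 'abcd asdf' > file3

awk '
    BEGIN { OFS = "\t" }
    NF < 2 { next }                        # skip blank lines and the "Title" line
    FILENAME == "file1" { keys[++n] = $1   # remember the key order of file1
                          f1[$1] = $2 }
    FILENAME == "file2" { f2[$1] = $2 OFS $3 }
    FILENAME == "file3" { f3[$1] = $2 }    # line order in file3 does not matter
    END {
        for (i = 1; i <= n; i++) {
            k = keys[i]
            print k, f1[k], f2[k], f3[k]
        }
    }
' file1 file2 file3 > file4

cat file4
```

    The output follows the line order of file1, and a key that is missing from file2 or file3 simply leaves those columns empty.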
    Last edited by Freston; 06-17-2010 at 09:36 AM.
    Can't tell an OS by its GUI

  4. #4
    Linux Engineer drl
    Join Date
    Apr 2006
    Location
    Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
    Posts
    1,117
    Hi.

    I would use sort and join, possibly aided by sed to get rid of the title and other such "non-relevant" data ... cheers, drl
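    A rough sketch of that sort-and-join approach, using the sample data from post #1. The scratch directory and the *.sorted and final file names are made up for illustration, and the sed expressions assume the only noise in file3 is a line reading "Title" plus blank lines.

```shell
#!/bin/bash
set -eu

# sort and join must agree on collation order, so pin the locale.
export LC_ALL=C

# Hypothetical scratch directory holding the sample data from post #1.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 'abcd 1234' 'efgh 5678' 'ijkl 7890' > file1
printf '%s\n' 'abcd zyxw lkjh' 'efgh vuts poiu' 'ijkl rqpo mnbv' > file2
printf '%s\n' 'Title' '' 'ijkl zxcv' 'efgh qwer' 'abcd asdf' > file3

# join expects each input to be sorted on the join field (field 1 by default).
sort -k1,1 file1 > file1.sorted
sort -k1,1 file2 > file2.sorted

# Drop the title and blank lines from file3 before sorting it.
sed -e '/^Title$/d' -e '/^[[:space:]]*$/d' file3 | sort -k1,1 > file3.sorted

# Join file1 with file2, join that result with file3, then turn spaces into tabs.
join file1.sorted file2.sorted | join - file3.sorted | tr ' ' '\t' > final

cat final
```

    By default join prints only keys that appear in every input, so a key missing from one of the files drops out of the result.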