  1. #1
    Just Joined!
    Join Date
    Jul 2007
    Posts
    3

    problem with scripting

    I have two files : a1.txt and a2.txt.
    a2.txt has 5 columns of numbers.
    I want to take each entry in a particular column of a2.txt and search for it in a particular column of a1.txt. If it is present there, I want to print the entire row in which it is found.
    I have been trying this for a while but have not gotten anywhere.
    I know that the "awk" command can be used to extract columns out of a file.
    But I don't know how to use two files with awk.
    Any help will be highly appreciated.

    Thank You.

    Aj

  2. #2
    Trusted Penguin Cabhan
    Join Date
    Jan 2005
    Location
    Seattle, WA, USA
    Posts
    3,230
    I'm not sure I follow.

    You want to take, for instance, the second column of a2.txt, and for each number, if it exists in a1.txt, print out that entire row of a2.txt?

    You can use cut and grep to achieve this:
    Code:
    #!/bin/bash
    
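    # Open a2.txt for reading on file descriptor 3.
    exec 3< a2.txt
    
    # IFS= and -r keep each line exactly as written (leading tabs and backslashes preserved).
    while IFS= read -r line <&3; do
        # Second tab-separated column of the current row.
        num=$(echo "$line" | cut -f2)
        # Print the row if the number appears as a whole word anywhere in a1.txt.
        if grep -q "\\<$num\\>" a1.txt; then
            echo "$line"
        fi
    done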
    We use cut to get the second column in each line of a2.txt (if the columns are separated by something other than a tab, then check cut's man page for how to set the delimiter). We then use grep to check if that number exists in a1.txt (the \<...\> means that we want to match a "word", so if we're looking for 2, 82 won't match). If grep finds it, then we print the line.
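    If you want to see the delimiter and the word-matching behaviour for yourself, here is a small sketch. The helper names second_column and has_word are made up for this example, and the sample line is assumed to be separated by single spaces rather than tabs, which is why cut needs -d here.

```shell
#!/bin/bash

# Print column 2 of a line whose columns are separated by single spaces.
# cut splits on tabs by default, so the delimiter is set with -d.
second_column() {
    printf '%s\n' "$1" | cut -d' ' -f2
}

# Succeed only if $1 occurs as a whole word in the text $2.
has_word() {
    printf '%s\n' "$2" | grep -q "\\<$1\\>"
}

second_column "10 2 30 40 50"               # prints 2
has_word 2 "81 82 83" || echo "no match"    # 82 does not count as the word 2
has_word 2 "1 2 3" && echo "match"          # here 2 stands alone, so it matches
```

    Note that cut treats every single delimiter character as a column break, so columns padded with several spaces would need to be squeezed first (for example with tr -s ' ').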

    Does this all make sense?
    DISTRO=Arch
    Registered Linux User #388732

  3. #3
    Just Joined!
    Join Date
    Jul 2007
    Posts
    3

    Thanks for the reply Cabhan.
    You understood it almost correctly, except that I want to "print out that entire row of a1.txt" instead of a2.txt.

    I want to take, for instance, the second column of a2.txt, and for each number, if it exists in a1.txt, print out that entire row of **a1.txt**

    I didn't understand "exec 3".
    What is the 3 in that, and what is exec?

    Thanks!
    Aj

  4. #4
    Trusted Penguin Cabhan
    Join Date
    Jan 2005
    Location
    Seattle, WA, USA
    Posts
    3,230
    Aha. Gotcha.

    Well first off, the exec 3 thing. This is something that few people really know about. This allows me to open a file and read it easily with Bash. Are you familiar with I/O redirection? That's the "echo hello > tempfile" thing, where we change where output goes or input comes from. Well, we start off with 3 I/O streams by default:
    0 - standard input (stdin)
    1 - standard output (stdout)
    2 - standard error (stderr)
    This is why you need to say "2> error_log" if you want to redirect errors and warnings.
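    To see streams 1 and 2 going to different places, something like this should do. It is only a sketch: the function name report and the log file names are made up for the example.

```shell
#!/bin/bash

# Hypothetical helper: writes one line to stdout (1) and one to stderr (2).
report() {
    echo "normal output"
    echo "something went wrong" >&2
}

# Scratch directory so the example does not touch real files.
tmpdir=$(mktemp -d)

# ">" redirects descriptor 1, "2>" redirects descriptor 2.
report > "$tmpdir/out_log" 2> "$tmpdir/error_log"

cat "$tmpdir/out_log"      # prints: normal output
cat "$tmpdir/error_log"    # prints: something went wrong
```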

    In this case, I opened a new input stream (note the '<'), and assigned it the number 3. I can now read through it easily in the loop. The 'exec' command, well, its man page explains it:
    Code:
    The  exec  utility  shall  open, close, and/or copy file descriptors as
    specified by any redirections as part of the command.
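    Here is the open/read/close cycle on its own, as a minimal sketch. The sample file and the variable names one and two exist only for this demonstration.

```shell
#!/bin/bash

# Sample file created only for this demonstration.
sample=$(mktemp)
printf 'first\nsecond\nthird\n' > "$sample"

exec 3< "$sample"      # open the file for reading as descriptor 3
read -r one <&3        # each read on descriptor 3 continues where the last one stopped
read -r two <&3
exec 3<&-              # close descriptor 3 again

echo "$one $two"       # prints: first second
rm -f "$sample"
```

    Because the descriptor stays open between reads, the second read picks up the second line instead of starting over, which is what makes the while loop walk through the file.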
    Now, going back to your particular problem:

    The small change below means that rather than running grep on the entire file, we run it on a1.txt line by line. This way, we know exactly which line the match was on, and we can print that line.
    Code:
    #!/bin/bash
    
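    # Open a2.txt for reading on file descriptor 3.
    exec 3< a2.txt
    
    while IFS= read -r line <&3; do
        # Second tab-separated column of the current a2.txt row.
        num=$(echo "$line" | cut -f2)
        
        # Re-open a1.txt on descriptor 4 so each search starts from its first line.
        exec 4< a1.txt
        while IFS= read -r a1line <&4; do
            # -w matches $num only as a whole word; -q suppresses grep's own output.
            if echo "$a1line" | grep -wq -- "$num"; then
                echo "$a1line"
                break # stop at the first matching row of a1.txt
            fi
        done
        exec 4<&- # this closes file descriptor 4, so we can be assured that it is reusable
    done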
    Does this make sense?


    As a side note, I realized that in your original question, you asked about awk. awk can be used for doing operations on columns, and indeed, it's very good at it. However, awk is a full programming language in its own right, so for something as simple as this, cut is the easier tool to start with.
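    Since the original question asked how to use two files with awk, here is one common way to do it, as a sketch rather than a drop-in answer. The function name match_rows, the sample data, and the choice of columns (column 2 of a2.txt compared against column 1 of a1.txt, whitespace-separated) are all assumptions made for illustration.

```shell
#!/bin/bash

# match_rows KEYFILE KEYCOL DATAFILE DATACOL
# Prints each row of DATAFILE whose DATACOL value appears in KEYCOL of KEYFILE.
match_rows() {
    awk -v k="$2" -v d="$4" '
        NR == FNR { seen[$k] = 1; next }   # first file: remember its key column
        $d in seen                          # second file: print rows whose column matches
    ' "$1" "$3"
}

# Made-up sample data for illustration.
dir=$(mktemp -d)
printf '1 20 3\n4 50 6\n' > "$dir/a2.txt"
printf '50 apple\n70 pear\n20 plum\n' > "$dir/a1.txt"

match_rows "$dir/a2.txt" 2 "$dir/a1.txt" 1
# prints:
# 50 apple
# 20 plum
```

    The NR == FNR test is true only while awk is reading the first file, so the keys are collected before the second file is scanned. Unlike the nested loop, this compares one specific column of a1.txt, prints every matching row in a1.txt order, and reads each file only once. It assumes the first file is not empty.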

  5. #5
    Just Joined!
    Join Date
    Jul 2007
    Posts
    3

    Thanks a Lot!
    I tried it out and it is working!
