  1. #1
    Just Joined!
    Join Date
    Sep 2008
    Location
    Copenhagen
    Posts
    5

    Grepping something out of a specific column in a file using pattern from another file

    Hello,
    I often use grep to extract lines from a file using patterns that are in another file, e.g.

    grep -wf patternfile.txt infile.txt

    This toy example illustrates my problem:

    Infile:
    A 100 3396 101 M
    A 200 3488 100 M
    B 100 3431 102 M
    A 100 3454 121 F
    C 200 3407 378 M
    A 200 3440 400 M


    Patternfile:
    100
    121
    378
    407


    Here, I want grep to only look for matches in column 4 of the infile. So, the output that I want is:

    A 200 3488 100 M
    A 100 3454 121 F
    C 200 3407 378 M


    However, since grep also finds some matches in column 2 of the infile, what I get is:
    A 100 3396 101 M
    A 200 3488 100 M
    B 100 3431 102 M
    A 100 3454 121 F
    C 200 3407 378 M


    In this case I can get around it by, e.g.
    awk '{print $3,$4,$5}' infile.txt | grep -f patternfile.txt > temp.txt
    grep -f temp.txt infile.txt


    However, I would much prefer to grep directly on a specific column using patterns from a file, since this workaround isn't always applicable.

    I'd appreciate any suggestions.

  2. #2
    Linux Newbie radoulov
    Join Date
    Sep 2007
    Posts
    111
    A few possibilities:

    1. Modify your pattern file, use the following format:

    Code:
    ([^ ]+ ){3}100
    ([^ ]+ ){3}121
    ([^ ]+ ){3}378
    ([^ ]+ ){3}407
    Then use:

    Code:
    grep -Ef patternfile infile
    2. Construct the pattern with a process substitution (if your shell supports it):

    Code:
    % grep -Ef <(printf "([^ ]+ ){3}%s\n" $(<patternfile)) infile
    A 200 3488 100 M
    A 100 3454 121 F
    C 200 3407 378 M
    3. Use a more powerful tool:

    Code:
    % awk 'NR==FNR{_[$1];next}$4 in _' patternfile infile 
    A 200 3488 100 M
    A 100 3454 121 F
    C 200 3407 378 M
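
    For anyone reading later, here is a minimal sketch of option 3 with the one-liner spelled out, next to an anchored variant of option 1. The names col, wanted, and the temporary file layout are assumptions made for illustration and are not part of the original posts. The anchoring (^ at the start, then a space or end of line after the value) is also an addition: as written, the patterns in option 1 would accept 1000 in column 4 as well, or 100 in a later column on wider lines.

    ```shell
    #!/bin/sh
    # Recreate the sample files from the first post in a scratch directory.
    tmpdir=$(mktemp -d)
    printf '%s\n' \
        'A 100 3396 101 M' \
        'A 200 3488 100 M' \
        'B 100 3431 102 M' \
        'A 100 3454 121 F' \
        'C 200 3407 378 M' \
        'A 200 3440 400 M' > "$tmpdir/infile"
    printf '%s\n' 100 121 378 407 > "$tmpdir/patternfile"

    # Option 3, expanded. col (hypothetical name) picks the column to compare.
    awk_out=$(awk -v col=4 '
        NR == FNR { wanted[$1]; next }  # first file: store each pattern as an array key
        $col in wanted                  # second file: print lines whose column is a key
    ' "$tmpdir/patternfile" "$tmpdir/infile")

    # Option 1, anchored. Wrap every pattern so it must be exactly field 4.
    sed 's/.*/^([^ ]+ ){3}&( |$)/' "$tmpdir/patternfile" > "$tmpdir/anchored"
    grep_out=$(grep -E -f "$tmpdir/anchored" "$tmpdir/infile")

    printf '%s\n' "$awk_out"
    printf '%s\n' "$grep_out"

    rm -rf "$tmpdir"
    ```

    On the sample data both commands should print the same three lines as above. One difference to keep in mind: the awk version compares the field as a literal string, while the grep version treats each line of the pattern file as a regular expression, so special characters in the patterns would need escaping there.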

  3. #3
    Just Joined!
    Join Date
    Sep 2008
    Location
    Copenhagen
    Posts
    5
    Thanks radoulov! Just what I needed. I had a hunch that awk could do the job - and I think that's the best of your solutions.
