  1. #1
    fingal (Linux Guru)
    Join Date
    Jul 2003
    Location
    Birmingham - UK
    Posts
    1,539

    Method for comparing files: similarities


    Hi all

    At work I occasionally need to do some text processing, and it would be very useful if I could identify a way of doing the following:

    Where there are two or more text files, each holding one item of information per line, I'd like a way of comparing the files and determining which lines they have in common. The lines which are the same should be output in a usable format.

    That's the best way I can find to phrase my question. The need arose at work to do this, and we kludged it using Excel. It was time consuming and ugly.

    We don't use Linux at work, but I keep making a case. I've tried using the comm command (written by a Mr. Stallman no less!) but it didn't work very well.

    Any pointers to a workable solution gratefully received. Google hasn't pointed me at much. There are plenty of utilities for finding out how files differ, but not much for finding out common text between files.

    I vaguely thought about using sed, but I have no experience or skill with it.
    I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso

  2. #2
    tpl
    Linux User
    Join Date
    Jan 2007
    Location
    cleveland
    Posts
    476
    let me see if I understand: you have 2 files
    A and B, that may share entire lines in common:

    A:
    12345
    abcde
    B:
    12345
    fghij

    you want to output the common line(s), here 12345.

    if that's right, here's one way:

    cat A B | sort | uniq -d

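    the -d option to uniq prints one copy of each line that occurs more than once in its (sorted) input, so any line found in both A and B is reported. note that a line repeated inside A alone is reported too, so this assumes neither file contains duplicates of its own.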
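
The original question mentions "2 or more" files, and uniq -d on three or more files would also report lines shared by only some of them. Here is a minimal sketch of one way to handle that, assuming plain text files with one item per line. The function name common_lines is made up for illustration; it is not a standard utility. Each file is de-duplicated first, then a line is kept only if its count equals the number of files.

```shell
# common_lines FILE...: print the lines that appear in every file given.
# (Hypothetical helper name, not a standard command.)
common_lines() {
    n=$#    # number of files we were given
    for f in "$@"; do
        # -u drops repeats inside one file, so each file
        # contributes a given line at most once
        LC_ALL=C sort -u "$f"
    done |
        LC_ALL=C sort |
        uniq -c |    # prefix each distinct line with its count
        # keep lines whose count equals the number of files,
        # then strip the count that uniq -c added
        awk -v n="$n" '$1 == n { sub(/^ *[0-9]+ /, ""); print }'
}
```

With the two sample files above, `common_lines A B` prints 12345. For exactly two files that are already sorted, `comm -12 A B` is another option; comm expects sorted input, which may be why it appeared not to work well.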

  3. #3
    fingal (Linux Guru)
    Join Date
    Jul 2003
    Location
    Birmingham - UK
    Posts
    1,539
    Hello ... your understanding is correct! Well done ... I'll try that this evening. Thank you very much indeed.

    If that works (and assuming I need to do this again from time to time) I can 'sell' the idea of having some kind of *nix solution to my boss.
