Find the answer to your Linux question:
Results 1 to 3 of 3
Hello gurus, If we have a csv input here Code: Apple_ID, FDA_ID,Date 567,3, “10/12/2008 12:00:00 AM” 567,1, “10/12/2007 12:00:00 AM” 348,1, “10/12/2008 12:00:00 AM” 567,2, “10/12/2009 12:00:00 AM” And we ...
Enjoy an ad free experience by logging in. Not a member yet? Register.
  1. #1
    Just Joined!
    Join Date
    Aug 2012
    Posts
    24

    Awk scripts to read rows in a column with the same/differen value


    Hello gurus,

    If we have a csv input here

    Code:
    Apple_ID, FDA_ID,Date
    567,3, “10/12/2008  12:00:00 AM”
    567,1, “10/12/2007  12:00:00 AM”
    348,1, “10/12/2008  12:00:00 AM”
    567,2, “10/12/2009  12:00:00 AM”
    And we like to read the first column, if the Apple_ID is the same (567 for example) AND FDA_ID (1,2,3 for example) is different, print the row with the most recent Date (567,2, “10/12/2009 12:00:00 AM” in this case).

    In this case, the desire output would be:

    Code:
    Apple_ID, FDA_ID,Date
    348,1, “10/12/2008  12:00:00 AM”
    567,2, “10/12/2009  12:00:00 AM”
    We try to write some arrays using the examples in other posts but doesn’t seem to work.

    http://unix.stackexchange.com/questi...r-column-value

    Any hint could be quite helpful. Thanks!

  2. #2
    Just Joined! mrbruno's Avatar
    Join Date
    Jan 2013
    Location
    /MilkyWay/Sol/Earth/USA/NC/Raleigh
    Posts
    66
    Quote Originally Posted by psu4 View Post
    ... if the Apple_ID is the same (567 for example) AND FDA_ID (1,2,3 for example) is different, print the row with the most recent Date ...
    Questions:
    1. What do you want to happen if the Apple_ID and FDA_ID are both the same??
    2. It might also help to understand what style date you're using because it's ambiguous. Is the first date October 12th or December 10th??

  3. #3
    Just Joined! mrbruno's Avatar
    Join Date
    Jan 2013
    Location
    /MilkyWay/Sol/Earth/USA/NC/Raleigh
    Posts
    66
    Quote Originally Posted by psu4 View Post
    Hello gurus,

    If we have a csv input here

    ...

    Any hint could be quite helpful. Thanks!
    How about:

    Code:
    awk -F',' '
    function normalizeDate(input) {
      ret = input
      sub("  *", " ", input)
      tokcount = split(input, toks, "[/ :]")
      if (tokcount == 7) {
        month = toks[1]      # ???
        day = toks[2]        # ???
        year = toks[3]
        hour = toks[4]
        min = toks[5]
        sec = toks[6]
        suffix = toks[7]
        if ((suffix == "AM") && (hour == 12)) hour = 0
        else if ((suffix == "PM") && (hour != 12)) hour += 12
        ret = sprintf("%04d/%02d/%02d %02d:%02d:%02d", year, month, day, hour, min, sec)
      }
      return ret
    }
    
    {
      if (NR == 1) print $0
      else {
        appleID = $1
        fdaID = $2
        timestamp = $3
        sub("^ ", "", timestamp)
        gsub("\"", "", timestamp)
        key = appleID               # ???
        if (data[key] == "") {
          data[key] = appleID "," fdaID
          timestamps[key] = timestamp
        }
        else if (normalizeDate(timestamp) > normalizeDate(timestamps[key])) {
          data[key] = appleID "," fdaID
          timestamps[key] = timestamp
        }
      }
    }
    
    END {
      for (key in data)
        printf("%s,\"%s\"\n", data[key], timestamps[key])
    }' "$@"
    Double-check the statements with question marks. You may have to adjust them to match what you want to happen.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •