  1. #1
    Just Joined! · Join Date: Jul 2010 · Posts: 2

    [SOLVED] AWK: Join lines if NF is wrong

    Hi,

    I have a large CSV file, tab-delimited with CRLF at the end of each line.

    Each line should contain 5 fields (i.e. NF == 5). However, there are rogue CRLF characters in the middle of some records, which split those records across two lines.

    I want to scan each line, check the field count and, if it's != 5, join that line to the following line.

    Example input might be:

    Code:
    one two three four five
    six seven eight nine ten
    eleven tw<CRLF>
    elve thirteen fourteen fifteen
    sixteen seventeen eighteen nineteen twenty
    In the example, I want to merge lines 3 and 4 to read:

    Code:
    eleven twelve thirteen fourteen fifteen
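    To see which lines are broken, the field-count check described above can be run on its own first. This is a minimal sketch, assuming a POSIX shell and awk; `sample.tsv` is a hypothetical file name. It rebuilds the example with real tabs, CRLF line endings and one rogue CRLF, then reports every line whose field count is not 5. The CR before each LF stays attached to the last field, so it does not change NF.

```shell
# Rebuild the example: tab-separated, CRLF line endings, and a rogue CRLF
# inside the third record ("tw" / "elve"). sample.tsv is a hypothetical name.
printf 'one\ttwo\tthree\tfour\tfive\r\n'                      >  sample.tsv
printf 'six\tseven\teight\tnine\tten\r\n'                     >> sample.tsv
printf 'eleven\ttw\r\nelve\tthirteen\tfourteen\tfifteen\r\n'  >> sample.tsv
printf 'sixteen\tseventeen\teighteen\tnineteen\ttwenty\r\n'   >> sample.tsv

# Report every physical line that does not have exactly 5 tab-separated fields.
awk -F '\t' 'NF != 5 { printf "line %d: %d fields\n", FNR, NF }' sample.tsv
```

    On this sample it should report line 3 (2 fields) and line 4 (4 fields), the two halves of the broken record.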
    My current attempt is:

    Code:
    awk 'BEGIN { FS = "\t" } ; { if (NF != 5) {saved=$0;next} {print saved,$0} }'
    This doesn't work: I'm getting duplicate records inserted. Any help is appreciated, as are suggestions for an easier way to do this using awk, sed, or perl.

  2. #2
    Linux Newbie · Join Date: Apr 2007 · Posts: 119
    I would imagine that you need to also remove the truncated line after you save it.
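    A minimal sketch of the `next`-based attempt with that idea applied, assuming each record is broken at most once (as in the example); file names are hypothetical. In the original attempt `saved` is never reset, and every 5-field line is printed as `saved,$0`, so the last fragment seen is glued onto every later record. Here the held fragment is joined to the very next line and then cleared. The halves are concatenated without a comma, because `print saved,$0` inserts OFS (a space by default) and would give "tw elve", and the rogue CR left at the end of the fragment is stripped.

```shell
# Hypothetical sample: one record split by a rogue CRLF ("tw" / "elve").
printf 'one\ttwo\tthree\tfour\tfive\r\n'                      >  sample.tsv
printf 'eleven\ttw\r\nelve\tthirteen\tfourteen\tfifteen\r\n'  >> sample.tsv
printf 'sixteen\tseventeen\teighteen\tnineteen\ttwenty\r\n'   >> sample.tsv

awk 'BEGIN { FS = "\t" }
saved != "" {                  # this line completes the held fragment
    sub(/\r$/, "", saved)      # drop the rogue CR at the break
    $0 = saved $0              # plain concatenation: no OFS between the halves
    saved = ""                 # clear the fragment so it is never reused
    print
    next
}
NF != 5 { saved = $0; next }   # first piece of a split record: hold it
{ print }                      # well-formed record: pass it through
END { if (saved != "") print saved }   # flush a fragment left at EOF
' sample.tsv > fixed.tsv
```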

  3. #3
    Just Joined! · Join Date: Jul 2010 · Posts: 2
    Thanks for the reply. Here's my solution; I ended up using getline instead of 'next':

    Code:
    awk 'BEGIN { FS = "\t" } ; { if (NF != 5) { saved = $0; if ((getline) > 0) { sub(/\r$/, "", saved); print saved $0 } else print saved } else print $0 }'
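    The getline one-liner joins a short line with exactly one following line. If a record can be broken more than once, a sketch that keeps appending lines until the pending record has at least 5 tab-separated fields may be more robust. It assumes a POSIX awk; file names are hypothetical. It strips the CR before every LF and writes CRLF back out through ORS.

```shell
# Hypothetical sample: the third record is broken twice ("tw" / "el" / "ve").
printf 'one\ttwo\tthree\tfour\tfive\r\n'                          >  sample.tsv
printf 'eleven\ttw\r\nel\r\nve\tthirteen\tfourteen\tfifteen\r\n'  >> sample.tsv
printf 'sixteen\tseventeen\teighteen\tnineteen\ttwenty\r\n'       >> sample.tsv

awk 'BEGIN { FS = "\t"; ORS = "\r\n" }   # write CRLF like the input
{
    sub(/\r$/, "")                       # strip the CR that precedes every LF
    buf = buf $0                         # append this piece to the pending record
    if (split(buf, f, "\t") >= 5) {      # enough fields: the record is complete
        print buf
        buf = ""
    }
}
END { if (buf != "") print buf }         # flush an incomplete fragment at EOF
' sample.tsv > fixed.tsv
```

    Because the decision rests on the field count alone, neither this nor the getline approach can detect a break inside the fifth field: the first piece already has 5 fields, and the leftover piece would be glued onto the start of the next record.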
