  1. #1
    Just Joined!
    Join Date
    Jan 2007
    Posts
    9

    bash script to wrap csv field in quotes

    I have been stuck on this for hours. I feel like it's not that hard, but I'm frustrated. Please bear with me if this is a stupid question; I've googled and googled and haven't found help I can understand.

    I have a CSV file whose lines are 3 fields separated by commas:
    023.1,false,Two Words or many more

    If the 3rd field contains a comma, the field needs to be wrapped in quotes so
    023.1,false,Stuff, more stuff
    should be
    023.1,false,"Stuff, more stuff"

    I've been focusing on sed and kind of hate it at this point. I think this is the closest I've gotten:
    sed 's/^*,^*,^[^,]/"&"/'
    but it doesn't work, and I don't know if I'm even going down the right path.


    Anyone have suggestions for me?

    Thanks!

  2. #2
    Trusted Penguin Cabhan's Avatar
    Join Date
    Jan 2005
    Location
    Seattle, WA, USA
    Posts
    3,230
    So your regular expressions are a little messed up.

    Outside of a character class (the [...] bits), the '^' character refers to the beginning of a line. As an anchor, it therefore normally appears only once in a given regex, at the very start (for very advanced regexes, this is not strictly true, but I have never in 5 years of programming Perl had to use such a regex, so I doubt you will either). Inside a character class, a leading '^' means "not", so [^,] matches any single character except a comma. Your regex uses '^' as an anchor three times, so it is simply malformed.
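    A small sketch of the two meanings of '^' in one expression. The function name first_field_to_x is made up for illustration; it replaces everything before the first comma with X:

```shell
# The first '^' (outside the brackets) anchors the match to the start of the line.
# The second '^' (first character inside the brackets) negates the class,
# so [^,]* means "zero or more characters that are not commas".
first_field_to_x() {
    sed 's/^[^,]*/X/'
}

echo "a,b,c" | first_field_to_x   # prints: X,b,c
```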

    Let's try this instead:
    Code:
    echo "0.231,false,some words, some more" | sed -re 's/^([^,]+),([^,]+),(.+)$/"\1","\2","\3"/'
    The regex here is kind of complicated. First of all, note the "-r" flag to sed. This enables extended regular expressions, which let you write things like + and grouping parentheses without backslashes.
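    One portability note: "-r" is the GNU sed spelling. GNU sed and the BSD/macOS sed both accept "-E" for the same thing, so if "-r" is rejected on your system, a version like the following may work instead. The function name quote_all_fields is just a label for this sketch:

```shell
# Same substitution as above, but using -E instead of -r
# to request extended regular expressions.
quote_all_fields() {
    sed -E 's/^([^,]+),([^,]+),(.+)$/"\1","\2","\3"/'
}

echo "0.231,false,some words, some more" | quote_all_fields
# prints: "0.231","false","some words, some more"
```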

    The regular expression that I used is:
    Code:
    s/^([^,]+),([^,]+),(.+)$/"\1","\2","\3"/
    This means to substitute any match of the left side with the right side. So what am I matching on the left side?

    ^ = the beginning of the line
    ([^,]+) = a sequence of one or more non-commas. REMEMBER THIS SEQUENCE (the parentheses do this; the remembered text is called a match, or a capture group).
    , = a literal comma, the field separator
    (.+) = a sequence of one or more of any character (including commas), also remembered
    $ = the end of the line

    On the right side, I have a fairly simple expression:

    \1, \2, \3 = The first, second, or third match from the left side, respectively
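    To see the remembered matches on their own, here is a toy sketch that swaps the first two fields. It assumes a sed that accepts -E, and swap_first_two is a hypothetical name:

```shell
# \1 and \2 refer to the first and second parenthesized groups,
# so writing them in reverse order swaps the two fields.
swap_first_two() {
    sed -E 's/^([^,]+),([^,]+)/\2,\1/'
}

echo "a,b,c" | swap_first_two   # prints: b,a,c
```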

    This should make sense now. The left side looks for three fields separated by commas, where the first two cannot have any commas. This means that the first two commas MUST be field separators, while any further commas are considered a part of the third field.

    These three fields are remembered. On the right side, I replace this entire line with three quoted segments separated by commas.
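    Note that the command above quotes all three fields on every line. If you only want quotes around the third field, and only when it contains a comma (as in the original question), a variation along these lines might do it. This is a sketch, not a full CSV writer: it assumes a sed that accepts -E, assumes the first two fields never contain commas, and does nothing about double quotes already inside the third field. The name quote_third_if_comma is made up:

```shell
# (.*,.*) only matches a third field that contains at least one comma,
# so lines whose third field has no comma do not match and pass through unchanged.
quote_third_if_comma() {
    sed -E 's/^([^,]+),([^,]+),(.*,.*)$/\1,\2,"\3"/'
}

echo "023.1,false,Stuff, more stuff" | quote_third_if_comma
# prints: 023.1,false,"Stuff, more stuff"

echo "023.1,false,Two Words or many more" | quote_third_if_comma
# prints: 023.1,false,Two Words or many more
```

    To run it over a whole file, redirect the file into the function, e.g. quote_third_if_comma < input.csv > output.csv (both file names are placeholders).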

    Does this make sense? Run it with a few lines and see if it works for you, and please ask if you don't understand some part of it.
    DISTRO=Arch
    Registered Linux User #388732
