- 01-14-2010 #1
bash script to wrap csv field in quotes
I have been stuck on this for hours. I feel like it's not that hard, but I'm frustrated. Please bear with me if this is a stupid question; I've googled and googled and haven't found help I can understand.
I have a CSV file whose lines are 3 fields separated by commas:
023.1,false,Two Words or many more
If the 3rd field contains a comma, the field needs to be wrapped in quotes so
023.1,false,Stuff, more stuff
should be
023.1,false,"Stuff, more stuff"
I've been focusing on sed and kind of hate it at this point. I think this is the closest I've gotten:
sed 's/^*,^*,^[^,]/"&"/'
but it doesn't work, and I don't know if I'm even going down the right path.
Anyone have suggestions for me?
Thanks!
- 01-15-2010 #2
So your regular expressions are a little messed up.
Outside of a character class (the [...] bits), the '^' character refers to the beginning of a line. Therefore, you will normally use this character only once in a given regex, at the very start (for very advanced regexes, this is not strictly true, but I have never in 5 years of programming Perl had to use such a regex, so I doubt you will either). So your regex is simply malformed.
Let's try this instead:
Code:
echo "0.231,false,some words, some more" | sed -re 's/^([^,]+),([^,]+),(.+)$/"\1","\2","\3"/'
The regex here is kind of complicated. First of all, note the "-r" flag to sed. This enables extended regular expressions, which allow a number of extra features.
The regular expression that I used is:
Code:
s/^([^,]+),([^,]+),(.+)$/"\1","\2","\3"/
This means to substitute any match of the left side with the right side. So what am I matching on the left side?
^ = the beginning of the line
([^,]+) = a sequence of one or more non-commas. REMEMBER THIS SEQUENCE (the remembered piece is called a match, or capture group, and is indicated by parentheses).
, = a literal comma, i.e. a field separator
(.+) = a sequence of one or more of any character (including commas)
$ = the end of the line
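To see the two roles of '^' from the pattern above in isolation, here is a quick sketch you can paste into a shell. The sample input a,b,a is made up for illustration.

```shell
# Outside brackets, ^ anchors the match to the start of the line,
# so only the leading "a" is replaced.
printf '%s\n' 'a,b,a' | sed 's/^a/X/'      # prints: X,b,a

# Inside brackets, a leading ^ negates the class:
# [^,] means "any single character except a comma".
printf '%s\n' 'a,b,a' | sed 's/[^,]/X/g'   # prints: X,X,X
```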
On the right side, I have a fairly simple expression:
\1, \2, \3 = The first, second, and third remembered matches from the left side, respectively
This should make sense now. The left side looks for three fields separated by commas, where the first two cannot have any commas. This means that the first two commas MUST be field separators, while any further commas are considered a part of the third field.
These three fields are remembered. On the right side, I replace this entire line with three quoted segments separated by commas.
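Note that the command above quotes all three fields. If only the third field should be quoted, and only when it actually contains a comma (which is what the first post asked for), a small variation may do it. This is a minimal sketch, not tried on the poster's real file. The function name quote_third is made up for illustration, and -E is used instead of -r as the extended-regex flag, since both GNU and BSD sed accept it.

```shell
# quote_third: hypothetical helper name, not from the thread.
# Reads CSV lines on stdin and wraps the third field in quotes
# only when that field contains a comma.
quote_third() {
    # ([^,]*),([^,]*), = the first two fields, which cannot contain commas
    # (.*,.*)          = a third field with at least one comma in it
    # Lines whose third field has no comma do not match and pass through unchanged.
    sed -E 's/^([^,]*),([^,]*),(.*,.*)$/\1,\2,"\3"/'
}

printf '%s\n' \
    '023.1,false,Two Words or many more' \
    '023.1,false,Stuff, more stuff' | quote_third
# prints:
# 023.1,false,Two Words or many more
# 023.1,false,"Stuff, more stuff"
```

To run it over a whole file, something like quote_third < input.csv > output.csv should work (both file names are placeholders). The sketch assumes the third field is not already quoted and contains no double quotes of its own; those cases would need extra handling.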
Does this make sense? Run it with a few lines and see if it works for you, and please ask if you don't understand some part of it.