Find the answer to your Linux question:
Results 1 to 6 of 6
Hi all I have a file of about 1500 lines odd long, which is an export from a database online. It is essentially a CSV, but instead of comma's is ...
  1. #1
    Linux Guru sdousley's Avatar
    Join Date
    Feb 2004
    Posts
    1,789

    sed scripting

    Hi all

    I have a file of about 1500 lines odd long, which is an export from a database online. It is essentially a CSV, but instead of comma's is a semi-colon. What i want is the 15th instance of
    ".*";
    changing to
    "textfield:";
    .

    The problem is that the instances of ""; before the one i need to change are not all the same, and some have some text in, and some dont. So you cannot count the number of instances of "";, it will need to be something more like ".*"; (if using regexp). And will need to also consider that some of the fields have the html code for a " in it of
    "
    so needs to ideally look for the "; after each item.

    Anyone know how this can be done? I have had a look on a sed thing online and come up with:

    Code:
    sed 'startline,$ s/\"\"\;/\"textfield:\"\; /15'
    where "startline" is the first line i want the change to occur (dont want it on the first 30 odd lines).

    Anyone here know sed that well?
    "I am not an alcoholic, alcoholics go to meetings"
    Registered Linux user = #372327

  2. #2
    Linux Guru anomie's Avatar
    Join Date
    Mar 2005
    Location
    Texas
    Posts
    1,692
    • You have a file with a large number of rows, which contain fields delimited by a semi-colon -- ;
    • You want to find the 15th field in each row and manipulate the value.
    • You want to skip the first n number of rows in this operation.


    Is that all true?

  3. #3
    Linux Guru sdousley's Avatar
    Join Date
    Feb 2004
    Posts
    1,789
    If i understand what i want to do properly, then yes!!
    "I am not an alcoholic, alcoholics go to meetings"
    Registered Linux user = #372327

  4. #4
    Linux Guru anomie's Avatar
    Join Date
    Mar 2005
    Location
    Texas
    Posts
    1,692
    I'll show you an example with simplified data.

    Code:
    [aqua@troy ~]$ cat my-data 
    01;field-a;field-b;field-c;field-d
    02;field-a;field-b;field-c;field-d
    03;field-a;field-b;field-c;field-d
    04;field-a;field-b;field-c;field-d
    05;field-a;field-b;field-c;field-d
    06;field-a;field-b;field-c;field-d
    07;field-a;field-b;field-c;field-d
    08;field-a;field-b;field-c;field-d
    09;field-a;field-b;field-c;field-d
    10;field-a;field-b;field-c;field-d
    
    [aqua@troy ~]$ awk -F';' '{ if(NR < 7) next ; sub(/.*/,"new-text",$4) ; print }' my-data 
    07 field-a field-b new-text field-d
    08 field-a field-b new-text field-d
    09 field-a field-b new-text field-d
    10 field-a field-b new-text field-d
    Quick explanation of that awk statement:
    1. The -F option is used to specify a semicolon / ; delimiter.
    2. Any data rows prior to the 7th are skipped.
    3. After that time, the 4th field in each row is replaced with 'new-text' and printed to stdout.

    You can pipe the results to the less pager for viewing. Or you can redirect the results to an output file.

    Why awk? No good reason. It just works.

    (p.s. Let me know if you need semicolons in the output too -- there's another step required for that.)

  5. #5
    Linux Enthusiast
    Join Date
    Aug 2006
    Posts
    631
    If the csv file is seperated by a semicolon, the fields are within double quotes and you want to skip the first 30 lines:

    Code:
    awk -F ";" '{if(NR<=30)next;$15="\42textfield:\42"}{gsub(/ /,";")}{print}' file
    Regards

  6. #6
    Linux Guru sdousley's Avatar
    Join Date
    Feb 2004
    Posts
    1,789
    I have actually resolved this now with a sed command from someone else:

    Code:
    sed -e 's/\(\("[^"]*";\)\{15\}\)"[^"]*";\(\("[^"]*";\)*\)/\1"Textfield:";\3/g' < csv_export_article_818967.csv > updated.csv
    That outputs updated.csv with the relevent output. I had to remove the header lines manually, but i think simply adding in "30,$" before the s/... would have also done the trick, so:

    Code:
    sed -e '30,$ s/\(\("[^"]*";\)\{15\}\)"[^"]*";\(\("[^"]*";\)*\)/\1"Textfield:";\3/g' < csv_export_article_818967.csv > updated.csv
    Though i have not tested this theory. Either way, this ran in no time at all (0.9 seconds) and saved my friend about a months worth of work updating the site manually (Which the provider said was the ONLY way to do it!) Dont you just love FOSS!!!
    "I am not an alcoholic, alcoholics go to meetings"
    Registered Linux user = #372327

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
...