  1. #1
    Just Joined!
    Join Date
    Mar 2011
    Posts
    5

    gawk command

    I have a file which contains 5 columns of demographic data with column headings. The table has 5 rows of data, all stored in a text file. The second column contains numerical data, but it has embedded commas, e.g. 123,456,789. I would like to use awk or gawk to output the total of all of these numbers. I know how to set it up for another column that has no commas in the data and get the correct result.

    With only a single column of data in the file, I was able to get the desired result with this command:

    gawk --posix -F, '/^,/ (tot+=$1 $2 $3); END{print tot}' file.txt

    The computer echoed each row and then printed the total at the bottom. My challenge is getting it to ignore the first column and total the second column of the original file.
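A minimal sketch of why that one-column command produces a total, leaving out the `/^,/` pattern and the echoed rows (the two sample numbers are made up): with `-F,` each comma-separated group of digits becomes its own field, and writing `$1 $2 $3` side by side concatenates them back into a single digit string, which `+=` then adds as a number. Any POSIX awk, gawk included, should behave the same here.

```shell
# -F, splits "123,456,789" into $1="123", $2="456", $3="789".
# "$1 $2 $3" (no operator between them) is string concatenation,
# giving "123456789"; += converts that string to a number.
# For "1,000" the fields are "1", "000" and an empty $3, giving "1000".
printf '123,456,789\n1,000\n' | awk -F, '{ tot += $1 $2 $3 } END { print tot }'
# prints 123457789
```

Because only three fields are joined, a value with four digit groups (1,000,000,000 or more) would silently lose its last group.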

    Can anyone help with this problem? Thanks.

    Larry

  2. #2
    tpl
    Linux User
    Join Date
    Jan 2007
    Location
    cleveland
    Posts
    452
    have you tried using "cut" to extract the second column from "file.txt",
    then piping the result to "gawk"? something like this:

    cut -d, -f2,3,4 <file.txt | gawk....
    the sun is new every day (heraclitus)

  3. #3
    Just Joined!
    Join Date
    Mar 2011
    Posts
    5

    GAWK command

    Thanks. I'll try it later tonight and get back to you. Quick question: will it permanently delete the first column from the file, or only for this command?
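In short, cut only reads the file and writes the selected fields to standard output, so the file on disk is left alone. A quick way to check, using a small hypothetical tab-separated file named sample.txt rather than the real data:

```shell
# Hypothetical two-row sample, fields separated by tabs.
printf 'Afghanistan\t647,500\nCambodia\t181,040\n' > sample.txt

cut -f2 sample.txt         # prints only the second column
cat sample.txt             # both columns are still in the file

# To keep the extracted column, redirect it to a different file.
cut -f2 sample.txt > area.txt
```

Redirecting onto the file being read (`cut -f2 sample.txt > sample.txt`) is the one way to lose data here: the shell truncates the file before cut opens it, leaving an empty file.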

  4. #4
    Just Joined!
    Join Date
    Mar 2011
    Posts
    5

    GAWK Command

    I tried this:

    cut -d, -f2,3,4 file.txt|gawk ....

    and the output just removed the first column and the digits up to the first comma.

    I played around with the cut command, replacing -d, with a TAB, and it did remove the first column, but I got 0 as the total. I ran out of time to try other options; I'll look at it this weekend. Thanks for the suggestion.
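Why the `-d,` attempt cut into the numbers, sketched on one tab-separated row from the table (the file name row.txt is hypothetical): with `-d,`, cut splits only at commas, so its "field 1" runs from the start of the line up to the first comma inside the area value. Tab is already cut's default delimiter, so `-d` can simply be left out; if you want to pass the tab explicitly, `-d "$(printf '\t')"` avoids having to type a literal tab at the prompt.

```shell
# One row of the table, TAB-separated (hypothetical sample file).
printf 'Afghanistan\t647,500\t28513677\t42.46\n' > row.txt

# -d, splits at commas only: field 1 is "Afghanistan<TAB>647", so the
# output starts at "500" (fields 3 and 4 do not exist and are skipped).
cut -d, -f2,3,4 row.txt              # prints 500<TAB>28513677<TAB>42.46

# TAB is cut's default delimiter, so this prints the area column.
cut -f2 row.txt                      # prints 647,500

# Same thing with the tab given explicitly, in a portable way.
cut -d "$(printf '\t')" -f2 row.txt  # prints 647,500
```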

  5. #5
    tpl
    Linux User
    Join Date
    Jan 2007
    Location
    cleveland
    Posts
    452
    please post a sample of the troublesome table

  6. #6
    Just Joined!
    Join Date
    Mar 2011
    Posts
    5

    GAWK Command

    This is the table I was working with:

    Countries     Area (sq-km)   Population   Life_Expectancy
    Afghanistan        647,500     28513677             42.46
    Cambodia           181,040     13363421             58.41
    Canada           9,984,670     32507874             79.96
    Mexico           1,972,550    104959594             74.94
    US               9,631,418    293027571             77.43

    In the original table, the data fields were separated by tabs.
    This was for a school lab project where we had to output:

    1. The total population for all the countries
    2. The average Life Expectancy for all the countries.
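A sketch of tasks 1 and 2 with a single awk (or gawk) call, assuming the table is saved as a tab-separated file with one heading line; the file name countries.txt is hypothetical. `-F'\t'` splits on tabs only, so the space inside the "Area (sq-km)" heading does not create an extra field, and `NR > 1` skips the heading row.

```shell
# Recreate the sample table with TAB-separated fields (hypothetical file name).
printf '%s\t%s\t%s\t%s\n' \
  'Countries'   'Area (sq-km)' 'Population' 'Life_Expectancy' \
  'Afghanistan' '647,500'      '28513677'   '42.46' \
  'Cambodia'    '181,040'      '13363421'   '58.41' \
  'Canada'      '9,984,670'    '32507874'   '79.96' \
  'Mexico'      '1,972,550'    '104959594'  '74.94' \
  'US'          '9,631,418'    '293027571'  '77.43' > countries.txt

# Skip the heading line, sum population ($3) and life expectancy ($4),
# and count the data rows so the average can be taken at the end.
awk -F'\t' 'NR > 1 { pop += $3; life += $4; n++ }
            END    { print "Total population:", pop
                     printf "Average life expectancy: %.2f\n", life / n }' countries.txt
```

For the five sample rows this should report a total population of 472372137 and an average life expectancy of 66.64.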

    Numbers 1 and 2 were fairly simple. The instructor gave us a challenge, which was:

    3. The total area for all the countries.

    That is what I was having a problem with.

  7. #7
    tpl
    Linux User
    Join Date
    Jan 2007
    Location
    cleveland
    Posts
    452
    we're really not supposed to do homework

    if the input file is called "A"--and we neglect the column headings, or use
    "tail" to remove them--then something like this should work OK:

    cut -f2 A | sed 's/,//g' | awk '{printf "%7d\n",$0}' |awk '{a+=$1}END{print a}'

    the first call to "awk" uses "printf" to right-align the digits. that step is only cosmetic: the second "awk" would add up the numbers the same way without it.
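For comparison, the same total can be computed in one awk (or gawk) call instead of a cut/sed/awk pipeline: split on tabs, skip the heading line, delete the commas from column 2 with `gsub`, and add it up. A sketch, recreating tpl's file "A" with the sample rows from the thread:

```shell
# Recreate the sample table as the tab-separated file "A" used above.
printf '%s\t%s\t%s\t%s\n' \
  'Countries'   'Area (sq-km)' 'Population' 'Life_Expectancy' \
  'Afghanistan' '647,500'      '28513677'   '42.46' \
  'Cambodia'    '181,040'      '13363421'   '58.41' \
  'Canada'      '9,984,670'    '32507874'   '79.96' \
  'Mexico'      '1,972,550'    '104959594'  '74.94' \
  'US'          '9,631,418'    '293027571'  '77.43' > A

# NR > 1 skips the heading line; gsub(/,/, "", $2) removes every comma
# from the area field in place, so "9,984,670" is added as 9984670.
awk -F'\t' 'NR > 1 { gsub(/,/, "", $2); area += $2 } END { print area }' A
```

On the sample table this should print 22417178, the same total that the cut/sed/awk pipeline gives.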
