Find the answer to your Linux question:
Results 1 to 8 of 8
I am trying to get away from using excel for something. I have some data that is in a consistent format that I need to consolidate multiple similar lines. Hard ...
  1. #1
    Just Joined!
    Join Date
    May 2008
    Posts
    8

    help do summing in csv with bash (or whatever)

    I am trying to get away from using excel for something.

    I have some data that is in a consistent format that I need to consolidate multiple similar lines. Hard to describe so I will put an example below:

    part, description, count, dollars, newcount, newdollars, diffcount, diffdollars
    tool1, this is tool1, 1, 5.00, 2, 10.00, 1, 5.00
    tool1, this is tool1, 3, 15.00, 2, 10.00, 1, 5.00
    tool2, this is tool2, 1, 2.00, 2, 4.00, 1, 2.00
    tool2, this is tool2, 5, 10.00, 2, 4.00, 3, 6.00
    tool3, this is tool1, 1, 5.00, 2, 10.00, 1, 5.00

    and I need to sum up each break in part number to get

    tool1, this is tool1, 4, 20.00, 4, 20.00, 0, 0.00
    tool2, this is tool2, 6, 12.00, 4, 8.00, 4, 8.00
    tool3, this is tool1, 1, 5.00, 2, 10.00, 1, 5.00

    I am pretty decent with bash but cant write any perl. any help would be awesome.

    also, the data is in a flat text file so I would think to use awk to identify the fields by position as either part of the script or do it before hand to convert it to csv or whatever is appropriate.

    thanks

  2. #2
    Linux Engineer GNU-Fan's Avatar
    Join Date
    Mar 2008
    Posts
    935
    Quote Originally Posted by syadnom View Post
    also, the data is in a flat text file so I would think to use awk to identify the fields by position as either part of the script or do it before hand to convert it to csv or whatever is appropriate.
    Right. awk is a pretty capable programming language and a single awk script will suffice for parsing, calculating and write-back.
    Debian GNU/Linux -- You know you want it.

  3. #3
    Linux Newbie
    Join Date
    Sep 2004
    Location
    UK
    Posts
    160
    Starting point (well it splits the fields up). Use another array to hold the previous result/summing.

    Code:
    #!/bin/bash
    line="tool1, this is tool1, 4, 20.00, 4, 20.00, 0, 0.00"
    
    tmp[0]="`echo ${line} | cut -d , -f 1`"
    tmp[1]="`echo ${line} | cut -d , -f 2`"
    fTmp="`echo ${line} | cut -d , -f 3- | tr ',' ' '`"
    
    echo ${fTmp} | while read  tmp[2] tmp[3] tmp[4] tmp[5] tmp[6] tmp[7]
    do
        echo ${tmp[0]}
        echo ${tmp[1]}
        echo ${tmp[2]}
        echo ${tmp[3]}
        echo ${tmp[4]}
        echo ${tmp[5]}
        echo ${tmp[6]}
        echo ${tmp[7]}
    done
    In a world without walls and fences, who needs Windows and Gates?

  4. #4
    Just Joined!
    Join Date
    May 2008
    Posts
    8

    .

    Blinky, I dont get it? this just seems to break down the data but not process it.

    GNU-Fan, I have never done any summing with awk, much less at a change in one of the fields. Can you help?

  5. #5
    Linux Newbie
    Join Date
    Sep 2004
    Location
    UK
    Posts
    160
    Code:
    #!/bin/bash
    
    tp[0]=""
    
    while read line
    do
       #line="tool1, this is tool1, 1, 5.00, 2, 10.00, 1, 5.00"
       a="`echo ${line} | cut -d , -f 1`"
       b="`echo ${line} | cut -d , -f 2`"
       fTmp=`echo ${line} | cut -d , -f 3- | tr ',' ' '`
    
       eval `echo ${fTmp} | while read  c d e f g h
       do
          echo c=$c
          echo d=$d
          echo e=$e
          echo f=$f
          echo g=$g
          echo h=$h
       done`
    
       if [ "${tp[0]}" == ${a} ] ; then
          #echo same
          tp[2]=$((tp[2]+c))
          tp[3]=$((tp[3]+d))
          tp[4]=$((tp[4]+e))
          tp[5]=$((tp[5]+f))
          tp[6]=$((tp[6]+g))
          tp[7]=$((tp[7]+h))
       else
          #echo nsame
          if [ -n "${tp[0]}" ] ; then
             echo "${tp[0]}, ${tp[1]}, ${tp[2]}, ${tp[3]}, ${tp[4]}, ${tp[5]}, ${tp[6]}, ${tp[7]}"
          fi
          tp[0]=${a}
          tp[1]=${b}
          tp[2]=${c}
          tp[3]=${d}
          tp[4]=${e}
          tp[5]=${f}
          tp[6]=${g}
          tp[7]=${h}
    
       fi
    done
    
    echo "${tp[0]}, ${tp[1]}, ${tp[2]}, ${tp[3]}, ${tp[4]}, ${tp[5]}, ${tp[6]}, ${tp[7]}"
    to run

    cat datafile | sort | script

    The only thing is it can only deal with integers (eg 5 rather than 5.00 etc)
    In a world without walls and fences, who needs Windows and Gates?

  6. #6
    Just Joined!
    Join Date
    May 2008
    Posts
    8

    wow

    ok, so I'm trying to decompile this and understand it. correct me if my assumptions are wrong.
    when setting the variables a and b, you are picking the items that will not be summed right?

    here is a sample of my actual data and the awk is use to get it:

    I grep it to clean out the useless lines then I use awk and fieldwidths because the built in field guesser doesnt get it right.

    cat file.txt | grep ^[a-z,A-Z,0-9] | grep -v -e ^Item -e ^COST\ COMP| awk 'BEGIN {FIELDWIDTHS="4 13 4 31 7 7 6 7 11 8 11 8 11 2";} {print $1 $2 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14;}'
    now, the fields $1 through $8 are all just text fields while $9 - $14 are the ones I want to sum up.

    here is an example of the data:

    ANC 2X100 25 B2100 2X100 BRASS GAUGE 005501 EA 0 0.00 0 0.00 0 0.00
    ANC 2X200 25 B2200 2" 200# PRESSURE GAGE 151414 S42 EACH 1 4.98 0 0.00 1- 4.98
    ANC 2X200 25 B2200 2" 200# PRESSURE GAGE 165216 EACH 0 0.00 1 4.98 1 4.98
    ANC 2X3000 25 2" 3000# PRESSURE GAGE 005502 EACH 0 0.00 0 0.00 0 0.00
    ANC 2X30RL 25 B230RL 2X40 REDLINE BRASS GAUG 005503 EA 0 0.00 0 0.00 0 0.00
    ANC 2X400 25 B2400 2X400 BRASS GAUGE 005515 EA 0 0.00 0 0.00 0 0.00
    ANC 2X4-100 25 MP1 1.00 MAG LENS 005504 EACH 0 0.00 0 0.00 0 0.00
    ANC 2X4-10 25 2X4-1/4 SH 10 LENS 145804 CLS EACH 1 0.63 0 0.00 1- 0.63
    ANC 2X4-10 25 2X4-1/4 SH 10 LENS 169013 EACH 0 0.00 1 0.63 1 0.63
    ANC 2X4-11 25 2X4-1/4 SH 11 LENS 151919 S46 EACH 37 23.31 0 0.00 37- 23.31
    ANC 2X4-11 25 2X4-1/4 SH 11 LENS 165217 EACH 0 0.00 38 23.94 38 23.94
    though the data is alligned in rows but I cant get that exact format to display here. basically, anything after the EA and EACH fields is the $9 - $14 that I need to sup up.

    would I be correct in saying that I could set
    a=$1
    b=$2
    c=$3
    d=$4
    e=$5
    f=$6
    g=$7
    h= $8
    ?

    I dont know what to do here:

    fTmp=`echo ${line} | cut -d , -f 3- | tr ',' ' '`
    also, I'm guessing that instead if
    echo c=$c
    echo d=$d
    echo e=$e
    echo f=$f
    echo g=$g
    echo h=$h
    I would do
    echo i=$i
    echo j=$j
    echo k=$k
    echo l=$l
    echo m=$m
    echo n=$n
    im I on the right track or way off here?

  7. #7
    Linux Newbie
    Join Date
    Sep 2004
    Location
    UK
    Posts
    160
    The bash way won't work, it can only do integer maths.

    Also where you use awk to cleanup the data, output as csv (comma delimited)

    Code:
    ANC,2X100,25,B2100,2X100,BRASS GAUGE,005501,EA,0,0.00,0,0.00,0,0.00
    ANC,2X200,25,B2200,2",200# PRESSURE GAGE,151414,S42 EACH,1,4.98,0,0.00,1-,4.98
    ANC,2X200,25,B2200,2",200# PRESSURE GAGE,165216,EACH,0,0.00,1,4.98,1,4.98
    ANC,2X3000,25,2",3000#,PRESSURE GAGE,005502,EACH,0,0.00,0,0.00,0,0.00
    ANC,2X30RL,25,B230RL,2X40,REDLINE BRASS GAUG,005503,EA,0,0.00,0,0.00,0,0.00
    ANC,2X400,25,B2400,2X400,BRASS GAUGE,005515,EA,0,0.00,0,0.00,0,0.00
    ANC,2X4-100,25,MP1,1.00,MAG LENS,005504,EACH,0,0.00,0,0.00,0,0.00
    ANC,2X4-10,25,2X4-1/4,SH,10 LENS,145804,CLS EACH,1,0.63,0,0.00,1-,0.63
    ANC,2X4-10,25,2X4-1/4,SH,10 LENS,169013,EACH,0,0.00,1,0.63,1,0.63
    ANC,2X4-11,25,2X4-1/4,SH,11 LENS,151919,S46 EACH,37,23.31,0,0.00,37-,23.31
    ANC,2X4-11,25,2X4-1/4,SH,11 LENS,165217,EACH,0,0.00,38,23.94,38,23.94
    As I said the bash way won't work, so tour probably going have to use perl or something else. But you do need to get you data formatted correctly to make life easier (with respect to how fields are delimited etc)
    In a world without walls and fences, who needs Windows and Gates?

  8. #8
    Linux Newbie
    Join Date
    Sep 2004
    Location
    UK
    Posts
    160

    Thumbs up

    Code:
          tp[2]=`echo "${tp[2]} + ${c}" | bc`
          tp[3]=`echo "${tp[2]} + ${d}" | bc`
          tp[4]=`echo "${tp[2]} + ${e}" | bc`
          tp[5]=`echo "${tp[2]} + ${f}" | bc`
          tp[6]=`echo "${tp[2]} + ${g}" | bc`
          tp[7]=`echo "${tp[2]} + ${h}" | bc`
    should fix the limitation of integer on maths
    In a world without walls and fences, who needs Windows and Gates?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
...