Thread: bottle-necking?

  1. #1

    bottle-necking?


    Hi, I have a script that computes a 5-period moving average on a csv data file with over 2 million lines. The script works fine, but I need it to go much faster. A friend of mine called my problem bottle-necking. How can I fix this?

  2. #2
    Trusted Penguin Irithori's Avatar
    Join Date
    May 2009
    Location
    Munich
    Posts
    3,760
    With such a number of datasets, I would seriously consider a database.
    Scratch that, I would insist on one.

    Not only will you gain structure in the data, but also a way of performing arbitrary queries.
    Plus, databases are well established, can be backed up, etc.


    If you still want to work with csv:
    It depends on what that script is doing and what kind of load it creates.
    Is it IO, network and/or cpu bound?
    Last edited by Irithori; 12-08-2011 at 04:58 PM.
    You must always face the curtain with a bow.

  3. #3
    here is the script itself..

    Code:
    z=1
    wc=`cat ~/Forex/USDJPY1.csv | wc -l`
    while [ $z -le $wc ]
    do
    x=$z
    y=$(($x + 4))
    comp=0
    while [ $x -le $y ]
    do
    close=`cat ~/Forex/USDJPY1.csv | head -$x | tail -1 | cut -d "," -f 6 | sed -e "s/\.//g"`
    comp=`echo "$comp + $close" | bc`
    diff=`echo "$x - $y" | bc`
    if [ $diff -eq 0 ]
    then
    echo "$comp / 5" | bc
    fi
    x=$(($x + 1))
    done
    z=$(($z + 1))
    done

  5. #4
    bumpage le bumpster

  6. #5
    Trusted Penguin Irithori's Avatar
    Patience is a virtue

    Don't take it the wrong way, but your script is a trainwreck.

    Apart from multiple style violations, hardcoded values and a needless call of cat (to name just a few),
    it has a serious flaw: for the mentioned 2 million line csv file, it will read that csv file 2 million * 5 = 10 million times.
    No wonder it's slow.
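
The arithmetic behind that estimate can be sketched as follows. This is only a rough count, assuming one 5-value window per input line as in the original nested loop; the function name `estimate_cost` is made up for illustration. It also counts the external commands started: each inner iteration launches cat, head, tail, cut and sed plus two bc calls, and each window adds one more bc call for the division.

```shell
# estimate_cost LINES WINDOW   (hypothetical helper, for illustration only)
# Prints how often the original nested loop re-reads the file and how many
# external commands it launches, assuming one window per input line.
estimate_cost() {
    lines=$1
    window=$2
    # One cat | head | tail pipeline per inner iteration.
    reads=$(( lines * window ))
    # Per inner iteration: cat, head, tail, cut, sed and two bc calls = 7.
    # Per window: one more bc call for the final division.
    processes=$(( lines * (window * 7 + 1) ))
    echo "file reads: $reads"
    echo "external processes: $processes"
}

estimate_cost 2000000 5
```

On top of that, every `head -$x` has to read up to line x before `tail -1` can pick the last one, so the later windows are more expensive than the early ones.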


    Also, I couldn't figure out what you intend by removing the dots ( sed -e "s/\.//g" ),
    but that might be because no example of the data you want to process was provided.


    Anyway, the following script is a reimplementation and may be a base for your further development
    - It will silently skip all lines that don't match the regex. This could be improved, e.g. by issuing a warning.
    - It has some basic errorchecking, but there is room for more
    - It needs to be called with the csv file as argument
    - It reads the csv file only once.

    Code:
    #!/usr/bin/env bash
    
    ## Initialize
    average_over=5
    delimiter=","
    field=6
    declare -a collector
    if [ ! -s "$1" ]; then
      echo "Error: Input csv file not found or empty"
      exit 1
    fi
    input_csv=$1
    
    ## Main
    echo "Moving average over $average_over lines:"
    for linevalue in $(cut -d "$delimiter" -f $field $input_csv | grep -E '^([0-9]*(\.[0-9]*)?|(0*)?\.[0-9]*)$' ); do
      # add to array until $average_over is reached, else pop+shift array
      if [ ${#collector[@]} -lt $average_over ]; then
        collector=("${collector[@]}" "$linevalue")
      else
        unset collector[0]
        collector=("${collector[@]}" "$linevalue")
      fi
    
      if [ ${#collector[@]} -eq $average_over ]; then
        sum=0
        array_max=$(expr ${#collector[@]} - 1)
        for (( i=0; i<=$array_max; i++ )); do
          sum=$(echo "$sum + ${collector[$i]}" | bc)
        done
        echo "scale=4; $sum / ${#collector[@]}" | bc
      fi
    done
    Hmm interesting: The purpose seems to be financial data processing.
    Probably some extra care is needed.
    Last edited by Irithori; 12-12-2011 at 12:17 AM.
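
As an alternative to the bash loop above, the same single-pass idea can be sketched in awk, which avoids starting bc for every value. This is not the script from the post but a swapped-in technique: a ring buffer with a running sum. The function name `moving_average` and its optional window and field arguments are assumptions made for illustration, and the number filter is deliberately simpler than the regex above.

```shell
# moving_average FILE [WINDOW] [FIELD]   (hypothetical helper)
# Reads FILE once and prints the moving average of the chosen column.
moving_average() {
    awk -F',' -v n="${2:-5}" -v f="${3:-6}" '
        # Only rows whose chosen field looks like a plain decimal number are used.
        $f ~ /^[0-9]+(\.[0-9]+)?$/ {
            count++
            slot = count % n            # ring-buffer position for this value
            sum += $f - window[slot]    # add the newest value, drop the one it replaces
            window[slot] = $f
            if (count >= n) printf "%.4f\n", sum / n
        }
    ' "$1"
}
```

Called as `moving_average testfile.csv`, it prints one average per line once 5 values have been seen. Note that awk works with floating point and printf rounds, whereas bc with scale=4 truncates, so for financial data the last digit may need extra care.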

  7. #6
    The script you gave me has an error on line 19: it says it's expecting "fi", but you have an else following that line and then the fi. I don't get it. Also, your script will keep me busy for a while, until I figure out what everything means once I get it working. I can give you a sample of a few lines from the csv file. It's historical forex data, with the date, time, open, high, low, close, and volume. The close is the 6th delimited field. I tried using sh "your script" and ./"your script", but I'm not sure how to pass the argument. Here's a sample of the data file:

    Code:
    2005.01.17,05:16,102.060,102.070,102.060,102.070,7
    2005.01.17,05:17,102.060,102.060,102.060,102.060,4
    2005.01.17,05:18,102.070,102.070,102.050,102.060,6
    2005.01.17,05:19,102.050,102.060,102.050,102.060,5
    2005.01.17,05:20,102.050,102.060,102.050,102.050,17
    2005.01.17,05:21,102.050,102.060,102.050,102.060,5
    2005.01.17,05:22,102.060,102.080,102.060,102.070,9
    2005.01.17,05:23,102.070,102.070,102.070,102.070,2
    2005.01.17,05:24,102.060,102.060,102.060,102.060,3
    2005.01.17,05:25,102.050,102.070,102.050,102.060,10
    2005.01.17,05:26,102.070,102.070,102.060,102.060,8
    2005.01.17,05:27,102.060,102.060,102.050,102.050,5
    2005.01.17,05:28,102.050,102.060,102.050,102.060,
    I removed the "." from the number because echo "num / 1" didn't include the decimal. I figure I'll just add the decimal back after the calculations take place.
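
The missing decimals are most likely down to bc's `scale` setting: it defaults to 0, so division results are truncated to integers. A minimal sketch, assuming bc is installed; the two function names are made up for illustration:

```shell
# Default scale is 0, so bc drops the fractional part of a division.
truncated_average() { echo "$1 / $2" | bc; }

# Setting scale keeps that many digits after the decimal point,
# so the dots do not have to be stripped from the input first.
scaled_average() { echo "scale=4; $1 / $2" | bc; }
```

For example, `truncated_average 510.30 5` gives 102, while `scaled_average 510.30 5` gives 102.0600.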

    thank you for your help with this, i really appreciate it
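
A likely explanation for the "expecting fi" error: line 19 of the script is the first array assignment, `collector=(...)`, and arrays are a bash feature. When the script is started with `sh` and `sh` is a plain POSIX shell (dash on many Debian/Ubuntu systems, which is an assumption about this machine), the opening parenthesis is reported as a syntax error. Running it with bash, with the csv file as the first argument after the script name, avoids that. The sketch below uses a throwaway demo script rather than the real one:

```shell
# Hypothetical demo script that uses a bash array, like the script above does.
demo=$(mktemp)
printf '%s\n' \
    '#!/usr/bin/env bash' \
    'values=(1 2 3)' \
    'echo "${#values[@]} values, first arg: $1"' > "$demo"

# Run it with bash explicitly; arguments simply follow the script name.
run_with_bash() { bash "$demo" "$1"; }

run_with_bash data.csv
```

The equivalent for the real script would be `bash movingaverage.sh testfile.csv`, or `chmod +x movingaverage.sh` once and then `./movingaverage.sh testfile.csv`, which lets the `#!/usr/bin/env bash` line pick the interpreter.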

  8. #7
    Trusted Penguin Irithori's Avatar
    Hmm,
    in the morning and with some fresh coffee I see some points to improve on my version, but it does work:
    Code:
    ./movingaverage.sh testfile.csv 
    Moving average over 5 lines:
    102.0600
    102.0580
    102.0600
    102.0620
    102.0620
    102.0640
    102.0640
    102.0600
    102.0580

  9. #8
    PHP Code:
    sudo chmod 777 ~/improved.sh
    ./improved.sh ~/USDJPY1.csv
    Moving average over 5 lines

    And then nothing. It echoes the first line, but it doesn't process the data. I'm not sure what I did wrong, and now I feel like a nuisance. Are you sure there's nothing more to it?

  10. #9
    Trusted Penguin Irithori's Avatar
    Check permissions/users and/or insert some echos, maybe a
    echo $linevalue
    as first command in the for loop.

    But yes, the script actually does work. Copy&paste error?
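
One way to narrow this down without editing the script is to run the cut and grep stage on its own and count how many values survive the filter. A minimal sketch; the function name `count_usable` is made up, and the regex is copied from the script above:

```shell
# count_usable FILE [FIELD]   (hypothetical helper)
# Prints how many values from the chosen column pass the script's number filter.
count_usable() {
    cut -d "," -f "${2:-6}" "$1" |
        grep -c -E '^([0-9]*(\.[0-9]*)?|(0*)?\.[0-9]*)$'
}
```

If `count_usable ~/USDJPY1.csv` prints 0, the for loop has nothing to iterate over, and looking at the raw column with `cut -d "," -f 6 ~/USDJPY1.csv | head` should show why. Empty fields also pass this regex, so the count can be slightly higher than the number of values the loop actually sees.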
