- 12-08-2011 #1 Just Joined!
- Join Date
- Jun 2011
- Posts
- 16
Bottlenecking?
Hi, I have a script that computes a 5-period moving average over a CSV data file with over 2 million lines. The script works, but I need it to run much faster. A friend of mine said my problem is a bottleneck. How can I fix this?
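For reference, here is the standard textbook definition of a 5-period simple moving average over the close prices c_1, c_2, ... (one per line, counting from 1). The second line is the rolling update, which you get by subtracting two consecutive sums. This is general math, not something taken from the script:

```latex
\begin{aligned}
\mathrm{SMA}_t &= \frac{1}{5} \sum_{i=0}^{4} c_{t-i}, && t \ge 5,\\
\mathrm{SMA}_t &= \mathrm{SMA}_{t-1} + \frac{c_t - c_{t-5}}{5}, && t \ge 6.
\end{aligned}
```

The rolling form means each new average needs only the newest value and the value that leaves the window, so a single pass over the file is enough.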
- 12-08-2011 #2
With that many records, I would seriously consider a database.
Scratch that, I would insist on one.
Not only will you gain structure in the data, but also a way of performing arbitrary queries.
Plus, databases are well established, can be backed up, etc.
If you still want to work with CSV:
It depends on what that script is doing and what kind of load it creates.
Is it I/O-, network- and/or CPU-bound?
Last edited by Irithori; 12-08-2011 at 03:58 PM.
You must always face the curtain with a bow.
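One way to answer that question for a concrete run is bash's built-in time keyword. It reports wall-clock (real) time alongside the CPU time spent in user and kernel mode. Below is a minimal sketch that assumes bash. time_it is a hypothetical helper name, and the interpretation in the comments is only a rule of thumb:

```shell
#!/usr/bin/env bash
# Sketch: compare wall-clock time with CPU time for one command,
# using bash's built-in `time` keyword and its TIMEFORMAT variable.
# Rule of thumb (a heuristic, not a precise diagnosis):
#   user + sys close to real -> mostly CPU-bound
#   real much larger         -> mostly waiting (disk, network, ...)
time_it() {
    # local: the custom format applies only inside this function
    local TIMEFORMAT='real=%R user=%U sys=%S'
    # the command's own output is discarded; `time` reports on the
    # group's stderr, which is redirected to stdout so it can be captured
    { time "$@" >/dev/null 2>&1; } 2>&1
}

# Hypothetical usage: time_it ./script.sh ~/Forex/USDJPY1.csv
```

The user and sys figures also include child processes started by the command, such as cat or bc.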
- 12-10-2011 #3
Here is the script itself:
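Code:
# 5-period moving average of the close (field 6) in ~/Forex/USDJPY1.csv
z=1
# number of lines in the file
wc=`cat ~/Forex/USDJPY1.csv | wc -l`
# outer loop: one 5-line window starting at each line z
# (the last 4 windows run past the end of the file)
while [ $z -le $wc ]
do
x=$z
y=$(($x + 4))
comp=0
# inner loop: add up the closes of lines x..y
while [ $x -le $y ]
do
# re-reads the file from the top to fetch line x, then strips the decimal point
close=`cat ~/Forex/USDJPY1.csv | head -$x | tail -1 | cut -d "," -f 6 | sed -e "s/\.//g"`
comp=`echo "$comp + $close" | bc`
diff=`echo "$x - $y" | bc`
# print the average once the last line of the window has been added
# (bc's default scale is 0, so this division truncates)
if [ $diff -eq 0 ]
then
echo "$comp / 5" | bc
fi
x=$(($x + 1))
done
z=$(($z + 1))
done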
- 12-11-2011 #4
bumpage le bumpster
- 12-11-2011 #5
Patience is a virtue

Don't take it the wrong way, but your script is a train wreck.
Apart from multiple style violations, hardcoded paths and a needless use of cat (to name just a few),
it has a serious flaw: for the mentioned 2-million-line CSV file, it starts reading that file from the top 2 million × 5 = 10 MILLION times, each time through a fresh cat | head | tail | cut | sed pipeline.
No wonder it's slow.
Also, I didn't figure out what you intend by removing the dots ( sed -e "s/\.//g" ),
but this might be because you didn't provide an example of the data you want to process.
Anyway, the following script is a reimplementation and may serve as a base for your further development:
- It silently skips all lines that don't match the regex. This could be improved, e.g. by issuing a warning.
- It has some basic error checking, but there is room for more.
- It needs to be called with the CSV file as its argument.
- It reads the CSV file only once.
Code:
#!/usr/bin/env bash

## Initialize
average_over=5
delimiter=","
field=6
declare -a collector
if [ ! -s "$1" ]; then  # the CSV file must be passed as the first argument
    echo "Error: Input csv file not found or empty"
    exit 1
fi
input_csv=$1

## Main
echo "Moving average over $average_over lines:"
for linevalue in $(cut -d "$delimiter" -f "$field" "$input_csv" | grep -E '^([0-9]*(\.[0-9]*)?|(0*)?\.[0-9]*)$'); do
    # add to array until $average_over is reached, else pop+shift array
    if [ ${#collector[@]} -lt $average_over ]; then
        collector=("${collector[@]}" "$linevalue")
    else
        # drop the oldest value; the re-assignment below re-indexes the array from 0
        unset 'collector[0]'
        collector=("${collector[@]}" "$linevalue")
    fi
    # once the window is full, sum it with bc and print the average to 4 decimals
    if [ ${#collector[@]} -eq $average_over ]; then
        sum=0
        array_max=$(( ${#collector[@]} - 1 ))
        for (( i=0; i<=array_max; i++ )); do
            sum=$(echo "$sum + ${collector[$i]}" | bc)
        done
        echo "scale=4; $sum / ${#collector[@]}" | bc
    fi
done
Hmm, interesting: the purpose seems to be financial data processing. Probably some extra care is needed.
Last edited by Irithori; 12-11-2011 at 11:17 PM.
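The key fix here is reading the file only once. The script above still starts one bc process per addition and one per division. That is six per window, or roughly 12 million for a 2-million-line file. As a sketch of a swapped-in technique (not one proposed in this thread), awk can do the whole job in a single process by keeping a running sum over a small ring buffer. This assumes comma-separated input, the close in field 6 as the script above assumes, and 4 output decimals. moving_average is a hypothetical function name:

```shell
#!/usr/bin/env bash
# Sketch: single-pass simple moving average with awk and a running sum.
# Assumptions: comma-separated input, close price in field 6 by default,
# 5-line window by default, 4 decimals in the output.
moving_average() {
    local csv=$1 period=${2:-5} field=${3:-6}
    awk -F',' -v n="$period" -v f="$field" '
        # skip lines whose field is not a plain decimal number (e.g. a header)
        $f !~ /^[0-9]*\.?[0-9]+$/ { next }
        {
            count++
            slot = count % n                  # ring buffer index for this value
            if (count > n) sum -= buf[slot]   # drop the value leaving the window
            buf[slot] = $f
            sum += $f
            if (count >= n) printf "%.4f\n", sum / n
        }
    ' "$csv"
}

# run directly: ./sma.sh file.csv [period] [field]
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
    moving_average "$@"
fi
```

If the block is saved as sma.sh (a hypothetical name), usage would be ./sma.sh ~/Forex/USDJPY1.csv. awk does the arithmetic in double-precision floating point, which should be ample for 4 printed decimals on prices like these.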
- 12-11-2011 #6
The script you gave me has an error on line 19: it says it's expecting "fi", but that line is followed by an else and then the fi. I don't get it. Also, once I get it working, your script will keep me busy for a while until I figure out what everything means. I can give you a sample of a few lines of the CSV file. It's historical forex data, with the date/time, open, high, low, close, and volume. The close is the 6th delimited field. I tried both sh "your script" and ./"your script", but I'm not sure how to pass the argument. Here's a sample of the data file:
I removed the "." from the numbers because echo "num / 1" | bc didn't include the decimals. I figure I'll just add the decimal point back after the calculations are done.
Code:
2005.01.17,05:16,102.060,102.070,102.060,102.070,7
2005.01.17,05:17,102.060,102.060,102.060,102.060,4
2005.01.17,05:18,102.070,102.070,102.050,102.060,6
2005.01.17,05:19,102.050,102.060,102.050,102.060,5
2005.01.17,05:20,102.050,102.060,102.050,102.050,17
2005.01.17,05:21,102.050,102.060,102.050,102.060,5
2005.01.17,05:22,102.060,102.080,102.060,102.070,9
2005.01.17,05:23,102.070,102.070,102.070,102.070,2
2005.01.17,05:24,102.060,102.060,102.060,102.060,3
2005.01.17,05:25,102.050,102.070,102.050,102.060,10
2005.01.17,05:26,102.070,102.070,102.060,102.060,8
2005.01.17,05:27,102.060,102.060,102.050,102.050,5
2005.01.17,05:28,102.050,102.060,102.050,102.060,8
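Removing the dot is essentially fixed-point arithmetic: with three decimals, 102.070 becomes 102070 thousandths. The decimals went missing because bc's default scale is 0. Setting it, as in echo "scale=4; 510.300 / 5" | bc, makes bc print 102.0600. The sketch below takes the fixed-point route instead and stays entirely in bash integer arithmetic, so no process is started per line. A bash loop over 2 million lines will still be far slower than awk, though. It assumes the close is field 6 with exactly three decimals and that there is no header line. fixed_point_sma is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: moving average in pure bash using fixed-point integers,
# i.e. "remove the dot, put it back later" without calling bc.
# Assumptions: close price in field 6 with exactly three decimals (102.070),
# no header line, dot as the decimal separator.
fixed_point_sma() {
    local csv=$1 period=${2:-5}
    local count=0 sum=0 close int frac value avg
    local -a window=()
    while IFS=',' read -r _ _ _ _ _ close _; do
        # "102.070" -> 102070 thousandths; 10# avoids octal for e.g. "070"
        int=${close%%.*} frac=${close#*.}
        value=$(( 10#$int * 1000 + 10#$frac ))
        window+=("$value")
        sum=$(( sum + value ))
        count=$(( count + 1 ))
        if (( count > period )); then
            # the oldest value leaves the window
            sum=$(( sum - window[0] ))
            window=("${window[@]:1}")
        fi
        if (( count >= period )); then
            # average in ten-thousandths (integer division truncates when the
            # period does not divide evenly), printed with 4 decimals
            avg=$(( sum * 10 / period ))
            printf '%d.%04d\n' $(( avg / 10000 )) $(( avg % 10000 ))
        fi
    done < "$csv"
}

if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
    fixed_point_sma "$@"
fi
```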
Thank you for your help with this, I really appreciate it.
- 12-12-2011 #7
Hmm, in the morning and with some fresh coffee I see some points to improve in my version, but it does work:
Code:
./movingaverage.sh testfile.csv
Moving average over 5 lines:
102.0600
102.0580
102.0600
102.0620
102.0620
102.0640
102.0640
102.0600
102.0580
- 12-12-2011 #8
Code:
sudo chmod 777 ~/improved.sh
./improved.sh ~/USDJPY1.csv
Moving average over 5 lines:
And then nothing. It echoes the first line, but it doesn't process the data. I'm not sure what I did wrong, and now I feel like a nuisance. Are you sure there's nothing more to it?
- 12-12-2011 #9
Check permissions/users and/or insert some echo statements, for example
echo "$linevalue"
as the first command in the for loop.
But yes, the script does work. A copy-and-paste error?
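If only the "Moving average over 5 lines:" header appears, the averaging branch of the loop never ran. That means the cut | grep pipeline produced no values at all, or fewer than five, so the window never filled. Here is a quick check along those lines, as a sketch. count_valid_values is a hypothetical helper; the delimiter, field and regex are the ones from the script above:

```shell
#!/usr/bin/env bash
# Sketch: count how many values of a CSV field pass the script's number filter.
# If this prints 0 (or less than 5), the moving-average loop never prints.
# Defaults mirror the script: comma delimiter, field 6.
count_valid_values() {
    local csv=$1 delimiter=${2:-,} field=${3:-6}
    cut -d "$delimiter" -f "$field" "$csv" |
        grep -cE '^([0-9]*(\.[0-9]*)?|(0*)?\.[0-9]*)$'
}
```

For example, count_valid_values ~/USDJPY1.csv should print roughly the number of data lines. If it prints 0, looking at the raw column with cut -d, -f6 ~/USDJPY1.csv | head usually shows why, for instance a delimiter or field layout other than the one expected.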

