- 12-08-2009 #1 Just Joined!
- Join Date
- Dec 2009
- Posts
- 1
please help: unexplained spike in load average & drive usage
Hi, I'm looking for some help. I'm working with a friend on a Fedora 10 server that primarily runs several fairly high-traffic Apache/PHP/MySQL sites.
Recently a 2nd hard drive was added for some additional storage (this problem may or may not be related to that addition).
Currently, we keep seeing big jumps in load average without any obvious cause (e.g. a high-CPU process). What we have noticed, using iostat, is that right before each jump in load average, %util on both drives jumps to 100% and await/svctm shoot way up for several seconds. I haven't been able to track down what could cause this. Lately it has been happening every couple of minutes, which usually gives the load average time to settle down (it runs around 1-2 when left alone), but if it happens several times in a row the server can almost grind to a halt. There's plenty of memory and CPU available, and we're not swapping.
Is there anything else I can look at, or possible causes? I've been trying to track it down for several days now with no luck.
Below are the load averages and the iostat output for the several seconds where this took place.
Now, I'm no expert at this type of thing, so be nice.
Thanks very much for your time! Any thoughts/comments/help would be greatly appreciated.
load average: 6.65, 6.63, 6.66 08:26:50
load average: 6.65, 6.63, 6.66 08:26:51
load average: 6.65, 6.63, 6.66 08:26:52
load average: 6.65, 6.63, 6.66 08:26:53
load average: 6.65, 6.63, 6.66 08:26:57
load average: 13.24, 7.99, 7.10 08:26:58
load average: 13.24, 7.99, 7.10 08:26:59
Code:
Time: 08:26:50 PM
Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda         23.00     0.00    26.00     0.00   912.00     0.00    35.08     0.17     6.65     4.19    10.90
sda1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda3        23.00     0.00    26.00     0.00   912.00     0.00    35.08     0.17     6.65     4.19    10.90
sdb          0.00    60.00     1.00     2.00     8.00   496.00   168.00     0.33   110.33   110.33    33.10
sdb1         0.00    60.00     1.00     2.00     8.00   496.00   168.00     0.33   110.33   110.33    33.10

Time: 08:26:51 PM
Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda         23.00     0.00    12.00     0.00   480.00     0.00    40.00     5.42    45.83    53.50    64.20
sda1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda3        23.00     0.00    12.00     0.00   480.00     0.00    40.00     5.42    45.83    53.50    64.20
sdb         24.00     0.00     2.00     0.00   112.00     0.00    56.00     2.22   242.00   326.50    65.30
sdb1        24.00     0.00     2.00     0.00   112.00     0.00    56.00     2.22   242.00   326.50    65.30

Time: 08:26:52 PM
Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda         26.00     0.00     1.00     0.00   136.00     0.00   136.00    18.73   912.00  1000.00   100.00
sda1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda3        26.00     0.00     1.00     0.00   136.00     0.00   136.00    18.73   912.00  1000.00   100.00
sdb          0.00     0.00     1.00     0.00    88.00     0.00    88.00     4.59   686.00  1000.00   100.00
sdb1         0.00     0.00     1.00     0.00    88.00     0.00    88.00     4.59   686.00  1000.00   100.00

Time: 08:26:53 PM
Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00     0.00     2.00     0.00   112.00     0.00    56.00    29.34  2084.00   500.00   100.00
sda1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda3         0.00     0.00     2.00     0.00   112.00     0.00    56.00    29.34  2084.00   500.00   100.00
sdb          0.00     0.00     2.00     0.00    56.00     0.00    28.00     4.06  1927.00   500.00   100.00
sdb1         0.00     0.00     2.00     0.00    56.00     0.00    28.00     4.06  1927.00   500.00   100.00

Time: 08:26:54 PM
Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00     0.00     1.00     0.00     8.00     0.00     8.00    32.12  2719.00  1001.00   100.10
sda1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda3         0.00     0.00     1.00     0.00     8.00     0.00     8.00    32.12  2719.00  1001.00   100.10
sdb         12.00     0.00     1.00     0.00     8.00     0.00     8.00     2.61  2633.00  1000.00   100.00
sdb1        12.00     0.00     1.00     0.00     8.00     0.00     8.00     2.61  2633.00  1000.00   100.00

Time: 08:26:58 PM
Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda          9.18   113.61    74.37    22.78  1746.84  1091.14    29.21    24.62   505.05    10.17    98.77
sda1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda3         9.18   113.61    74.37    22.78  1746.84  1091.14    29.21    24.62   505.05    10.17    98.77
sdb          7.59     6.33     6.65     0.63   326.58    55.70    52.52     2.58   608.30    94.65    68.89
sdb1         7.59     6.33     6.65     0.63   326.58    55.70    52.52     2.58   608.30    94.65    68.89

Time: 08:26:59 PM
Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00     0.00    30.00     4.00   792.00    32.00    24.24     0.39    11.53     6.12    20.80
sda1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda3         0.00     0.00    30.00     4.00   792.00    32.00    24.24     0.39    11.53     6.12    20.80
sdb         12.00     0.00     5.00     0.00   168.00     0.00    33.60     0.04     8.80     8.80     4.40
sdb1        12.00     0.00     5.00     0.00   168.00     0.00    33.60     0.04     8.80     8.80     4.40
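For catching these spikes without watching the screen, something like the following rough sketch might help. It reads iostat output in the same layout as the samples above and prints only the whole-disk lines where %util is at or above a limit. The function name `saturated_samples`, the default limit of 99, and the log file name are made up for illustration, and it assumes each sample starts with a "Time:" line and that %util is the last column, which may differ on other sysstat versions.

```shell
#!/bin/sh
# Sketch: print the timestamp, device, await and %util for every whole-disk
# sample whose %util is at or above a limit (default 99, an arbitrary choice).
# Assumes the layout shown in this thread: a "Time:" line per sample,
# %util as the last field and await as the third field from the end.
saturated_samples() {
    awk -v limit="${1:-99}" '
        $1 == "Time:"   { stamp = $2 " " $3; next }   # remember the sample time
        $1 == "Device:" { next }                      # skip the header row
        $1 ~ /^sd[a-z]+$/ && $NF + 0 >= limit + 0 {   # whole disks only, not partitions
            printf "%s %s await=%s util=%s\n", stamp, $1, $(NF-2), $NF
        }
    '
}

# Usage (iostat.log is a hypothetical file name):
#   iostat -x -t 1 > iostat.log
#   saturated_samples 99 < iostat.log
```

Lining the flagged timestamps up against cron schedules or application logs is one way to look for whatever fires every couple of minutes.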
- 12-08-2009 #2 Just Joined!
- Join Date
- Nov 2008
- Location
- Virginia, USA
- Posts
- 18
Just curious, when was the last time you took an outage and fsck'd all your partitions? Also, check the output of smartctl on your drives (it comes with the smartmontools package):
Code: yum install smartmontools
Code: smartctl -A /dev/sdx
Then from a recovery shell (no partitions mounted):
Code: for i in `ls /dev/sd*`; do fsck -fy $i; done
That last one is a little ugly and will throw a few errors, but it's quick and easy and it will fsck every sd* partition. Please post the output from these commands. I know an outage sucks, but if your partitions are healthy and not too big it shouldn't take long. smartctl doesn't need an outage, though.
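The errors from that loop presumably come from the glob also matching the whole-disk nodes (/dev/sda, /dev/sdb), which normally carry a partition table, not a filesystem. A slightly tidier variant, as a sketch only, is to match just the names that end in a digit. The helper name `list_sd_partitions` is made up for illustration, and fsck may still complain about partitions that hold no filesystem, such as swap.

```shell
#!/bin/sh
# Sketch: list only partition nodes (sda1, sdb1, ...) under a device
# directory, skipping whole-disk nodes such as sda and sdb.
# The directory argument defaults to /dev.
list_sd_partitions() {
    dir="${1:-/dev}"
    for node in "$dir"/sd*[0-9]; do
        [ -e "$node" ] || continue   # the glob matched nothing
        printf '%s\n' "$node"
    done
}

# Usage, from a recovery shell with nothing mounted:
#   for part in $(list_sd_partitions); do fsck -fy "$part"; done
```

Running `list_sd_partitions` on its own first shows exactly which devices would be checked before any fsck is started.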

