  1. #1
    Just Joined!
    Join Date
    Oct 2006
    Posts
    5

    Linux high load average and swap file issues


    Hi All,

    To learn more about Linux behaviour, I have started analysing our Linux system, since the supplier of our service application reports issues with the Linux OS.

    I am experiencing delays on my SUSE server. While investigating, I found that the load average sometimes spikes without any cause I can find. See the sar -q output below, around 22:35:

    22:05:01 runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
    22:10:01 1 3576 1,79 1,97 2,10
    22:15:01 1 3574 2,36 2,44 2,26
    22:20:01 1 3593 2,71 2,36 2,24
    22:25:01 0 3592 1,96 2,05 2,10
    22:30:01 5 3588 2,48 2,37 2,23
    22:35:02 3 3601 77,52 77,06 33,92
    22:40:02 2 3612 2,76 29,62 25,14
    22:45:02 1 3589 2,23 12,18 18,77
    22:50:02 1 3587 2,21 5,89 14,22
    22:55:02 0 3604 1,12 2,92 10,65
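The numbers above can be checked against how Linux computes the load average. Roughly every 5 seconds the kernel folds the number of active tasks (runnable tasks plus tasks in uninterruptible sleep, D state) into exponentially damped averages with 1-, 5- and 15-minute time constants. sar's runq-sz counts only runnable tasks, so a load average far above runq-sz usually means many tasks were blocked in D state, which on a system with only NFS storage could point at I/O waits. The sketch below is a simplified closed-form model, not the kernel's fixed-point code; the steady 2.7 active tasks after 22:35 is an assumption, chosen because ldavg-1 is 2,76 at 22:40.

```python
import math

def damped_average(start, active, seconds, window_minutes):
    """Exponentially damped average after `seconds` of a constant number of
    active tasks. The kernel multiplies by exp(-5 s / window) roughly every
    5 s; with constant input, repeated steps multiply out to this closed form."""
    factor = math.exp(-seconds / (window_minutes * 60.0))
    return start * factor + active * (1.0 - factor)

# Start values are the 22:35:02 row of the sar -q table above. ACTIVE is a
# hypothetical steady load for the following 300 s, not a measured value.
ACTIVE = 2.7
pred5 = damped_average(77.06, ACTIVE, 300, 5)
pred15 = damped_average(33.92, ACTIVE, 300, 15)
print(f"ldavg-5 at 22:40: model {pred5:.2f}, sar 29,62")
print(f"ldavg-15 at 22:40: model {pred15:.2f}, sar 25,14")
```

The model gives about 30.1 and 25.1, close to what sar recorded at 22:40. That is consistent with the spike having been short and ending soon after 22:35; the small differences are expected because the real number of active tasks was not constant.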

    It strikes me that the run queue (runq-sz) is short while the load average is high. I went on to check CPU utilisation, network load, disk and memory statistics, shown below:

    CPU Util:

    22:05:01 CPU %user %nice %system %iowait %idle
    22:10:01 all 10,64 0,00 1,37 0,19 87,80
    22:15:01 all 10,26 0,00 1,33 0,20 88,21
    22:20:01 all 6,64 0,00 1,28 0,15 91,93
    22:25:01 all 3,22 0,00 1,43 0,09 95,26
    22:30:01 all 3,03 0,00 1,50 0,05 95,43
    22:35:02 all 11,69 0,00 3,22 0,42 84,68
    22:40:02 all 11,92 0,00 1,94 0,41 85,73
    22:45:02 all 12,86 0,00 2,06 0,05 85,03
    22:50:02 all 12,78 0,00 2,06 0,05 85,11
    22:55:02 all 6,18 0,00 2,04 0,03 91,75

    Network:

    17:40:01 IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
    21:55:01 lo 39,91 39,91 9116,78 9116,78 0,00 0,00 0,00
    21:55:01 eth0 5,38 2,63 499,86 234,89 0,00 0,00 0,00
    22:20:01 eth1 13,76 13,42 6492,07 5013,41 0,00 0,00 0,00
    22:25:01 lo 41,05 41,05 16824,33 16824,33 0,00 0,00 0,00
    22:25:01 eth0 1337,70 1075,90 241383,12 186642,94 0,00 0,00 0,00
    22:25:01 eth1 18,77 18,19 7620,43 5998,74 0,00 0,00 0,00
    22:30:01 lo 44,85 44,85 32346,59 32346,59 0,00 0,00 0,00
    22:30:01 eth0 1484,35 1194,87 269371,37 208276,99 0,00 0,00 0,00
    22:30:01 eth1 17,23 16,76 7029,81 6711,39 0,00 0,00 0,00
    22:35:02 lo 59,88 59,88 38669,10 38669,10 0,00 0,00 0,00
    22:35:02 eth0 1295,72 1045,37 146044,63 181902,75 0,00 0,00 0,00
    22:35:02 eth1 25,68 24,49 10041,69 8213,90 0,00 0,00 0,00
    22:40:02 lo 40,77 40,77 9490,10 9490,10 0,00 0,00 0,00
    22:40:02 eth0 1585,74 1274,82 285520,51 220316,24 0,00 0,00 0,00
    22:40:02 eth1 44,74 126,29 13356,98 183718,23 0,00 0,00 0,00
    22:45:02 lo 39,45 39,45 11047,16 11047,16 0,00 0,00 0,00
    22:45:02 eth0 1613,57 1299,45 290995,47 224849,72 0,00 0,00 0,00
    22:45:02 eth1 37,09 118,86 9729,15 181647,86 0,00 0,00 0,00
    22:50:02 lo 50,15 50,15 29553,05 29553,05 0,00 0,00 0,00
    22:50:02 eth0 1579,88 1294,47 284972,59 387770,73 0,00 0,00 0,00
    22:50:02 eth1 18,74 18,34 8675,05 7070,80 0,00 0,00 0,00
    22:55:02 lo 57,16 57,16 142165,96 142165,96 0,00 0,00 0,00
    22:55:02 eth0 1694,00 1382,34 303740,60 412687,55 0,00 0,00 0,00
    22:55:02 eth1 13,24 12,95 5809,02 5099,89 0,00 0,00 0,00
    23:00:01 lo 55,65 55,65 140265,83 140265,83 0,00 0,00 0,00

    For comparison, the nightly jobs run with these traffic figures without causing any issues:

    23:55:01 eth0 13766685,09 12619096,20 3271190575,99 32585337183,43 0,00 0,00 88,65

    Disk:

    22:00:01 23,44 0,03 23,41 0,40 588,17
    22:05:01 50,43 4,76 45,66 51,99 1523,46

    22:05:01 tps rtps wtps bread/s bwrtn/s
    22:10:01 46,90 3,09 43,81 41,32 1094,56
    22:15:01 56,98 2,38 54,60 34,10 1267,51
    22:20:01 53,47 0,97 52,50 15,45 1196,38
    22:25:01 41,78 0,00 41,77 0,05 860,60
    22:30:01 34,65 0,01 34,64 0,21 776,05
    22:35:02 59,56 10,15 49,41 95,83 1254,10
    22:40:02 79,97 10,06 69,91 84,46 1474,17
    22:45:02 90,23 0,01 90,22 0,11 1895,80
    22:50:02 98,22 0,01 98,21 0,27 2520,58
    22:55:02 44,46 0,01 44,45 0,11 1423,40
    23:00:01 20,37 0,02 20,35 0,30 582,88

    Memory:

    22:05:01 kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
    22:10:01 9366416 23594488 71,58 0 15336276 7954952 439000 5,23 456
    22:15:01 9294960 23665944 71,80 0 15399000 7956504 437448 5,21 440
    22:20:01 9220308 23740596 72,03 0 15465812 7957264 436688 5,20 448
    22:25:01 9172716 23788188 72,17 0 15514128 7957264 436688 5,20 448
    22:30:01 9151944 23808960 72,23 0 15533660 7957264 436688 5,20 448
    22:35:02 9238892 23722012 71,97 0 15537772 7957264 436688 5,20 448
    22:40:02 9214496 23746408 72,04 0 15555248 7957264 436688 5,20 448
    22:45:02 9220720 23740184 72,03 0 15554220 7957264 436688 5,20 448
    22:50:02 9215268 23745636 72,04 0 15558332 7957264 436688 5,20 448
    22:55:02 9176508 23784396 72,16 0 15570668 7957264 436688 5,20 448
    23:00:01 9160396 23800508 72,21 0 15572724 7957264 436688 5,20 448

    I can't see anything abnormal in these numbers that would explain the high load average, yet it occurs. The only oddity is that the network statistics are missing between 21:55 and 22:20. Could that indicate a network interface failure, or something else?
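One way to check whether network samples are really missing, or were only trimmed when the output was pasted, is to scan the sar -n DEV rows for per-interface gaps longer than the 5-minute collection interval. A minimal sketch, assuming rows in the format pasted above (time, interface, counters) that all fall within one day; the 30-second slack is an arbitrary choice:

```python
from datetime import datetime, timedelta

def find_gaps(lines, interval=timedelta(minutes=5), slack=timedelta(seconds=30)):
    """Return {iface: [(previous_time, next_time), ...]} for every pair of
    consecutive samples of an interface more than interval + slack apart.
    Assumes all rows fall within one day (no midnight wrap-around)."""
    last_seen = {}
    gaps = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[1] == "IFACE":
            continue  # skip blank and header lines
        ts = datetime.strptime(fields[0], "%H:%M:%S")
        iface = fields[1]
        if iface in last_seen and ts - last_seen[iface] > interval + slack:
            gaps.setdefault(iface, []).append((last_seen[iface].time(), ts.time()))
        last_seen[iface] = ts
    return gaps

# First rows of the network table above.
sample = """\
17:40:01 IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
21:55:01 lo 39,91 39,91 9116,78 9116,78 0,00 0,00 0,00
21:55:01 eth0 5,38 2,63 499,86 234,89 0,00 0,00 0,00
22:20:01 eth1 13,76 13,42 6492,07 5013,41 0,00 0,00 0,00
22:25:01 lo 41,05 41,05 16824,33 16824,33 0,00 0,00 0,00
22:25:01 eth0 1337,70 1075,90 241383,12 186642,94 0,00 0,00 0,00
22:25:01 eth1 18,77 18,19 7620,43 5998,74 0,00 0,00 0,00
""".splitlines()

for iface, spans in find_gaps(sample).items():
    for start, end in spans:
        print(f"{iface}: no samples between {start} and {end}")
```

On these rows it reports lo and eth0 with no samples between 21:55:01 and 22:25:01. If the CPU and disk reports from the same sar file have samples for those times, as they do here, the network rows were probably dropped when the output was trimmed rather than never collected.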

    Is there any way I can track this further to find out what is happening and why?
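Because the Linux load average also counts tasks in uninterruptible sleep (D state), one way to follow up is to record which processes are in D state while the load is high. A minimal sketch, assuming a Python 3 interpreter is available on the box (an older SUSE release may only ship Python 2, where the same /proc reads work with small changes). It reads the state letter from /proc/<pid>/stat; the threshold, interval and sample count in watch() are hypothetical values, and only processes listed directly under /proc are counted, not their individual threads.

```python
import os
import time

def parse_stat(data):
    """Split the text of /proc/<pid>/stat into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the state letter is taken from just after the LAST ')'.
    """
    pid = int(data[: data.index(" ")])
    open_paren = data.index("(")
    close_paren = data.rindex(")")
    comm = data[open_paren + 1 : close_paren]
    state = data[close_paren + 2]
    return pid, comm, state

def task_states(proc="/proc"):
    """Count processes per state letter and list those in D state."""
    counts, blocked = {}, []
    if not os.path.isdir(proc):
        return counts, blocked
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited, or the line could not be parsed
        counts[state] = counts.get(state, 0) + 1
        if state == "D":
            blocked.append((pid, comm))
    return counts, blocked

def watch(samples=60, interval=5.0, threshold=5):
    """Print a timestamped line whenever more than `threshold` processes are in D state."""
    for _ in range(samples):
        counts, blocked = task_states()
        if counts.get("D", 0) > threshold:
            print(time.strftime("%H:%M:%S"), counts, blocked[:10])
        time.sleep(interval)
```

Left running around the time the spikes occur, watch() would show whether the load is made up of blocked processes and which ones they are.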


    Another issue I found is a sudden increase in swap usage, shown below:

    08:00:01 166472 32794432 99,49 240 23040636 8303924 90028 1,07 1744
    08:05:01 165896 32795008 99,50 236 22352908 8303924 90028 1,07 1744
    08:10:01 179116 32781788 99,46 236 22191512 8303924 90028 1,07 1744
    08:15:01 182228 32778676 99,45 236 22187400 8303924 90028 1,07 1744
    08:20:01 218880 32742024 99,34 236 22045536 8303924 90028 1,07 1744
    08:25:01 178476 32782428 99,46 300 22107152 8303924 90028 1,07 1744
    08:30:01 196276 32764628 99,40 300 22099732 8303924 90028 1,07 1968
    08:35:01 168876 32792028 99,49 296 22095624 8303924 90028 1,07 1968
    08:40:01 203572 32757332 99,38 296 21905444 8303924 90028 1,07 1968
    08:45:01 208212 32752692 99,37 296 21948620 8303924 90028 1,07 1968
    08:50:01 181556 32779348 99,45 296 21897220 8303924 90028 1,07 1968

    08:50:01 kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
    08:55:01 165332 32795572 99,50 396 21757312 8303924 90028 1,07 1968
    09:00:01 188068 32772836 99,43 396 21711052 8303924 90028 1,07 1968
    09:05:01 1579564 31381340 95,21 396 21188828 8303924 90028 1,07 1968
    09:10:01 2139916 30820988 93,51 268 18362664 8303924 90028 1,07 232
    09:15:01 17336620 15624284 47,40 188 5866884 7917828 476124 5,67 68600
    09:20:01 16816828 16144076 48,98 188 6221224 7917828 476124 5,67 68920
    09:25:01 16630540 16330364 49,54 188 6412432 7917828 476124 5,67 68920
    09:30:02 16193552 16767352 50,87 188 6790256 7917828 476124 5,67 69400
    09:35:02 15670284 17290620 52,46 188 7177812 7917828 476124 5,67 69400
    09:40:02 15335984 17624920 53,47 188 7537612 7917828 476124 5,67 69400
    09:45:02 15032236 17928668 54,39 188 7749380 7917828 476124 5,67 69400
    09:50:02 14796400 18164504 55,11 188 7922084 7917828 476124 5,67 69400
    09:55:02 14605448 18355456 55,69 188 8046472 7917828 476124 5,67 69400
    10:00:01 14451000 18509904 56,16 188 8200672 7917828 476124 5,67 69400
    10:05:01 14240740 18720164 56,80 188 8311696 7917828 476124 5,67 69400
    10:10:01 14106696 18854208 57,20 188 8498792 7917828 476124 5,67 69400
    10:15:01 13806396 19154508 58,11 188 8637580 7917836 476116 5,67 69392
    10:20:01 13338968 19621936 59,53 188 9131020 7917836 476116 5,67 69392
    10:25:01 13201272 19759632 59,95 188 9245128 7917836 476116 5,67 69392
    14:30:01 8759944 24200960 73,42 292 12903368 7917844 476108 5,67 69700
    14:35:01 8704852 24256052 73,59 292 12955796 7917844 476108 5,67 69700
    14:40:01 8608472 24352432 73,88 292 12987664 7917844 476108 5,67 69700
    14:45:01 8485644 24475260 74,26 292 13031868 7917844 476108 5,67 69700
    14:50:01 8510704 24450200 74,18 292 13042148 7917844 476108 5,67 69700
    14:55:01 8567276 24393628 74,01 292 13046260 7917844 476108 5,67 69700
    15:00:02 161744 32799160 99,51 200 21945056 3683092 4710860 56,12 1570244
    15:05:02 164768 32796136 99,50 188 23580076 3683632 4710320 56,12 5140
    15:10:02 10172796 22788108 69,14 188 13583632 3683992 4709960 56,11 6340
    15:15:02 2798940 30161964 91,51 188 20918316 3684356 4709596 56,11 6436
    15:20:02 7370608 25590296 77,64 188 16480856 3690212 4703740 56,04 24524
    15:25:02 7135880 25825024 78,35 188 16700112 3698016 4695936 55,94 36568
    15:30:01 6936692 26024212 78,95 188 16864444 3703076 4690876 55,88 43912
    15:35:01 6779548 26181356 79,43 188 16978808 3708672 4685280 55,82 48796
    15:40:01 6468692 26492212 80,37 188 17311624 3714996 4678956 55,74 52136
    15:45:01 6335936 26624968 80,78 188 17527848 3738256 4655696 55,46 56932
    15:50:01 6287352 26673552 80,92 188 17570504 3738408 4655544 55,46 57452

    At 15:00 swap usage suddenly jumps from about 5.7% to about 56%, and from then on the system has consistently used 40-50% of the swap. I will of course give it more memory, but I would love to know what happened so suddenly. Is there any way to track down more information about this issue as well?
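sar's 5-minute samples can narrow down when memory was pushed out to swap, though not which process caused it. A minimal sketch that scans sar -r rows (the column layout pasted above) for large increases in kbswpused; the 100 MB threshold is an arbitrary choice:

```python
def swap_jumps(rows, min_increase_kb=100_000):
    """Find consecutive samples where kbswpused rose by at least min_increase_kb.

    Expected columns: time kbmemfree kbmemused %memused kbbuffers kbcached
    kbswpfree kbswpused %swpused kbswpcad
    """
    jumps = []
    prev = None
    for line in rows:
        f = line.split()
        if len(f) != 10 or not f[1].isdigit():
            continue  # skip header and blank lines
        # (time, kbmemfree, kbcached, kbswpused)
        sample = (f[0], int(f[1]), int(f[5]), int(f[7]))
        if prev is not None and sample[3] - prev[3] >= min_increase_kb:
            jumps.append({"from": prev[0], "to": sample[0],
                          "swap_added_kb": sample[3] - prev[3],
                          "memfree_kb": sample[1], "cached_kb": sample[2]})
        prev = sample
    return jumps

# Rows around 15:00 from the table above.
rows = """\
14:50:01 8510704 24450200 74,18 292 13042148 7917844 476108 5,67 69700
14:55:01 8567276 24393628 74,01 292 13046260 7917844 476108 5,67 69700
15:00:02 161744 32799160 99,51 200 21945056 3683092 4710860 56,12 1570244
15:05:02 164768 32796136 99,50 188 23580076 3683632 4710320 56,12 5140
""".splitlines()

for jump in swap_jumps(rows):
    print(jump)
```

On these rows it reports a single jump of 4,234,752 kB (roughly 4 GB) between 14:55:01 and 15:00:02, in the same sample where kbmemfree fell to about 160 MB. Whatever started shortly before 15:00 and needed several GB of memory is therefore the first thing to look for.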

    The Linux version is:

    Linux version 2.6.16.60-0.69.1-smp (geekobuildhost) (gcc version 4.1.2 20070115 (SUSE Linux)) #1 SMP Fri Sep 17 17:07:54 UTC 2010

    The system runs as a VM on VMware ESX 5.1 (I believe) and has 2 x 10 Gbit/s NICs (+ 4 x 1 Gbit/s for VMware/system/etc.), 32 GB of RAM allocated, and 12 CPU cores from the AMD Opteron 12/16-core CPUs in an HP DL385 G7. Storage is a NetApp Metro Cluster 3240 over NFS; there is no local storage at all.

    I know I have pasted a lot of information here, so thank you for taking the time to read through it and, hopefully, to help.

    Any help or pointers are appreciated. Thank you.

    Best regards, Nicolai Frydenlund

    EDIT: The swap usage data is from a later date, so it should not be compared with the first issue.
    Last edited by Splint28; 07-26-2013 at 10:56 AM.

  2. #2
    elija
    Penguin of trust
    Join Date
    Jul 2004
    Location
    Either at home or at work or down the pub
    Posts
    3,502
    Three things that in my experience caused a similar issue on CentOS boxes were:

    1. A nightly backup job failing to terminate cleanly, causing the data centre's proprietary software to go into a loop of death
    2. Poorly optimised MySQL queries running multiple instances at busy times
    3. Failing network card.


    The solutions were:
    1. Change the backup process so that the backup daemon didn't actually run on the affected machine, as that was the only one it went wrong on!
    2. Analyse the slow query log and the indexes, rebuilding indexes as appropriate and rewriting queries where necessary.
    3. Replace the failing card. Running watch ifconfig showed the dropped packets counter increasing at a phenomenal rate on the failing card.
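The watch ifconfig check in point 3 can also be scripted so the drop rate per second is computed rather than eyeballed. A minimal sketch, assuming the standard /proc/net/dev layout (the kernel counters that ifconfig also displays); the 5-second interval is an arbitrary choice:

```python
import time

def parse_net_dev(text):
    """Return {iface: (rx_drop, tx_drop)} from the contents of /proc/net/dev.

    The first two lines are headers. Each remaining line is 'iface:' followed by
    8 receive counters (bytes packets errs drop fifo frame compressed multicast)
    and 8 transmit counters (bytes packets errs drop fifo colls carrier compressed).
    """
    drops = {}
    for line in text.splitlines()[2:]:
        name, sep, counters = line.partition(":")
        fields = counters.split()
        if not sep or len(fields) < 16:
            continue
        drops[name.strip()] = (int(fields[3]), int(fields[11]))
    return drops

def drop_rates(interval=5.0, path="/proc/net/dev"):
    """Sample the drop counters twice and return (rx, tx) drops per second per interface."""
    with open(path) as f:
        before = parse_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_net_dev(f.read())
    return {iface: ((after[iface][0] - rx) / interval, (after[iface][1] - tx) / interval)
            for iface, (rx, tx) in before.items() if iface in after}
```

A card whose drop rate stays well above zero while the others sit at zero is the pattern described above.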


    I'm sure there are other possibilities also, but hopefully that will give you a few things to check.
