High System CPU Load Average

You posted the system activity since boot; you should also watch the ongoing activity. You can change the update interval with the i key, and t triggers a manual update.

It seems like the HDD is under constant high load. You can sort the process list by disk usage with shift-d and switch to disk details with d, to find out which process(es) are using the disk so much.
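If you want the same "who is hitting the disk right now" view outside of atop, here is a rough Python sketch. It assumes a Linux kernel that exposes per-process I/O counters in /proc/<pid>/io (task I/O accounting enabled) and that you run it as root so other users' processes are readable. The function names are made up for illustration. It takes two snapshots and ranks processes by the bytes moved in between, so you see activity during the interval rather than totals since boot.

```python
import os
import time


def read_io_counters(proc_root="/proc"):
    """Snapshot {pid: (name, read_bytes, write_bytes)} for every readable process."""
    snapshot = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name = fh.readline().split(":", 1)[1].strip()  # first line is "Name:\t<cmd>"
            with open(os.path.join(proc_root, entry, "io")) as fh:
                fields = dict(line.split(":", 1) for line in fh)
            snapshot[int(entry)] = (
                name,
                int(fields["read_bytes"]),
                int(fields["write_bytes"]),
            )
        except (OSError, KeyError, ValueError, IndexError):
            continue  # process exited meanwhile, or its counters are not readable
    return snapshot


def io_deltas(before, after):
    """Return (pid, name, bytes_read, bytes_written) per process, busiest first."""
    rows = []
    for pid, (name, rd, wr) in after.items():
        if pid not in before:
            continue  # started during the interval; skipped to keep the sketch simple
        _, rd0, wr0 = before[pid]
        rows.append((pid, name, rd - rd0, wr - wr0))
    rows.sort(key=lambda row: row[2] + row[3], reverse=True)
    return rows


def top_disk_users(interval=10, limit=10):
    """Print the processes that moved the most bytes during `interval` seconds."""
    before = read_io_counters()
    time.sleep(interval)
    for pid, name, rd, wr in io_deltas(before, read_io_counters())[:limit]:
        print(f"{pid:>7} {name:<16} read {rd // 1024:>8} KiB  written {wr // 1024:>8} KiB")
```

Calling `top_disk_users(10)` while the disk is busy should list the heaviest readers and writers of the last 10 seconds. Short-lived processes that start and exit within the interval will not show up, which is one reason atop's own accounting is the better tool when it is available.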

Right, I have to restart the server about every other day to get it back to normal, just so that I can even log in over SSH.

These logs are from the 5th. They show high LVM and DSK usage.

ATOP - JSServer01 2014/09/05 13:29:42 --------- 3m22s elapsed
PRC | sys 6.72s | user 2.89s | #proc 141 | #trun 1 | #tslpi 161 | #tslpu 3 | #zombie 0 | clones 2157 | | #exit 0 |
CPU | sys 7% | user 6% | irq 0% | idle 304% | wait 82% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 4% | user 2% | irq 0% | idle 19% | cpu000 w 75% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 0% | user 3% | irq 0% | idle 93% | cpu001 w 4% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 3% | user 1% | irq 0% | idle 95% | cpu002 w 2% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 1% | user 0% | irq 0% | idle 97% | cpu003 w 2% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
CPL | avg1 1.17 | avg5 0.53 | | avg15 0.20 | | csw 256516 | intr 253915 | | | numcpu 4 |
MEM | tot 15.6G | free 14.2G | cache 489.2M | dirty 1.2M | buff 13.6M | slab 343.7M | | | | |
SWP | tot 2.0G | free 2.0G | | | | | | | vmcom 864.7M | vmlim 9.8G |
LVM | Group00-root | busy 82% | read 112338 | write 2805 | KiB/r 7 | KiB/w 4 | MBr/s 4.32 | MBw/s 0.05 | avq 4.86 | avio 1.45 ms |
LVM | Group00-swap | busy 0% | read 322 | write 0 | KiB/r 4 | KiB/w 0 | MBr/s 0.01 | MBw/s 0.00 | avq 3.27 | avio 0.93 ms |
DSK | sda | busy 83% | read 67273 | write 1386 | KiB/r 13 | KiB/w 8 | MBr/s 4.46 | MBw/s 0.05 | avq 2.46 | avio 2.44 ms |
NET | transport | tcpi 28 | tcpo 27 | udpi 93 | udpo 145 | tcpao 2 | tcppo 1 | tcprs 1 | tcpie 0 | udpip 0 |
NET | network | ipi 151 | ipo 183 | ipfrw 0 | deliv 138 | | | | icmpi 17 | icmpo 9 |
NET | em1 0% | pcki 141 | pcko 122 | si 0 Kbps | so 0 Kbps | coll 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | lo ---- | pcki 33 | pcko 33 | si 0 Kbps | so 0 Kbps | coll 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
*** system and process activity since boot ***
PID TID RUID EUID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR DSK CMD 1/8
1 - root root 1 0.55s 0.03s 19232K 1516K 428.7M 8236K N- - S 0 54% init
1038 - root root 1 0.01s 0.01s 108.0M 1804K 339.9M 1008K N- - S 0 42% rc
1923 - mysql mysql 11 0.01s 0.02s 477.5M 23128K 9304K 92K N- - S 1 1% mysqld
434 - root root 1 0.01s 0.17s 10760K 876K 9148K 0K N- - S 0 1% udevd
1973 - root root 1 0.03s 1.35s 239.1M 52280K 2804K 4K N- - S 0 0% spamd
1661 - haldaemo haldaemo 2 0.02s 0.03s 37824K 4200K 1560K 4K N- - S 0 0% hald
1323 - named named 7 0.02s 0.02s 382.6M 17392K 1468K 16K N- - S 0 0% named
346 - root root 1 0.00s 0.00s 0K 0K 0K 1128K N- - S 1 0% jbd2/dm-0-8
2072 - postfix postfix 1 0.00s 0.00s 81584K 3940K 1064K 0K N- - S 3 0% trivial-rewrit
1284 - root root 4 0.00s 0.00s 243.3M 1612K 416K 172K N- - S 0 0% rsyslogd
2073 - postfix postfix 1 0.00s 0.00s 81580K 3612K 572K 0K N- - S 0 0% smtp
2062 - root root 1 0.00s 0.03s 81296K 3408K 520K 8K N- - S 1 0% master
2106 - root root 1 0.01s 0.01s 269.3M 28532K 516K 4K N- - D 1 0% httpd
1662 - root root 1 0.00s 0.00s 20328K 1156K 520K 0K N- - S 0 0% hald-runner
2117 - root root 1 0.01s 0.00s 17532K 5252K 500K 4K N- - R 2 0% atop
1764 - root root 1 0.00s 0.00s 107.7M 1460K 368K 0K N- - S 2 0% mysqld_safe
2071 - postfix postfix 1 0.00s 0.00s 81520K 3504K 336K 0K N- - S 3 0% qmgr
157 - root root 1 5.11s 1.21s 36096K 1372K 276K 12K N- - S 1 0% plymouthd

It looks like init and rc are causing issues?

And this is the day before

ATOP - JSServer01 2014/09/02 00:00:01 --------- 10h54m10s elapsed
PRC | sys 6m30s | user 22m41s | #proc 229 | #trun 3 | #tslpi 468 | #tslpu 0 | #zombie 1 | clones 48622 | | #exit 5 |
CPU | sys 2% | user 17% | irq 0% | idle 369% | wait 12% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 0% | user 8% | irq 0% | idle 80% | cpu000 w 11% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 1% | user 5% | irq 0% | idle 94% | cpu002 w 1% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 0% | user 3% | irq 0% | idle 97% | cpu001 w 0% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 0% | user 2% | irq 0% | idle 98% | cpu003 w 0% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
CPL | avg1 0.21 | avg5 0.23 | | avg15 0.26 | | csw 17736354 | intr 16086e3 | | | numcpu 4 |
MEM | tot 15.6G | free 10.4G | cache 2.2G | dirty 1.5M | buff 273.5M | slab 521.9M | | | | |
SWP | tot 2.0G | free 2.0G | | | | | | | vmcom 4.7G | vmlim 9.8G |
LVM | Group00-root | busy 14% | read 248593 | write 1841e3 | KiB/r 15 | KiB/w 3 | MBr/s 0.09 | MBw/s 0.18 | avq 7.79 | avio 2.57 ms |
LVM | Group00-swap | busy 0% | read 477 | write 0 | KiB/r 4 | KiB/w 0 | MBr/s 0.00 | MBw/s 0.00 | avq 1.74 | avio 1.86 ms |
DSK | sda | busy 14% | read 185218 | write 758680 | KiB/r 20 | KiB/w 9 | MBr/s 0.10 | MBw/s 0.18 | avq 1.76 | avio 5.68 ms |
NET | transport | tcpi 1188535 | tcpo 1085220 | udpi 31117 | udpo 31510 | tcpao 12781 | tcppo 28282 | tcprs 37102 | tcpie 0 | udpip 0 |
NET | network | ipi 1239374 | ipo 1153826 | ipfrw 0 | deliv 1220e3 | | | | icmpi 280 | icmpo 142 |
NET | em1 0% | pcki 1189111 | pcko 1707309 | si 26 Kbps | so 463 Kbps | coll 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | lo ---- | pcki 107796 | pcko 107796 | si 12 Kbps | so 12 Kbps | coll 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
*** system and process activity since boot ***
PID TID RUID EUID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR DSK CMD 1/13
2108 - mysql mysql 18 55.63s 4m35s 1.1G 68400K 213.8M 3.0G N- - S 3 36% mysqld
2310 - root apache 1 0.34s 0.18s 218.7M 7228K 1.1G 300.1M N- - S 1 15% httpd
1 - root root 1 0.56s 0.03s 19356K 1548K 782.4M 27124K N- - S 3 9% init
2260 - root root 1 0.87s 0.17s 81296K 3408K 611.1M 67524K N- - S 3 7% master
347 - root root 1 11.44s 0.00s 0K 0K 0K 634.0M N- - S 3 7% jbd2/dm-0-8
3179 - root root 1 0.40s 1.01s 86620K 15836K 149.0M 458.3M N- - S 2 7% miniserv.pl
2302 - root root 1 2.02s 0.39s 414.0M 38388K 246.7M 296.8M N- - S 3 6% httpd
3013 - root root 8 0.00s 0.00s 690.8M 6292K 121.1M 155.9M N- - S 3 3% dsm_om_shrsvcd
2949 - root root 1 0.00s 0.00s 131.9M 712K 151.6M 14536K N- - S 2 2% dsm_om_connsvc
9091 - drivingr drivingr 1 6.12s 93.42s 302.7M 97156K 724K 112.2M N- - S 0 1% php-cgi
17087 - drivingr drivingr 1 4.00s 63.19s 310.4M 102.6M 1472K 90044K N- - S 2 1% php-cgi
20851 - drivingr drivingr 1 4.47s 50.94s 310.1M 102.2M 10436K 78680K N- - S 0 1% php-cgi
2182 - postgrey postgrey 1 0.20s 1.05s 154.2M 13904K 12784K 55040K N- - S 2 1% postgrey
5853 - bojotool bojotool 1 2.66s 62.79s 279.2M 77548K 46700K 9264K N- - S 0 1% php-cgi
398 - root root 1 1.53s 0.00s 0K 0K 448K 49824K N- - S 2 1% flush-253:0
5858 - bojotool bojotool 1 2.50s 63.19s 285.7M 78408K 26052K 10380K N- - S 0 0% php-cgi
2321 - root root 1 0.18s 0.08s 114.5M 1268K 20052K 16180K N- - S 3 0% crond
1466 - root root 4 0.50s 0.47s 243.3M 1772K 1144K 27104K N- - S 0 0% rsyslogd

The load has now shot up to over 1, and the DSK line is flashing in atop.

[code]ATOP - JSServer01 2014/09/09 00:06:50 --------- 10s elapsed
PRC | sys 0.37s | user 2.16s | #proc 238 | #trun 2 | #tslpi 492 | #tslpu 1 | #zombie 0 | clones 5 | | #exit 1 |
CPU | sys 3% | user 22% | irq 0% | idle 278% | wait 96% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 2% | user 13% | irq 0% | idle 1% | cpu000 w 84% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 1% | user 4% | irq 0% | idle 96% | cpu003 w 0% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 1% | user 3% | irq 0% | idle 95% | cpu001 w 2% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 0% | user 2% | irq 0% | idle 87% | cpu002 w 11% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
CPL | avg1 1.37 | avg5 1.17 | | avg15 0.66 | | csw 9608 | intr 13480 | | | numcpu 4 |
MEM | tot 15.6G | free 4.7G | cache 7.8G | dirty 5.3M | buff 347.1M | slab 597.7M | | | | |
SWP | tot 2.0G | free 2.0G | | | | | | | vmcom 4.8G | vmlim 9.8G |
LVM | Group00-root | busy 98% | read 1433 | write 958 | KiB/r 4 | KiB/w 3 | MBr/s 0.57 | MBw/s 0.37 | avq 1.71 | avio 4.08 ms |
DSK | sda | busy 98% | read 1433 | write 88 | KiB/r 4 | KiB/w 43 | MBr/s 0.57 | MBw/s 0.37 | avq 1.12 | avio 6.41 ms |
NET | transport | tcpi 614 | tcpo 386 | udpi 17 | udpo 17 | tcpao 6 | tcppo 8 | tcprs 15 | tcpie 0 | udpip 0 |
NET | network | ipi 632 | ipo 418 | ipfrw 0 | deliv 631 | | | | icmpi 0 | icmpo 0 |
NET | em1 0% | pcki 657 | pcko 630 | si 59 Kbps | so 647 Kbps | coll 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | lo ---- | pcki 48 | pcko 48 | si 8 Kbps | so 8 Kbps | coll 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |

PID TID RUID EUID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR DSK CMD 1/4
27735 - root root 1 0.13s 0.12s 0K 0K 5784K 0K -- - D 2 65% tar
346 - root root 1 0.01s 0.00s 0K 0K 0K 1188K -- - S 2 13% jbd2/dm-0-8
3163 - root root 1 0.00s 0.00s 0K 0K 516K 8K -- - S 0 6% miniserv.pl
23261 - drivingr drivingr 1 0.03s 0.49s 10240K 9924K 0K 492K -- - S 0 6% php-cgi
18385 - drivingr drivingr 1 0.04s 0.92s -8704K -8476K 0K 316K -- - S 0 4% php-cgi
27737 - root root 1 0.00s 0.00s 0K 0K 0K 304K -- - S 0 3% cat
12883 - mysql mysql 15 0.01s 0.02s 0K 0K 0K 180K -- - S 3 2% mysqld
1323 - root root 4 0.00s 0.00s 0K 0K 0K 24K -- - S 0 0% rsyslogd
18286 - apache apache 4 0.00s 0.03s 0K 96K 0K 12K -- - S 1 0% httpd
27791 - postfix postfix 1 0.00s 0.00s 82252K 4640K 0K 12K N- - S 2 0% cleanup
18258 - apache apache 5 0.02s 0.08s 0K 0K 0K 8K -- - S 1 0% httpd
18265 - apache apache 5 0.00s 0.00s 0K 0K 0K 8K -- - S 2 0% httpd
21194 - apache apache 5 0.00s 0.00s 0K 0K 0K 8K -- - S 1 0% httpd
715 - root root 1 0.00s 0.00s 0K 0K 0K 8K -- - S 2 0% flush-253:0
23173 - apache apache 5 0.00s 0.01s 0K 0K 0K 4K -- - S 1 0% httpd
[/code]

You again posted the “System activity since boot”. You might want to observe the ongoing activity instead (press t to trigger a manual update of the screen) while the HDD is under high load, and check there which processes use the disk the most and how much.
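To double-check outside atop that the load comes from waiting on the disk rather than from CPU work, something like the following Python sketch can help. It assumes the standard Linux /proc/stat layout, where the aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq and steal ticks in that order. The function names are only illustrative. It compares two readings of that line and reports what share of the elapsed CPU time was spent in iowait.

```python
import time


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line of /proc/stat into a list of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in parts[1:]]


def iowait_share(before_line, after_line):
    """Percentage of all CPU time between two readings that was spent in iowait."""
    before = parse_cpu_line(before_line)
    after = parse_cpu_line(after_line)
    deltas = [b - a for a, b in zip(before, after)]
    # Only the first eight fields are summed: guest time is already part of user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def read_cpu_line(path="/proc/stat"):
    """Read the first (aggregate) line of /proc/stat."""
    with open(path) as fh:
        return fh.readline()


def sample_iowait(interval=10):
    """Measure the iowait share over `interval` seconds on the running system."""
    first = read_cpu_line()
    time.sleep(interval)
    return iowait_share(first, read_cpu_line())
```

Note that this is a share of all CPUs combined, while atop's CPU line adds the CPUs up (400% on a 4-CPU box), so atop's "wait 96%" corresponds to roughly 24% here. Linux also counts tasks in uninterruptible sleep (state D) towards the load average, which is why the load can climb above 1 while the CPUs are mostly idle.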

Well, I think I’ve managed to trace down what is causing the disk issues: two processes, init and rc.

Now what would be causing this?

ATOP - JSServer01 2014/09/25 12:51:08 --------- 4m36s elapsed
PRC | sys 8.92s | user 3.42s | #proc 138 | #trun 1 | #tslpi 158 | #tslpu 3 | #zombie 0 | clones 2275 | #exit 0 |
CPU | sys 6% | user 5% | irq 0% | idle 303% | wait 85% | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 3% | user 2% | irq 0% | idle 16% | cpu000 w 79% | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 3% | user 1% | irq 0% | idle 95% | cpu002 w 1% | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 0% | user 2% | irq 0% | idle 93% | cpu001 w 4% | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 0% | user 0% | irq 0% | idle 98% | cpu003 w 1% | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
CPL | avg1 1.21 | avg5 0.67 | avg15 0.27 | | | csw 299648 | intr 335944 | | numcpu 4 |
MEM | tot 15.6G | free 14.1G | cache 519.1M | dirty 8.7M | buff 15.6M | slab 373.4M | | | |
SWP | tot 2.0G | free 2.0G | | | | | | vmcom 899.3M | vmlim 9.8G |
LVM | Group00-root | busy 86% | read 121974 | write 3222 | KiB/r 7 | KiB/w 3 | MBr/s 3.37 | MBw/s 0.05 | avio 1.89 ms |
LVM | Group00-swap | busy 0% | read 322 | write 0 | KiB/r 4 | KiB/w 0 | MBr/s 0.00 | MBw/s 0.00 | avio 0.93 ms |
DSK | sda | busy 86% | read 79345 | write 1608 | KiB/r 12 | KiB/w 8 | MBr/s 3.47 | MBw/s 0.05 | avio 2.94 ms |
NET | transport | tcpi 16 | tcpo 16 | udpi 178 | udpo 210 | tcpao 0 | tcppo 0 | tcprs 0 | udpip 0 |
NET | network | ipi 205 | ipo 236 | ipfrw 0 | deliv 197 | | | icmpi 3 | icmpo 9 |
NET | em1 0% | pcki 225 | pcko 180 | si 2 Kbps | so 0 Kbps | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | lo ---- | pcki 29 | pcko 29 | si 0 Kbps | so 0 Kbps | erri 0 | erro 0 | drpi 0 | drpo 0 |
*** system and process activity since boot ***
PID TID RDDSK WRDSK WCANCL DSK CMD 1/28
1 - 462.4M 7388K 8K 54% init
1110 - 345.5M 992K 472K 40% rc
2016 - 26432K 432K 0K 3% mysqld
2108 - 3456K 8436K 4K 1% postgrey
439 - 9272K 0K 0K 1% udevd