Hello all,
Today one of my servers is becoming very slow, with very High Load Average (up to 200). There’s no very high CPU use, but a high IOWAIT.
Normally I can identify the problem, it is usually email, someone sending spam. But today I can’t find out what is causing the high load iowait.
My top shows:
top - 14:27:34 up 4:12, 2 users, load average: 84.58, 53.22, 43.74
Tasks: 681 total, 1 running, 679 sleeping, 0 stopped, 1 zombie
Cpu0 : 10.8%us, 2.0%sy, 0.0%ni, 0.0%id, 86.8%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu1 : 2.3%us, 0.7%sy, 0.0%ni, 96.3%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.3%us, 0.3%sy, 0.0%ni, 0.0%id, 99.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.3%us, 0.3%sy, 0.0%ni, 0.0%id, 99.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 5.3%us, 1.0%sy, 0.0%ni, 82.4%id, 11.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.7%us, 0.3%sy, 0.0%ni, 0.0%id, 99.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 1.3%us, 0.3%sy, 0.0%ni, 0.0%id, 98.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.0%us, 0.7%sy, 0.0%ni, 0.0%id, 99.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16037364k total, 13479312k used, 2558052k free, 1231820k buffers
Swap: 18284540k total, 0k used, 18284540k free, 7857368k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
55028 eptvmast 20 0 276m 21m 7512 S 6.0 0.1 0:00.92 php-cgi
45290 eptvmast 20 0 286m 31m 7804 S 4.3 0.2 0:14.55 php-cgi
54977 1261 20 0 277m 19m 7536 S 3.0 0.1 0:00.25 php-cgi
55165 1080 20 0 186m 24m 7476 D 2.0 0.2 0:00.06 php-cgi
16602 phoenixa 20 0 181m 20m 7744 S 1.3 0.1 0:49.99 php-cgi
55166 helicida 20 0 273m 16m 7276 D 1.3 0.1 0:00.04 php-cgi
3501 mysql 20 0 12.9g 1.1g 7492 S 1.0 7.1 2:58.34 mysqld
54940 root 20 0 78880 3436 2408 S 1.0 0.0 0:00.04 auth
45639 postfix 20 0 82552 4988 3112 S 0.7 0.0 0:00.12 cleanup
51587 root 20 0 15504 1820 992 R 0.7 0.0 0:01.30 top
54757 postfix 20 0 98952 5132 3564 S 0.7 0.0 0:00.20 smtpd
249 root 20 0 0 0 0 S 0.3 0.0 0:01.41 scsi_eh_3
1880 root 16 -4 93156 864 572 S 0.3 0.0 0:02.81 auditd
3688 root 20 0 238m 53m 3220 S 0.3 0.3 0:04.80 spamd
11834 dovecot 20 0 78840 3324 2268 S 0.3 0.0 0:10.39 auth
41193 postfix 20 0 82584 5004 3112 S 0.3 0.0 0:00.55 cleanup
47545 567 20 0 15824 2196 992 S 0.3 0.0 0:37.54 top
53714 1316 20 0 18460 1988 1548 S 0.3 0.0 0:00.14 imap
53715 apache 20 0 334m 16m 2712 S 0.3 0.1 0:00.01 httpd
54547 866 20 0 18472 1968 1564 S 0.3 0.0 0:00.01 imap
54912 756 20 0 18456 1964 1548 D 0.3 0.0 0:00.04 imap
1 root 20 0 19232 1528 1236 S 0.0 0.0 0:01.24 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:00.07 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:00.30 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 stopper/0
6 root RT 0 0 0 0 S 0.0 0.0 0:00.01 watchdog/0
7 root RT 0 0 0 0 S 0.0 0.0 0:00.01 migration/1
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 stopper/1
9 root 20 0 0 0 0 S 0.0 0.0 0:00.19 ksoftirqd/1
10 root RT 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
11 root RT 0 0 0 0 S 0.0 0.0 0:00.01 migration/2
12 root RT 0 0 0 0 S 0.0 0.0 0:00.00 stopper/2
13 root 20 0 0 0 0 S 0.0 0.0 0:00.18 ksoftirqd/2
My iotop normally does not show anything unusual , but for every several seconds it shows like belllow for one second:
Total DISK READ: 69.10 K/s | Total DISK WRITE: 3.79 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
53168 be/4 vendas4@ 3.84 K/s 0.00 B/s 0.00 % 99.99 % dovecot/imap
31584 be/4 tlazarim 0.00 B/s 0.00 B/s 0.00 % 99.99 % spamd child
55433 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % repquota -u -v /home
55382 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % repquota -u -v /home
54574 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % repquota -u -v /home
55938 be/4 comunica 0.00 B/s 0.00 B/s 0.00 % 99.99 % procmail-wrapper -o -a mail2.intercode.com.br -d comunicacao-frutal.mg.gov.br
17162 be/4 barrosde 0.00 B/s 0.00 B/s 0.00 % 99.99 % spamd child
54903 be/4 educacao 26.87 K/s 0.00 B/s 0.00 % 99.99 % dovecot/imap
56043 be/4 a.robert 26.87 K/s 0.00 B/s 0.00 % 99.99 % dovecot/imap
52897 be/4 editorar 3.84 K/s 0.00 B/s 0.00 % 99.99 % spamd child
109664 be/4 controli 0.00 B/s 0.00 B/s 0.00 % 99.99 % php-cgi
460 be/3 root 0.00 B/s 153.55 K/s 0.00 % 95.16 % [jbd2/dm-0-8]
45595 be/4 postfix 0.00 B/s 11.52 K/s 0.00 % 24.99 % cleanup -z -t unix -u
5197 be/4 apache 0.00 B/s 0.00 B/s 0.00 % 14.66 % httpd
3562 be/4 mysql 7.68 K/s 295.59 K/s 0.00 % 9.89 % mysqld --basedir=/usr --datadir=/var/lib/mysql/ --plugin-dir=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
1631 be/4 root 0.00 B/s 19.19 K/s 0.00 % 4.60 % [flush-253:0]
44141 be/4 mysql 0.00 B/s 15.36 K/s 0.00 % 3.06 % mysqld --basedir=/usr --datadir=/var/lib/mysql/ --plugin-dir=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
1911 be/4 root 0.00 B/s 7.68 K/s 0.00 % 0.48 % rsyslogd -i /var/run/syslogd.pid -c 5
109309 be/4 mysql 0.00 B/s 7.68 K/s 0.00 % 0.30 % mysqld --basedir=/usr --datadir=/var/lib/mysql/ --plugin-dir=~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
3523 be/4 mysql 0.00 B/s 0.00 B/s 0.00 % 0.23 % mysqld --basedir=/usr --datadir=/var/lib/mysql/ --plugin-dir=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
41193 be/4 postfix 0.00 B/s 7.68 K/s 0.00 % 0.00 % cleanup -z -t unix -u
49440 be/4 apache 0.00 B/s 3.84 K/s 0.00 % 0.00 % httpd
53694 be/4 apache 0.00 B/s 3.84 K/s 0.00 % 0.00 % httpd
2713 be/4 clam 0.00 B/s 1167.00 K/s 0.00 % 0.00 % clamd
51907 be/4 apache 0.00 B/s 3.84 K/s 0.00 % 0.00 % httpd
56015 be/4 clam 0.00 B/s 0.00 B/s 0.00 % 0.00 % clamd
5496 be/4 root 0.00 B/s 3.84 K/s 0.00 % 0.00 % lookup-domain-daemon.pl
55207 be/4 apache 0.00 B/s 3.84 K/s 0.00 % 0.00 % httpd
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
3 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
4 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]
5 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [stopper/0]
6 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/0]
7 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1]
8 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [stopper/1]
9 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/1]
10 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/1]
11 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/2]
12 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [stopper/2]
13 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/2]
14 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/2]
15 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/3]
16 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [stopper/3]
17 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/3]
18 rt/4 root 0.00 B/s 0.00 B/
Normally it is like this:
Total DISK READ: 159.43 K/s | Total DISK WRITE: 680.50 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
460 be/3 root 0.00 B/s 0.00 B/s 0.00 % 35.70 % [jbd2/dm-0-8]
944 be/3 root 0.00 B/s 73.88 K/s 0.00 % 18.93 % [jbd2/dm-2-8]
108619 be/4 mysql 0.00 B/s 42.77 K/s 0.00 % 2.06 % mysqld --basedir=/usr --datadir=/var/lib/mysql/ --plugin-dir=~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
3562 be/4 mysql 0.00 B/s 7.78 K/s 0.00 % 1.96 % mysqld --basedir=/usr --datadir=/var/lib/mysql/ --plugin-dir=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
3723 be/4 postgrey 0.00 B/s 15.55 K/s 0.00 % 0.04 % socket
43280 be/4 mysql 0.00 B/s 3.89 K/s 0.00 % 0.04 % mysqld --basedir=/usr --datadir=/var/lib/mysql/ --plugin-dir=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
65734 be/4 postfix 0.00 B/s 11.67 K/s 0.00 % 0.00 % cleanup -z -t unix -u
41193 be/4 postfix 0.00 B/s 23.33 K/s 0.00 % 0.00 % cleanup -z -t unix -u
1881 be/3 root 0.00 B/s 7.78 K/s 0.00 % 0.00 % auditd
1911 be/4 root 0.00 B/s 3.89 K/s 0.00 % 0.00 % rsyslogd -i /var/run/syslogd.pid -c 5
65505 be/4 apache 0.00 B/s 3.89 K/s 0.00 % 0.00 % httpd
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
3 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
4 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]
5 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [stopper/0]
6 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/0]
7 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1]
8 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [stopper/1]
9 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/1]
10 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/1]
11 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/2]
12 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [stopper/2]
13 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/2]
14 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/2]
15 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/3]
16 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [stopper/3]
17 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/3]
18 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/3]
19 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/4]
20 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [stopper/4]
21 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/4]
22 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/4]
23 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/5]
24 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [stopper/5]
25 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/5]
26 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/5]
27 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/6]
28 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [stopper/6]
29 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/6]
30 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/6]
31 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/7]
32 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [stopper/7]
33 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/7]
34 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/7]
35 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [events/0]
36 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [events/1]
37 be/4 root 0.00 B/s 0.00 B/s
The postfix email queue is normal, and I usually get the culprit there. But not today.
I have enough free disk space
[root@linux03 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_linux_a-lv_root
900G 813G 42G 96% /
tmpfs 7.7G 0 7.7G 0% /dev/shm
/dev/sda1 477M 121M 331M 27% /boot
/dev/mapper/vg_linux_b-LogVol00
1.4T 911G 395G 70% /home
/dev/sdc1 1.8T 1.3T 414G 77% /backup
[root@linux03 ~]#
Can anyone please advise how to identify the problem?
Thanks a lot
Rogerio