Catastrophic System Slowdown!

Ubuntu 14.04 LTS
16GB RAM
Dell PowerEdge, dual Xeon 1.9 GHz (8 cores total)

About once a week I experience a horrendous slowdown on websites.

Mail seems to run fine. I’ve installed a ton of diagnostic tools to find the problem, but I can’t fix it. htop shows all cores pegged, and websites take around 10 seconds to first byte or fail outright with a 500 error on complex PHP sites.

The box has 17 sites.

Attached are some htop results.

Any thoughts or direction on how I can fix this?
I’d like to put another important site on here, but I’m scared to because of these slowdowns.

Howdy,

Your htop output there does show a load of 10.00, which is a bit on the high end. It’s tough to determine the culprit from that though… next time that occurs, could you run the following commands, and share the output they produce:

ps auxwf
mailq | tail -1
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10

Also, just to reduce the load on your server – you may want to go into System Settings -> Virtualmin Config -> Status Collection, and there, set “Interval between status collection job runs” to something a bit higher than the default… perhaps 60 minutes would be a good place to start.
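Since the slowdown only shows up every week or two, it may help to capture those three outputs automatically while it is happening. Here is a minimal sketch, assuming bash and a Linux /proc/loadavg; the function names, the threshold of 8, and the /root/slowdown-logs directory are made up for illustration and are not part of Virtualmin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save the three diagnostics when the load is high.

# Succeeds when load value $1 is greater than or equal to threshold $2.
load_exceeds() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load >= limit) }'
}

# Writes the diagnostics to one timestamped file in directory $1
# and prints the name of that file.
capture_snapshot() {
    mkdir -p "$1"
    local out="$1/slowdown-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "== ps auxwf =="
        ps auxwf
        echo "== mailq | tail -1 =="
        mailq | tail -1
        echo "== busiest remote addresses =="
        netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10
    } > "$out" 2>&1
    echo "$out"
}

# Example use, e.g. from a cron job that runs every minute:
#   load=$(cut -d' ' -f1 /proc/loadavg)
#   load_exceeds "$load" 8 && capture_snapshot /root/slowdown-logs
```

The threshold is a judgment call: pick something clearly above the normal load of the box so that the midnight backups don't trigger it.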

-Eric

Who is your server with? I have 16GB servers with up to a hundred sites and they don’t break a sweat.

The server needs tweaking, I bet, or you’re on a lousy network.

The server is on a dedicated 20/20 fiber-optic connection with Suddenlink Communications.

This server is WAY overpowered for what we are doing. I migrated to Virtualmin from an IMSCP box (those guys in their forums are real a-holes; I started with VHCS, then forked to ISPCP, then finally IMSCP). The server I came from had the same websites but was a tenth as powerful and did fine. This server has plenty of power, so something has to be misconfigured.

So I migrated both Linux control panels and physical servers (I’m also running on vSphere).

ps auxwf
mailq | tail -1
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10

Output:

ps auxwf

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S Mar12 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Mar12 1:31 \_ [ksoftirqd/0]
root 4 0.0 0.0 0 0 ? S Mar12 0:00 \_ [kworker/0:0]
root 5 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [kworker/0:0H]
root 7 0.6 0.0 0 0 ? S Mar12 46:30 \_ [rcu_sched]
root 8 0.2 0.0 0 0 ? S Mar12 16:03 \_ [rcuos/0]
root 9 0.2 0.0 0 0 ? S Mar12 15:34 \_ [rcuos/1]
root 10 0.2 0.0 0 0 ? S Mar12 15:43 \_ [rcuos/2]
root 11 0.1 0.0 0 0 ? S Mar12 13:07 \_ [rcuos/3]
root 12 0.1 0.0 0 0 ? S Mar12 12:47 \_ [rcuos/4]
root 13 0.2 0.0 0 0 ? S Mar12 15:26 \_ [rcuos/5]
root 14 0.0 0.0 0 0 ? S Mar12 0:00 \_ [rcu_bh]
root 15 0.0 0.0 0 0 ? S Mar12 0:00 \_ [rcuob/0]
root 16 0.0 0.0 0 0 ? S Mar12 0:00 \_ [rcuob/1]
root 17 0.0 0.0 0 0 ? S Mar12 0:00 \_ [rcuob/2]
root 18 0.0 0.0 0 0 ? S Mar12 0:00 \_ [rcuob/3]
root 19 0.0 0.0 0 0 ? S Mar12 0:00 \_ [rcuob/4]
root 20 0.0 0.0 0 0 ? S Mar12 0:00 \_ [rcuob/5]
root 21 0.0 0.0 0 0 ? S Mar12 1:28 \_ [migration/0]
root 22 0.0 0.0 0 0 ? S Mar12 0:11 \_ [watchdog/0]
root 23 0.0 0.0 0 0 ? S Mar12 0:10 \_ [watchdog/1]
root 24 0.0 0.0 0 0 ? S Mar12 1:12 \_ [migration/1]
root 25 0.0 0.0 0 0 ? S Mar12 1:20 \_ [ksoftirqd/1]
root 27 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [kworker/1:0H]
root 28 0.0 0.0 0 0 ? S Mar12 0:08 \_ [watchdog/2]
root 29 0.0 0.0 0 0 ? S Mar12 1:07 \_ [migration/2]
root 30 0.0 0.0 0 0 ? S Mar12 1:19 \_ [ksoftirqd/2]
root 31 0.0 0.0 0 0 ? S Mar12 0:00 \_ [kworker/2:0]
root 32 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [kworker/2:0H]
root 33 0.0 0.0 0 0 ? S Mar12 0:09 \_ [watchdog/3]
root 34 0.0 0.0 0 0 ? S Mar12 1:00 \_ [migration/3]
root 35 0.0 0.0 0 0 ? S Mar12 1:01 \_ [ksoftirqd/3]
root 36 0.0 0.0 0 0 ? S Mar12 0:00 \_ [kworker/3:0]
root 37 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [kworker/3:0H]
root 38 0.0 0.0 0 0 ? S Mar12 0:10 \_ [watchdog/4]
root 39 0.0 0.0 0 0 ? S Mar12 0:57 \_ [migration/4]
root 40 0.0 0.0 0 0 ? S Mar12 0:51 \_ [ksoftirqd/4]
root 41 0.0 0.0 0 0 ? S Mar12 0:00 \_ [kworker/4:0]
root 42 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [kworker/4:0H]
root 43 0.0 0.0 0 0 ? S Mar12 0:11 \_ [watchdog/5]
root 44 0.0 0.0 0 0 ? S Mar12 1:01 \_ [migration/5]
root 45 0.0 0.0 0 0 ? S Mar12 1:20 \_ [ksoftirqd/5]
root 46 0.0 0.0 0 0 ? S Mar12 0:00 \_ [kworker/5:0]
root 47 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [kworker/5:0H]
root 48 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [khelper]
root 49 0.0 0.0 0 0 ? S Mar12 0:00 \_ [kdevtmpfs]
root 50 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [netns]
root 51 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [writeback]
root 52 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [kintegrityd]
root 53 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [bioset]
root 54 0.0 0.0 0 0 ? S< Mar12 0:12 \_ [kworker/u13:0]
root 55 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [kblockd]
root 56 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [ata_sff]
root 57 0.0 0.0 0 0 ? S Mar12 0:00 \_ [khubd]
root 58 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [md]
root 59 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [devfreq_wq]
root 60 0.0 0.0 0 0 ? S Mar12 1:04 \_ [kworker/3:1]
root 61 0.0 0.0 0 0 ? S Mar12 0:38 \_ [kworker/0:1]
root 62 0.0 0.0 0 0 ? S Mar12 0:35 \_ [kworker/1:1]
root 63 0.0 0.0 0 0 ? S Mar12 0:35 \_ [kworker/4:1]
root 64 0.0 0.0 0 0 ? S Mar12 0:34 \_ [kworker/5:1]
root 65 0.0 0.0 0 0 ? S Mar12 0:39 \_ [kworker/2:1]
root 67 0.0 0.0 0 0 ? S Mar12 0:00 \_ [khungtaskd]
root 68 0.0 0.0 0 0 ? S Mar12 0:21 \_ [kswapd0]
root 69 0.0 0.0 0 0 ? SN Mar12 0:00 \_ [ksmd]
root 70 0.0 0.0 0 0 ? SN Mar12 0:41 \_ [khugepaged]
root 71 0.0 0.0 0 0 ? S Mar12 0:01 \_ [fsnotify_mark]
root 72 0.0 0.0 0 0 ? S Mar12 0:00 \_ [ecryptfs-kthrea]
root 73 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [crypto]
root 85 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [kthrotld]
root 87 0.0 0.0 0 0 ? S Mar12 0:00 \_ [scsi_eh_0]
root 88 0.0 0.0 0 0 ? S Mar12 0:00 \_ [scsi_eh_1]
root 109 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [deferwq]
root 110 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [charger_manager]
root 163 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [kpsmoused]
root 164 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [mpt_poll_0]
root 165 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [mpt/0]
root 181 0.0 0.0 0 0 ? S Mar12 0:00 \_ [scsi_eh_2]
root 189 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [kdmflush]
root 190 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [bioset]
root 191 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [kdmflush]
root 193 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [bioset]
root 208 0.1 0.0 0 0 ? S Mar12 10:50 \_ [jbd2/dm-0-8]
root 209 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [ext4-rsv-conver]
root 357 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [ext4-rsv-conver]
root 480 0.0 0.0 0 0 ? S< Mar12 0:00 \_ [ttm_swap]
root 1968 0.0 0.0 0 0 ? S Mar12 0:00 \_ [kauditd]
root 25942 0.0 0.0 0 0 ? S< Mar14 0:16 \_ [kworker/u13:2]
root 12655 0.0 0.0 0 0 ? S 14:03 0:00 \_ [kworker/1:0]
root 19830 0.4 0.0 0 0 ? S 19:39 0:05 \_ [kworker/u12:2]
root 20396 0.5 0.0 0 0 ? S 19:42 0:06 \_ [kworker/u12:3]
root 22090 1.2 0.0 0 0 ? S 19:51 0:06 \_ [kworker/u12:0]
root 23878 0.0 0.0 0 0 ? S 19:58 0:00 \_ [kworker/u12:1]
root 1 0.0 0.0 33620 2948 ? Ss Mar12 4:55 /sbin/init
root 385 0.0 0.0 19608 920 ? S Mar12 0:04 upstart-udev-bridge --daemon
root 394 0.0 0.0 51360 1664 ? Ss Mar12 0:01 /lib/systemd/systemd-udevd --daemon
message+ 435 0.0 0.0 39228 1324 ? Ss Mar12 0:19 dbus-daemon --system --fork
root 475 0.0 0.0 43528 1956 ? Ss Mar12 0:21 /lib/systemd/systemd-logind
root 960 0.0 0.0 16200 1432 ? S Mar12 0:00 upstart-file-bridge --daemon
root 1152 0.0 0.0 15656 920 ? S Mar12 0:00 upstart-socket-bridge --daemon
root 1278 0.0 0.0 15820 952 tty4 Ss+ Mar12 0:00 /sbin/getty -8 38400 tty4
root 1282 0.0 0.0 15820 956 tty5 Ss+ Mar12 0:00 /sbin/getty -8 38400 tty5
root 1287 0.0 0.0 15820 948 tty2 Ss+ Mar12 0:00 /sbin/getty -8 38400 tty2
root 1288 0.0 0.0 15820 956 tty3 Ss+ Mar12 0:00 /sbin/getty -8 38400 tty3
root 1292 0.0 0.0 15820 948 tty6 Ss+ Mar12 0:00 /sbin/getty -8 38400 tty6
root 1310 0.0 0.0 61364 3076 ? Ss Mar12 0:00 /usr/sbin/sshd -D
root 23204 0.2 0.0 105632 4320 ? Ss 19:56 0:00 \_ sshd: root@pts/1
root 23690 0.5 0.0 23084 4256 pts/1 Ss 19:57 0:01 \_ -bash
root 24345 0.0 0.0 18608 1452 pts/1 R+ 20:00 0:00 \_ ps auxwf
root 1314 0.0 0.0 17776 1536 ? Ss Mar12 2:06 /usr/sbin/dovecot -F -c /etc/dovecot/dovecot.conf
dovecot 1494 0.0 0.0 9284 948 ? S Mar12 0:52 \_ dovecot/anvil
root 1495 0.0 0.0 9412 1136 ? S Mar12 0:46 \_ dovecot/log
root 30330 0.0 0.0 18812 2304 ? S Mar16 1:25 \_ dovecot/config
dovecot 32328 0.1 0.0 16264 1656 ? S 11:17 0:45 \_ dovecot/auth
1028 27213 0.2 0.0 22672 3964 ? S 17:34 0:21 \_ dovecot/imap
root 20389 1.3 0.0 29040 2192 ? S 19:42 0:15 \_ dovecot/auth -w
dovenull 23983 0.2 0.0 18088 2828 ? S 19:58 0:00 \_ dovecot/pop3-login
lincoln+ 23985 0.1 0.0 21360 2340 ? S 19:58 0:00 \_ dovecot/pop3
root 24185 1.2 0.0 28808 1996 ? S 20:00 0:00 \_ dovecot/auth -w
root 1328 0.0 0.0 23656 1056 ? Ss Mar12 0:07 cron
daemon 1329 0.0 0.0 19140 160 ? Ss Mar12 0:00 atd
root 1333 0.0 0.0 4368 664 ? Ss Mar12 0:00 acpid -c /etc/acpi/events -s /var/run/acpid.socket
ntp 1462 0.0 0.0 31444 2132 ? Ss Mar12 1:01 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 117:124
mysql 1481 0.9 2.6 2785728 348000 ? Ssl Mar12 73:40 /usr/sbin/mysqld
root 1665 0.2 0.0 91772 4640 ? S Mar12 16:57 /usr/sbin/vmtoolsd
list 1742 0.0 0.0 60556 9248 ? Ss Mar12 0:00 /usr/bin/python /usr/lib/mailman/bin/mailmanctl -s -q start
list 1745 0.0 0.0 60496 11448 ? S Mar12 2:34 \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
list 1746 0.0 0.0 60496 11480 ? S Mar12 2:40 \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=BounceRunner:0:1 -s
list 1747 0.0 0.0 60500 11452 ? S Mar12 2:34 \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=CommandRunner:0:1 -s
list 1748 0.0 0.0 60432 11444 ? S Mar12 2:34 \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s
list 1749 0.0 0.0 60452 11520 ? S Mar12 2:35 \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=NewsRunner:0:1 -s
list 1751 0.0 0.0 60472 11604 ? S Mar12 2:37 \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s
list 1752 0.0 0.0 60488 11532 ? S Mar12 2:32 \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=VirginRunner:0:1 -s
list 1754 0.0 0.0 60488 11432 ? S Mar12 0:02 \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=RetryRunner:0:1 -s
root 1858 0.0 0.0 25344 1696 ? Ss Mar12 1:52 /usr/lib/postfix/master
postfix 1950 0.0 0.0 40400 3140 ? S Mar12 0:56 \_ tlsmgr -l -t unix -u -c
postfix 611 0.0 0.0 28484 2848 ? S Mar12 1:33 \_ qmgr -l -t unix -u
postfix 32297 0.0 0.0 27408 1548 ? S 08:30 0:09 \_ anvil -l -t unix -u -c
postfix 13376 0.1 0.0 27540 1556 ? S 19:03 0:05 \_ showq -t unix -u -c
postfix 17377 0.0 0.0 27640 2176 ? S 19:26 0:02 \_ cleanup -z -t unix -u -c
postfix 18927 0.0 0.0 27456 2160 ? S 19:36 0:00 \_ local -t unix
postfix 22396 0.0 0.0 27420 1892 ? S 19:53 0:00 \_ trivial-rewrite -n rewrite -t unix -u -c
postfix 22420 0.0 0.0 27408 1616 ? S 19:53 0:00 \_ pickup -l -t unix -u -c
postfix 22437 0.0 0.0 27456 2128 ? S 19:53 0:00 \_ local -t unix
postfix 22975 0.2 0.0 59524 4576 ? S 19:55 0:00 \_ smtpd -n 173.219.81.61:smtp -t inet -u -c -o stress= -o smtpd_sasl_auth_enable=yes
postfix 23008 0.0 0.0 27456 2124 ? S 19:56 0:00 \_ local -t unix
root 1907 0.0 0.0 93324 2192 ? Ss Mar12 0:53 /usr/sbin/saslauthd -a pam -m /var/spool/postfix/var/run/saslauthd -r -n 5
root 1908 0.0 0.0 93320 2184 ? S Mar12 0:52 \_ /usr/sbin/saslauthd -a pam -m /var/spool/postfix/var/run/saslauthd -r -n 5
root 1909 0.0 0.0 93320 2184 ? S Mar12 0:54 \_ /usr/sbin/saslauthd -a pam -m /var/spool/postfix/var/run/saslauthd -r -n 5
root 1910 0.0 0.0 93324 2184 ? S Mar12 0:55 \_ /usr/sbin/saslauthd -a pam -m /var/spool/postfix/var/run/saslauthd -r -n 5
root 1911 0.0 0.0 93320 2184 ? S Mar12 0:52 \_ /usr/sbin/saslauthd -a pam -m /var/spool/postfix/var/run/saslauthd -r -n 5
root 1993 0.0 0.1 76676 18972 ? Ss Mar12 0:14 /usr/bin/perl /usr/share/usermin/miniserv.pl /etc/usermin/miniserv.conf
clamav 4017 0.3 2.4 1066144 333284 ? Ssl Mar12 22:50 /usr/sbin/clamd
bind 4131 2.1 0.5 559868 79124 ? Ssl Mar12 164:24 /usr/sbin/named -u bind
root 4150 0.0 0.4 91516 62832 ? Ss Mar12 3:23 /usr/share/webmin/virtual-server/lookup-domain-daemon.pl
root 4170 0.0 0.4 137520 64492 ? Ss Mar12 5:44 /usr/sbin/spamd --create-prefs --max-children 5 --helper-home-dir -d --pidfile=/var/run/spamd.pid
root 962 21.9 0.6 153964 81508 ? S 18:02 25:54 \_ spamd child
root 1422 8.2 0.5 151600 79212 ? S 18:05 9:34 \_ spamd child
root 4393 0.0 0.5 141544 73836 ? Ss Mar12 2:15 /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root 22462 0.1 0.5 141544 73836 ? S 19:53 0:00 \_ /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root 23110 0.1 0.5 141544 73832 ? S 19:56 0:00 \_ /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root 23623 0.1 0.5 141544 73836 ? S 19:57 0:00 \_ /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root 23624 0.1 0.5 141544 73768 ? S 19:57 0:00 \_ /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root 23947 0.1 0.5 141544 73836 ? S 19:58 0:00 \_ /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root 24081 0.1 0.5 141544 73772 ? S 19:59 0:00 \_ /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root 4410 0.0 0.0 15820 952 tty1 Ss+ Mar12 0:00 /sbin/getty -8 38400 tty1
root 8697 0.0 0.1 434740 25060 ? Ss Mar12 1:01 /usr/sbin/apache2 -k start
www-data 30765 0.0 0.0 434680 10896 ? S Mar15 0:00 \_ /usr/sbin/apache2 -k start
www-data 30820 0.0 0.0 241604 9328 ? S Mar15 0:22 \_ /usr/sbin/apache2 -k start
usamaf 30830 0.2 0.3 340976 42524 ? S Mar15 7:28 | \_ /usr/bin/php5-cgi
otakuan+ 5008 0.0 0.1 327880 16720 ? S Mar15 0:00 | \_ /usr/bin/php5-cgi
usamaf 28613 0.1 0.2 335480 34564 ? S Mar15 4:30 | \_ /usr/bin/php5-cgi
mthopef+ 4064 0.0 0.2 333644 27572 ? S 05:03 0:34 | \_ /usr/bin/php5-cgi
idsnetw+ 4113 0.2 0.2 331444 36316 ? S 11:32 1:21 | \_ /usr/bin/php5-cgi
wvkarate 11088 0.3 0.2 331328 31952 ? S 13:57 1:22 | \_ /usr/bin/php5-cgi
fastfoo+ 15738 0.0 0.1 328444 18292 ? S 14:14 0:15 | \_ /usr/bin/php5-cgi
usamaf 20031 0.4 0.2 334200 30896 ? S 14:27 1:38 | \_ /usr/bin/php5-cgi
1113 24500 2.2 0.2 329876 36224 ? S 14:50 6:54 | \_ /usr/bin/php5-cgi
tomscot+ 12815 0.9 0.3 338796 47468 ? S 16:34 1:57 | \_ /usr/bin/php5-cgi
idsnetw+ 18042 0.4 0.2 337972 28496 ? S 17:00 0:44 | \_ /usr/bin/php5-cgi
idsnetw+ 18045 0.5 0.2 337196 28004 ? S 17:00 0:54 | \_ /usr/bin/php5-cgi
1113 2167 2.2 0.2 332460 38792 ? S 18:07 2:31 | \_ /usr/bin/php5-cgi
planodo+ 8880 3.8 0.4 341284 57456 ? S 18:39 3:06 | \_ /usr/bin/php5-cgi
1006 11625 4.3 0.2 330028 29144 ? S 18:55 2:48 | \_ /usr/bin/php5-cgi
1006 11635 7.1 0.2 330020 29092 ? R 18:55 4:38 | \_ /usr/bin/php5-cgi
1006 11638 4.4 0.2 330020 29048 ? S 18:55 2:54 | \_ /usr/bin/php5-cgi
www-data 18878 0.0 0.0 435512 12764 ? S 17:03 0:07 \_ /usr/sbin/apache2 -k start
www-data 25003 0.0 0.0 435376 11932 ? S 17:26 0:06 \_ /usr/sbin/apache2 -k start
www-data 31694 0.0 0.0 435400 12680 ? S 17:55 0:03 \_ /usr/sbin/apache2 -k start
www-data 31696 0.0 0.0 435452 11976 ? S 17:55 0:03 \_ /usr/sbin/apache2 -k start
www-data 31697 0.0 0.0 435376 11916 ? S 17:55 0:04 \_ /usr/sbin/apache2 -k start
www-data 10812 0.0 0.0 435360 11904 ? S 18:53 0:02 \_ /usr/sbin/apache2 -k start
www-data 10813 0.0 0.0 435320 12636 ? S 18:53 0:02 \_ /usr/sbin/apache2 -k start
www-data 10820 0.0 0.0 435424 12652 ? S 18:53 0:02 \_ /usr/sbin/apache2 -k start
www-data 10822 0.0 0.0 435328 11868 ? S 18:53 0:02 \_ /usr/sbin/apache2 -k start
www-data 10823 0.0 0.0 435364 11888 ? S 18:53 0:02 \_ /usr/sbin/apache2 -k start
www-data 10825 0.0 0.0 435368 11904 ? S 18:53 0:02 \_ /usr/sbin/apache2 -k start
www-data 10847 0.0 0.0 435352 11888 ? S 18:53 0:02 \_ /usr/sbin/apache2 -k start
www-data 10853 0.0 0.0 435344 11868 ? S 18:53 0:02 \_ /usr/sbin/apache2 -k start
www-data 19856 0.0 0.0 435276 11632 ? S 19:39 0:00 \_ /usr/sbin/apache2 -k start
www-data 22323 0.0 0.0 435276 11628 ? S 19:52 0:00 \_ /usr/sbin/apache2 -k start
www-data 22324 0.0 0.0 435224 11636 ? S 19:52 0:00 \_ /usr/sbin/apache2 -k start
www-data 22326 0.0 0.0 434820 10876 ? S 19:52 0:00 \_ /usr/sbin/apache2 -k start
www-data 22327 0.0 0.0 435276 11532 ? S 19:52 0:00 \_ /usr/sbin/apache2 -k start
www-data 22328 0.0 0.0 435276 11400 ? S 19:52 0:00 \_ /usr/sbin/apache2 -k start
www-data 22329 0.0 0.0 435324 11692 ? S 19:52 0:00 \_ /usr/sbin/apache2 -k start
proftpd 30651 0.0 0.0 113904 2480 ? Ss Mar15 0:31 proftpd: (accepting connections)
syslog 3290 0.1 0.1 256040 14680 ? Ssl Mar15 4:07 rsyslogd
root 23784 0.1 0.0 17160 4888 ? S<L 00:00 1:13 /usr/bin/atop -a -w /var/log/atop/atop_20150317 600
root 23804 0.4 0.1 65628 20308 ? Ss 00:00 5:31 lfd - sleeping
root 24342 110 0.1 66156 19912 ? R 20:00 0:01 \_ lfd - (child) process tracking...
root 24079 7.9 0.1 127420 20456 ? S 19:59 0:04 /usr/bin/perl -w ./clean_graph.pl 60 cpu

mailq | tail -1

-- 795 Kbytes in 54 Requests.

(Honestly, I monitor the mailq via historical stats and it rarely exceeds 100 at any one time.)

netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10

7 173.80.16.220
4 127.0.0.1
2
1 servers)
1 Address
1 76.9.85.41
1 68.180.228.90
1 209.85.160.140
1 207.236.147.203
1 174.22.182.197
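A side note on reading that output: the "servers)" and "Address" rows are just the fifth word of the two netstat header lines, and the blank row is probably IPv6 addresses, whose text before the first colon is empty (that last part is a guess). Below is a sketch of a variant that skips the headers and counts IPv4 peers only; top_remote_ips is a made-up name, and it reads netstat output on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads `netstat -ntu` output on stdin and prints the
# ten busiest remote IPv4 addresses with their connection counts.
top_remote_ips() {
    # NR > 2 skips the two header lines; the regex keeps only fields that
    # start like an IPv4 address; sub() strips the trailing ":port".
    awk 'NR > 2 && $5 ~ /^[0-9]+\./ { sub(/:[0-9]+$/, "", $5); print $5 }' |
        sort | uniq -c | sort -nr | head -10
}

# Example use:
#   netstat -ntu | top_remote_ips
```

With only 7 connections from the busiest address, this particular snapshot doesn't look like a connection flood either way.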

As you can see, there is no obvious reason for this massive slowdown, but it hits me about once a week or so. It slows down and stays crazy slow on all websites for 10+ hours, then goes back to normal for a week or two, then it’s back to hammered.

The only thing I see is a lot of name lookups, with /usr/sbin/named -u bind taking up a lot of CPU time, but otherwise I’m not sure. I also see root 24342 110 0.1 66156 19912 ? R 20:00 0:01 \_ lfd - (child) process tracking… so is LFD going crazy? I really like LFD and CSF for the autobanning features, because I see hacking attempts and DoS attacks all the time from the script kiddies.

Thanks for any input. I am at a loss: this server and connection should not be having this issue, so there must be a misconfiguration somewhere.
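For picking the heavy hitters out of a listing that long, sorting on the %CPU column is quicker than reading it by eye. A minimal sketch, assuming the standard `ps aux` column order (USER, PID, %CPU, ...); top_cpu is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads `ps aux`-style output on stdin and prints the
# N rows with the highest %CPU (column 3). N defaults to 5.
top_cpu() {
    # Drop the header line, then sort numerically on column 3, highest first.
    # LC_ALL=C keeps "." as the decimal separator regardless of locale.
    awk 'NR > 1' | LC_ALL=C sort -k3,3 -nr | head -n "${1:-5}"
}

# Example use:
#   ps aux | top_cpu 10
```

Keep in mind that ps reports %CPU averaged over the life of each process, so a child that has been alive for one second (like the lfd one at 110) can show a very high figure without having used much CPU time in total. What matters more is whether such children keep getting spawned.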

I enabled greylisting and everything ironed out. Maybe it’s a coincidence?

Don’t know.

Yeah I don’t see anything too unusual in that process list there… whenever you ran that, were you experiencing the problem you’re describing?

If it comes up again, you could always try disabling lfd, or any other process, just to make sure that isn’t related.

-Eric

I was having the problem at the time. Then I enabled greylisting and within a few minutes I was back to normal. It may have been a coincidence; I’ll know in a week, because it generally happens every 5-10 days.

David

This is bad. I am not going to be able to use this system in production if I can’t solve this problem. Again today the problem is a catastrophic slowdown. Any thoughts?

This is 2 weeks after the last slowdown. The small daily spikes are the midnight backups. The 2 HUGE spikes are… well, I don’t know. The problem comes and goes…

Seems there could be a correlation between mailq and my problem

2nd view of CPU

Howdy,

You mentioned a correlation between this CPU issue and mailq… whenever this occurs, how many messages are showing up in your email queue?
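If it helps to answer that with numbers over time, the queue size can be pulled out of the mailq summary line and logged next to the load. A rough sketch, assuming Postfix's usual summary format ("-- 795 Kbytes in 54 Requests." or "Mail queue is empty"); queue_count is a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads `mailq` output on stdin and prints the number
# of queued messages, or 0 when there is no "N Requests." summary line.
queue_count() {
    tail -n 1 |
        awk '/Requests?\.$/ { print $5; found = 1 } END { if (!found) print 0 }'
}

# Example use, logging load and queue size on one line:
#   echo "$(date '+%F %T') load=$(cut -d' ' -f1 /proc/loadavg) queue=$(mailq | queue_count)"
```

Running that every few minutes from cron would show whether the queue actually grows before the load does, or only afterwards.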

-Eric

Well, it seemed like there was a correlation, but only 35 messages were in the queue. I then deleted the whole queue and the server was still slammed. I rebooted and it ironed itself out, but it was so bad that stats on my server didn’t even record for several hours; I have a big white gap in my stats for the period where it was very bad. It seems like this just happens every 2 weeks or so, very strange. I’m trying to figure this out before I put two of my major clients’ websites on this box. Any help is appreciated.

David

I’m having trouble remembering what all we tried – but it might be worth checking Email Messages -> Spam and Virus Scanning. There, we’d recommend setting “SpamAssassin client program” to “spamc (Client for SpamAssassin filter server spamd)”, and “Virus scanning program” to “Server scanner (clamdscan)”.

Those settings can each make a pretty big difference, if they aren’t already set that way.

-Eric

Thanks for the reply. I did change SpamAssassin over to spamc and the virus scanner over to the server scanner. So far I’m at:

System uptime 13 days, 19 hours, 19 minutes
Running processes 230
CPU load averages 0.06 (1 min) 0.15 (5 mins) 0.13 (15 mins)

It runs with no issues, then out of the blue it gets hammered for a day. When it happens, what type of log files should I be looking in, do you think? The server is 16GB RAM, 6 cores, dual processor, so it should eat 17 websites and email for breakfast.
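One place to look, going by the earlier process list: atop is already writing a daily log (/var/log/atop/atop_20150317, sampled every 600 seconds), so a bad stretch can be replayed after the fact. A small sketch, assuming GNU date and that same file naming; atop_log_for is a made-up helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the atop log path for a given day, assuming
# the /var/log/atop/atop_YYYYMMDD naming seen in the ps output.
# $1 is any date string that GNU `date -d` accepts.
atop_log_for() {
    printf '/var/log/atop/atop_%s\n' "$(date -d "$1" +%Y%m%d)"
}

# Example use: replay that day's samples starting at 18:00
# (press "t" inside atop to step forward one sample):
#   atop -r "$(atop_log_for 2015-03-17)" -b 18:00
#
# The lfd log may also be worth a look (this is its usual path,
# not something confirmed for this box):
#   tail -n 200 /var/log/lfd.log
```

A gap in the recorded samples during the slowdown would itself be a hint that the box was too starved to run the collector.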

David

So after starting a support ticket, the folks at Virtualmin really helped me figure it out.

It’s CSF/LFD. The process monitoring was firing off so fast that it basically started a denial of service on myself!

Long story short: disable LFD, delete the thousands of emails it’s dumping to root, and modify the CSF config files to raise the process limits so the monitoring thresholds aren’t too low, or change the reporting interval. Once the chain starts, it just gets exponentially worse, dragging down the whole server!
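For anyone landing here with the same symptoms, this is a sketch of how to see what the process-tracking thresholds are currently set to. It assumes the stock config location /etc/csf/csf.conf and the usual `NAME = "value"` layout; show_pt_settings is a made-up name, and the right values depend on the box, so none are suggested here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the lfd process-tracking (PT_*) settings from
# a csf.conf-style file. $1 is the file path (defaults to /etc/csf/csf.conf).
show_pt_settings() {
    grep -E '^PT_[A-Z_]+[[:space:]]*=' "${1:-/etc/csf/csf.conf}"
}

# Example use:
#   show_pt_settings
#
# After editing values such as PT_LIMIT, PT_INTERVAL or PT_USERPROC,
# csf and lfd need a restart to pick them up; on most installs that is:
#   csf -r
#   service lfd restart
```

Setting a tracking option to "0" should turn that particular check off, but read the comments in csf.conf for your version before relying on that.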