System grinding to a halt

Hi

I have 3 systems that have all been working for 12+ months with no major issues, but today one of them is just dying on its feet.

In /var/log/procmail.log I can see:

WARNING: System limit for file size is lower than engine->maxscansize

Timeout connecting to lookup-domain-daemon.pl
Virus scanner failed to response within 30 seconds
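As I understand it, ClamAV prints the first warning when the process's soft file-size limit (RLIMIT_FSIZE) is lower than its configured maximum scan size. A minimal Python sketch of the same comparison; the maximum scan size is a value you supply (it is not read from ClamAV), and the function names are made up for illustration:

```python
import resource


def limit_below(soft_limit: int, max_scan_size: int) -> bool:
    """True if a soft RLIMIT_FSIZE value is lower than max_scan_size (bytes)."""
    if soft_limit == resource.RLIM_INFINITY:
        return False  # "unlimited" is never lower than any scan size
    return soft_limit < max_scan_size


def current_limit_below(max_scan_size: int) -> bool:
    """Check this process's own soft file-size limit against max_scan_size."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    return limit_below(soft, max_scan_size)
```

For example, `current_limit_below(100 * 1024 * 1024)` checks against a hypothetical 100 MB scan size. Resource limits are per process and inherited, so the result only describes the environment the check runs in; to be meaningful it has to run under the same user and limits as procmail. From a shell, `ulimit -f` reports the same soft limit.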

This is what I am currently seeing with Postfix turned off for over 10 minutes. I cannot capture it with Postfix on, because the load goes to 65+ and keeps rising; the box normally runs at a load of only 0.1 to 0.5.

Tasks: 199 total, 1 running, 189 sleeping, 0 stopped, 9 zombie
Cpu(s): 3.0%us, 2.0%sy, 0.0%ni, 0.0%id, 94.0%wa, 0.0%hi, 1.0%si, 0.0%st
Mem: 1025044k total, 994448k used, 30596k free, 3580k buffers
Swap: 2031608k total, 1568416k used, 463192k free, 36740k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7360 thomas 18 0 22052 6200 4544 D 1.0 0.6 0:00.04 php-cgi
7363 thomas 18 0 22048 6084 4500 D 1.0 0.6 0:00.03 php-cgi
6999 just-ser 18 0 116m 111m 992 D 0.7 11.1 0:05.35 clamscan
6494 steven.j 18 0 83548 18m 644 D 0.3 1.8 0:04.47 clamscan
7022 576 18 0 79968 42m 820 D 0.3 4.2 0:03.67 clamscan
7208 root 18 0 0 0 0 Z 0.3 0.0 0:01.18 collectinfo.pl <defunct>
7347 root 15 0 2324 1080 804 R 0.3 0.1 0:00.15 top
7354 tblogs 18 0 22048 6080 4436 D 0.3 0.6 0:00.04 php-cgi
1 root 15 0 2064 408 384 S 0.0 0.0 0:00.57 init
2 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 events/0
6 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 khelper
7 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 kthread
10 root 10 -5 0 0 0 S 0.0 0.0 0:00.97 kblockd/0
11 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 kacpid
99 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/0
102 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 khubd
104 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod
169 root 10 -5 0 0 0 D 0.0 0.0 0:03.55 kswapd0
170 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
322 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 kpsmoused
345 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 ata/0
346 root 16 -5 0 0 0 S 0.0 0.0 0:00.00 ata_aux
349 root 12 -5 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_0
350 root 12 -5 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_1
363 root 16 -5 0 0 0 S 0.0 0.0 0:00.00 ksnapd
366 root 10 -5 0 0 0 D 0.0 0.0 0:01.10 kjournald
393 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 kauditd
427 root 21 -4 2240 324 320 S 0.0 0.0 0:00.38 udevd
1208 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 kmpathd/0
1230 root 12 -5 0 0 0 S 0.0 0.0 0:00.00 kjournald
1669 root 16 0 1716 420 376 D 0.0 0.0 0:00.20 syslogd
1672 root 15 0 1668 288 284 S 0.0 0.0 0:00.00 klogd
1698 named 25 0 41288 3232 1132 S 0.0 0.3 0:01.11 named
1734 dbus 16 0 2748 296 292 S 0.0 0.0 0:00.00 dbus-daemon
1756 root 25 0 9380 652 532 S 0.0 0.1 0:00.02 automount
1780 root 18 0 1668 344 340 S 0.0 0.0 0:00.00 acpid
+++++ many more lines
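In this snapshot the CPU is spending 94% of its time in I/O wait (`94.0%wa`), about 1.5 GB of swap is in use on a box with 1 GB of RAM, and the clamscan, php-cgi, kswapd0 and kjournald processes are in state `D` (uninterruptible sleep, usually waiting on disk). A minimal Python sketch for summarising a saved snapshot (for example one captured with `top -b -n 1 > top.txt`): it counts processes by state and totals the resident memory (RES) of the clamscan processes. It assumes top's default column order as shown above; other top configurations would need the column indexes adjusted.

```python
from collections import Counter


def parse_res_kib(field: str) -> int:
    """Convert a top RES field such as '6200', '18m' or '1.2g' to KiB."""
    units = {"m": 1024, "g": 1024 * 1024}
    suffix = field[-1].lower()
    if suffix in units:
        return int(float(field[:-1]) * units[suffix])
    return int(field)  # no suffix: already in KiB


def summarise(rows):
    """Count process states and total the RES of clamscan processes.

    Assumes top's default columns:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    states = Counter()
    clamscan_kib = 0
    for row in rows:
        cols = row.split()
        if len(cols) < 12 or not cols[0].isdigit():
            continue  # skip the header and the summary lines
        res, state, command = cols[5], cols[7], cols[11]
        states[state] += 1
        if command == "clamscan":
            clamscan_kib += parse_res_kib(res)
    return states, clamscan_kib
```

Run over the three clamscan rows above, it totals 175104 KiB (171 MiB) resident between them, and each of those processes carries its own copy of that memory.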

So far I can see no issues in any other log files (still looking).

The system is CentOS 5 with Virtualmin, fully updated, hosting approx. 30 domains.

Any clues?

Thanks for any help/clues you can give.

Denis

Do you have a lot of email in your mail queue?

If you type "mailq", what does the last line of output say?
-Eric
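With Postfix, the last line of `mailq` output is either `Mail queue is empty` or a summary of the form `-- N Kbytes in M Requests.` A minimal Python sketch that pulls the queued-message count out of that line; it only parses text you pass in, so capturing the output is up to you:

```python
import re

# Postfix's mailq summary line, e.g. "-- 1234 Kbytes in 56 Requests."
SUMMARY = re.compile(r"^--\s+(\d+)\s+Kbytes\s+in\s+(\d+)\s+Requests?\.$")


def queued_requests(mailq_output: str) -> int:
    """Return the number of queued messages reported on mailq's last line."""
    lines = mailq_output.strip().splitlines()
    if not lines:
        raise ValueError("no mailq output")
    last = lines[-1].strip()
    if last == "Mail queue is empty":
        return 0
    match = SUMMARY.match(last)
    if match is None:
        raise ValueError(f"unrecognised last line: {last!r}")
    return int(match.group(2))
```

For example, `queued_requests(subprocess.run(["mailq"], capture_output=True, text=True).stdout)` runs it against the live queue. A count in the thousands would point at a backlog rather than at the scanner itself.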

Tasks: 199 total, 1 running, 189 sleeping, 0 stopped, 9 zombie

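The first thing I would sort out is the zombie processes. A zombie can't be killed directly, because it has already exited; it only disappears once its parent process reaps it, so it's the parent that needs restarting or killing (init then adopts and cleans up the zombies).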
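To find which parent is leaving zombies behind on Linux, each process's state and parent PID can be read from /proc/<pid>/stat. A minimal Python sketch (Linux-only, and just an illustration rather than anything Virtualmin provides):

```python
import os


def parse_stat(text: str):
    """Split a /proc/<pid>/stat line into (pid, comm, state, ppid).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace alone.
    """
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren].strip())
    comm = text[open_paren + 1:close_paren]
    state, ppid = text[close_paren + 1:].split()[:2]
    return pid, comm, state, int(ppid)


def zombies(proc: str = "/proc"):
    """Yield (pid, comm, ppid) for every zombie (state Z) process."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "Z":
            yield pid, comm, ppid
```

Something like `for pid, comm, ppid in zombies(): print(pid, comm, ppid)` prints one line per zombie; the parent PID in the last column is the process to investigate.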

You probably want to switch to the daemon version of ClamAV (clamd, with clamdscan as the client), which you can select on the Email Messages -> Spam and Virus Scanning page. That may or may not resolve this issue, though. It seems like the problem is that clam is responding too slowly, but that may be an illusion.

Also, ClamAV was updated just a couple of days ago, and the new version may be slower than the old one. clam tends to get slower and bigger with every release (which makes sense, I guess, since it’s scanning for more viruses in every release), and maybe this one pushed it off the deep end for your system and workload. clamd would resolve that, since it loads the virus database once and keeps it in memory, whereas clamscan reloads the whole database every time it runs.
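After switching to clamd, it's worth confirming the daemon is actually up and answering. clamd accepts commands over its socket, and a NUL-delimited `zPING` should get `PONG` back. A minimal Python sketch, assuming clamd listens on a local Unix socket; the socket path below is a hypothetical placeholder, so use the `LocalSocket` value from your clamd.conf:

```python
import socket

# Hypothetical path: replace with the LocalSocket value from clamd.conf.
CLAMD_SOCKET = "/var/run/clamav/clamd.sock"


def clamd_ping(sock: socket.socket) -> bool:
    """Send a NUL-delimited PING on a connected socket; True if the reply is PONG."""
    sock.sendall(b"zPING\0")
    reply = b""
    while not reply.endswith(b"\0"):
        chunk = sock.recv(64)
        if not chunk:
            break  # connection closed before a full reply arrived
        reply += chunk
    return reply.rstrip(b"\0").strip() == b"PONG"


def ping_local_clamd(path: str = CLAMD_SOCKET, timeout: float = 5.0) -> bool:
    """Connect to clamd's Unix socket at path and ping it."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        return clamd_ping(sock)
```

`clamd_ping` takes an already-connected socket so it can be exercised without a running daemon; `ping_local_clamd` does the actual connect and raises an `OSError` if nothing is listening at that path.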

Sorry, I forgot to post a link saying as much, but this one was fixed up in the Bug Tracker :)

http://www.virtualmin.com/index.php?option=com_flyspray&Itemid=82&do=details&task_id=5813&Itemid=82&project=1&status[0]=

It was indeed clamscan vs. clamd; moving to clamd fixed it.
-Eric

Hi Eric

Are you able to track this bug tracker item down? The link doesn’t seem to work anymore. We seem to have the same problem on one of our machines, where SpamAssassin apparently isn’t altering the headers of the emails. In procmail.log we are seeing the engine->maxscansize warning.

Dave

Yeah, it’s tough finding those old bugtracker issues :)

However, based on the notes I wrote in this forum thread, it looks like the issue was solved by moving to the ClamAV daemon (clamd) rather than using the command-line scanner (clamscan).

You can make that change in Email Messages -> Spam and Virus Scanning, and set “Virus scanning program” to “Server Scanner”.

-Eric