I have installed Virtualmin on CentOS 5.5 and kept it up to date. I have written a script to keep track of zombie processes, and lately I have been noticing miniserv.pl go defunct every 5 minutes or so.
Is this normal? What approach should I take to resolve this issue? What about the web server and php-cgi?
I tried changing the FastCGI settings, but it hasn’t helped.
Here is the zombie.log:
Fri Jan 21 16:33:01 EST 2011
Z 3919 20621 [httpd]
Sat Jan 22 04:27:39 EST 2011
Z 20616 19491 [php-cgi]
Sat Jan 22 22:01:01 EST 2011
Z 3919 29739 [httpd]
Sun Jan 23 04:03:02 EST 2011
Z 3919 20616 [httpd]
Tue Jan 25 08:50:01 EST 2011
Z 7002 32243 [php-cgi]
Tue Jan 25 11:30:02 EST 2011
Z 4101 4163 [miniserv.pl]
Tue Jan 25 11:31:02 EST 2011
Z 4101 4162 [miniserv.pl]
Tue Jan 25 11:37:01 EST 2011
Z 4101 4723 [miniserv.pl]
Tue Jan 25 11:44:01 EST 2011
Z 4101 5093 [miniserv.pl]
Tue Jan 25 11:51:02 EST 2011
Z 4101 5506 [miniserv.pl]
Tue Jan 25 11:59:01 EST 2011
Z 4101 6006 [miniserv.pl]
Tue Jan 25 12:06:01 EST 2011
/var/log/zombies.log
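To see at a glance which parent is leaving zombies behind, the log can be tallied per parent. This is a minimal Python 3 sketch, not part of the poster's script. It assumes each entry is laid out as state, parent PID, PID, command (inferred from the way 3919 and 4101 repeat in the second column), and the name `zombies_by_parent` is made up for illustration.

```python
import re
from collections import Counter

# Assumed layout of a log entry: STATE PPID PID [COMMAND],
# for example "Z 4101 4163 [miniserv.pl]". Timestamp lines are skipped.
ENTRY = re.compile(r"^Z\S*\s+(\d+)\s+(\d+)\s+\[([^\]]+)\]")


def zombies_by_parent(lines):
    """Count zombie entries per (parent PID, command) pair."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            ppid, _pid, command = match.groups()
            counts[(int(ppid), command)] += 1
    return counts
```

Passing an open file works, for example `zombies_by_parent(open("/var/log/zombies.log"))`, and `.most_common()` on the result lists the busiest parent first.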
Hmm, that’s all pretty unusual. It sounds like you’re seeing parent processes mysteriously dying off.
Are you on a VPS, or some other memory-constrained system? I’d be curious whether, due to memory constraints, the kernel is killing processes to keep your server online.
If that’s the case, you should see messages about it in the dmesg output, or in the log files in /var/log.
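One way to run that check is to scan the log text for the phrases the kernel's OOM killer normally leaves behind. The sketch below is Python 3 and makes assumptions: the exact wording varies between kernel versions, so the two phrases matched here ("Out of memory" and "oom-killer") are a starting point rather than a complete list, and `find_oom_lines` is a made-up helper name.

```python
import re

# Phrases commonly logged by the Linux OOM killer; wording differs by kernel
# version, so treat this pattern as an assumption to adjust.
OOM_PATTERN = re.compile(r"out of memory|oom-killer", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the lines that look like OOM-killer messages."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
```

It accepts any iterable of lines, so it can be fed an open `/var/log/messages` or the captured output of `dmesg`. An empty result only means these phrases were not found, not that memory pressure is ruled out.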
I haven’t been able to find anything in /var/log/messages or /var/log/dmesg that would indicate the processes are being killed. But the zombies do go away on their own.
I found these errors in the http error_log which seem to relate to the defunct httpd and php-cgi process:
[Tue Jan 25 13:38:35 2011] [notice] mod_fcgid: process /home/username/public_html/index.php(11519) exit(idle timeout), terminated by calling exit(), return code: 0
[Tue Jan 25 13:38:35 2011] [notice] mod_fcgid: process /home/username/public_html/index.php(10200) exit(idle timeout), terminated by calling exit(), return code: 0
[Tue Jan 25 13:38:35 2011] [notice] mod_fcgid: process /home/username/public_html/index.php(11353) exit(idle timeout), terminated by calling exit(), return code: 0
[Tue Jan 25 13:38:35 2011] [notice] mod_fcgid: process /home/username/public_html/index.php(11354) exit(idle timeout), terminated by calling exit(), return code: 0
[Tue Jan 25 13:38:35 2011] [notice] mod_fcgid: process /home/username/public_html/index.php(10196) exit(idle timeout), terminated by calling exit(), return code: 0
[Tue Jan 25 14:00:41 2011] [warn] mod_fcgid: process 11372 graceful kill fail, sending SIGKILL
[Tue Jan 25 14:00:47 2011] [notice] mod_fcgid: process /home/username/public_html/index.php(11372) exit(communication error), get stop signal 9
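To tell routine idle-timeout exits apart from forced kills across a longer log, the mod_fcgid lines can be tallied by exit reason. This is a rough Python 3 sketch that assumes every exit line follows the format of the excerpts above; `fcgid_exit_reasons` is a hypothetical name.

```python
import re
from collections import Counter

# Matches lines such as:
#   mod_fcgid: process /path/index.php(11519) exit(idle timeout), ...
# and captures the PID and the reason given inside exit(...).
EXIT = re.compile(r"mod_fcgid: process \S+\((\d+)\) exit\(([^)]+)\)")


def fcgid_exit_reasons(lines):
    """Count mod_fcgid process exits per stated reason."""
    counts = Counter()
    for line in lines:
        match = EXIT.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

In the excerpt above, the "idle timeout" exits are mod_fcgid retiring idle workers, while the "communication error" entry follows the SIGKILL of process 11372.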
I’m still not sure what is wrong with miniserv.pl.
Are there supposed to be two processes running at the same time?
As far as the miniserv.pl processes – take a close look at the path to each of those. One is for Usermin, the other is for Webmin. So that much is correct, you should indeed have two running.
In your case, it looks like it’s the Webmin process where you’re seeing the zombie children.
But that again leads to the question: why are the parent processes for 2-3 different daemons dying off? For that, I’m not entirely certain.
The best I can offer is to closely study the logs, and try to figure out what the daemon was doing at the time, and who asked it to do that.
For example, all the Webmin actions would be listed in /var/webmin/miniserv.log. You may be able to look at when the process was started, and determine from the logs what may have happened.
For Apache, I’m curious whether you’d run into fewer problems if you were to use CGI rather than FCGID. Some folks run into occasional issues with FCGID; while those are different from what you’re seeing, it may still be worth trying CGI to see if some of your problems go away. You can switch the PHP Execution mode by going into Server Configuration -> Website Options.
Just some thoughts, I’m honestly not sure why you’re seeing all that though.
The httpd and php-cgi defunct processes have disappeared since I changed CGI modes. However, I still keep getting defunct Webmin processes (Webmin 1.530) every 5 to 10 minutes on two servers (identical virtual machines). Should I file some sort of bug report?
I’ve attached my script in case anyone would like to check their machine for zombie processes.
It requires this line in /etc/crontab to run every minute.
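For readers without the attachment, here is a minimal sketch of the same idea. It is not the attached script: it is a Python 3 stand-in that assumes a Linux `/proc` filesystem, and the function names are made up. It reads each `/proc/<pid>/stat` file and reports processes whose state is `Z`.

```python
import os


def parse_stat(stat_text):
    """Return (pid, command, state, ppid) from the text of /proc/<pid>/stat."""
    # The command sits in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    command = stat_text[open_paren + 1:close_paren]
    fields = stat_text[close_paren + 1:].split()
    return pid, command, fields[0], int(fields[1])


def find_zombies(proc_root="/proc"):
    """List (ppid, pid, command) for every process currently in state Z."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat_file:
                pid, command, state, ppid = parse_stat(stat_file.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or the entry could not be parsed
        if state == "Z":
            zombies.append((ppid, pid, command))
    return zombies
```

Each tuple carries the parent PID first, so the output can be logged in the same order as the zombie.log entries earlier in the thread.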
Well, what you’re seeing is pretty atypical… while I don’t know exactly why, it is likely specific to your setup.
I say that because it seems to happen to multiple daemons on your server, and because I can’t reproduce it on any other systems.
The two Virtual Machines you’re seeing that on – do they both run on the same host?
I’d be curious if you’d consider an experiment.
Install something like VirtualBox on your desktop and, in it, install the same Linux distro that your two other virtual machines are using. Give it a similar amount of RAM to what those two virtual machines have.
Then install Virtualmin in that VirtualBox guest, and put your zombie hunter on there.
In that setup, do you see any zombie processes?
If you can find a way to reproduce this issue outside of the virtual machines your other two servers are running on, you may have a case for calling this a bug (or at least, it warrants looking deeper into it). However, I have a suspicion that it’s somehow related to the configuration of those two virtual machines (or perhaps their underlying host).