Apache 2.2.15 + mod_fcgid 2.3.7 (default Virtualmin installation): graceful restarts generate errors in both the browser and the error log

gpetrov · September 10, 2013, 1:52pm

Hi all,

More than 9 months ago I discovered a problem with graceful restarts on a default Virtualmin installation with the default execution mode (FCGId), but only recently did I have the time to dig deeper and experiment.

What is the setup:

CentOS 6.4 x86_64 minimal installation
Virtualmin 4.02.gpl GPL installed by the automatic .sh script, all default settings
mod_fcgid.x86_64 2.3.7-1.el6 from the virtualmin repo
httpd.x86_64 1:2.2.15-29.el6.vm.1 from the virtualmin repo
Single virtual server, running under the default FCGId execution mode, with the default of 90 sec php execution time
Single test.php file containing


<?php
for($i = 1; $i <= 30; $i++) {
	echo $i."\n";
	sleep(1);
}
?>

What is the error:

Run the script in a browser, then do a graceful restart of Apache (service httpd graceful). After around 12 seconds you will see a “No data received” error in your browser (Chrome) and the following in the Apache error log for that virtual server:


(22)Invalid argument: mod_fcgid: can't lock process table in pid 25570

(the pid number will be different)

Further experiments show that this script gets forcefully killed before ending.

If you reduce the time the script executes to 5 seconds ($i <= 4), you’ll get the same result, this time after 5 seconds.

Further experiments show this process completes, but you still get the errors both in the browser and the error log.

Do you get the same error? Test and post it here.
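
For anyone reproducing this, here is a minimal sketch for checking an error log for the message quoted above. The function name find_lock_errors is hypothetical and not part of the original test; it only matches the exact wording shown in the log excerpt.

```php
<?php
// Hypothetical helper, not from the original post: collects the pids from
// every "can't lock process table" line in a list of error log lines.
function find_lock_errors(array $lines)
{
    $pids = array();
    foreach ($lines as $line) {
        // Matches the message quoted above; the pid differs on every run.
        if (preg_match("/mod_fcgid: can't lock process table in pid (\\d+)/", $line, $m)) {
            $pids[] = (int) $m[1];
        }
    }
    return $pids;
}
```

To use it, read the virtual server's error log with file($path, FILE_IGNORE_NEW_LINES), where $path is wherever that log lives on your system, and pass the result to find_lock_errors(). An empty result means the message did not appear.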

Dig:

It is actually a problem of mod_fcgid, not Virtualmin itself.

Graceful restarts are performed every time a virtual server is installed or deleted, or when settings concerning Apache are changed. In a shared hosting environment this could be every 2 minutes. Even without shared hosting it still happens pretty often. Every time you do a graceful restart (install/remove a server), all the running processes will get killed, or at least you’ll get a scary error in the browser.

My first tweak to the experiment was to add a file write to the script, which shows which runs complete and which get killed before that. That is how I got the results above.

Add this inside the loop:


file_put_contents("test.txt", "test run for: ".$i." seconds");
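
Put together, the instrumented test could look like the sketch below. It wraps the loop in a function so the run length can be changed for the 30-second and shorter cases; the names run_timed_loop, $marker and $sleepSeconds are hypothetical and were not in the original test.php.

```php
<?php
// Hypothetical wrapper around the test loop, not from the original post.
// After each iteration it overwrites $marker, so the file always records
// the last second the script reached before finishing or being killed.
function run_timed_loop($seconds, $marker, $sleepSeconds = 1)
{
    $completed = 0;
    for ($i = 1; $i <= $seconds; $i++) {
        echo $i . "\n";
        file_put_contents($marker, "test run for: " . $i . " seconds");
        $completed = $i;
        sleep($sleepSeconds);
    }
    return $completed;
}
```

Calling run_timed_loop(30, "test.txt") from test.php corresponds to the original 30-second run. If test.txt ends at a number lower than the one requested, the process was killed before the loop finished.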

So why 12 seconds, and where is this set? After some time I discovered that increasing FcgidErrorScanInterval to 60 lets the second process complete (but you still get the errors).

If you check the mod_fcgid code in fcgid_pm_main.c, the graceful restart should be performed by the function kill_all_subprocess(), but evidently scan_errorlist() is also executed even though there is a check for procmgr_must_exit(). The code is really messy; even though it is not very complex, I didn’t quite understand it.

The error in the log “can’t lock process table in pid 25570” probably means that some information about the process is destroyed immediately upon the graceful restart, so we will never get the result back.

Even if we get around the early termination of the processes by increasing FcgidErrorScanInterval, the second problem is actually bigger: all your users are going to see this error.

Do you get the same?
So far I can propose two options:

Try to fix this problem and deploy a custom version of mod_fcgid
Keeping in mind that mod_fcgid cannot share the APC cache and is probably doomed long term, push the php-fpm + Apache implementation. Both Apache 2.2 and 2.4 are possible.

Thanks for your time testing and commenting!

eddieb · September 30, 2014, 6:13pm

I am still experiencing the problem with mod_fcgid 2.3.9 and Apache 2.2.15. Graceful Apache restarts leave PHP processes dangling. This was supposed to have been fixed in 2.3.7 (https://issues.apache.org/bugzilla/show_bug.cgi?id=50309).

I am waiting for Apache 2.4 to use php-fpm.