The fearsome "can't apply process slot for (...)fcgi-bin/php5.fcgi" error :(

EcchiOli · January 27, 2014, 9:43pm

Hello !

I have a problem between Virtualmin and my new server, and I hope that, maybe, you could help me with it.
I tested all I could and investigated as far as my googling allowed, but now I’m stuck. If you know what I could do or where I could investigate, I would be EXTREMELY grateful !

When I open a dynamic page on one of the sites I created on the new server, the browser shows, after a long wait :

Service Temporarily Unavailable

The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

In the apache error_log :

[warn] [client 92.142.9.191] mod_fcgid: can't apply process slot for /home/olivertest/fcgi-bin/php5.fcgi

(Or, the same, for every other website that I tested it for)
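To see which sites are hit and how often, the warnings can be tallied per wrapper script. This is only a minimal sketch: the function name `count_slot_errors` is made up for illustration, and it assumes the log lines look like the one quoted above, with the wrapper path as the last thing on the line.

```shell
# Count "can't apply process slot" warnings per FastCGI wrapper path.
# Reads an Apache error log on stdin and prints "<count> <wrapper path>"
# lines, busiest wrapper first. (Hypothetical helper, not part of Apache.)
count_slot_errors() {
    grep -F "mod_fcgid: can't apply process slot for " |
        sed "s/.*can't apply process slot for //" |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}

# Usage (the log path is an assumption, adjust it to your system):
#   count_slot_errors < /var/log/apache2/error.log
```

If one wrapper dominates the list, that site is a good place to start looking; if every site shows up, a global limit is the more likely suspect.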

The context : until last month, I had a fairly average server running Debian Squeeze, Webmin and Virtualmin, and everything worked fine. I could add sites, all was cool.
I moved to a new server, a powerful beast (two SSDs for / in RAID-1, two SATA drives for /home in RAID-1, the CPU is an AMD Opteron 4334, 6 cores, 3.6 GHz. There is 32 GB of RAM).
On this new server, the same tech guy installed the same software with the same settings : Debian (this time Wheezy, the new one), Webmin, Virtualmin and Suhosin, with FastCGI (FCGId) as the PHP handler.
And then… all hell broke loose.

On my server, save two sites that work correctly (maybe because they’re sitting alone on their own IP, or because they were the two that were imported first), all the other sites are as good as dead.
The problem : static files are OK, dynamic files using PHP aren’t OK.

It’s the same, whether I import a virtualmin backup of a site or delete it and re-create it from scratch.

Following http://www.megalinux.net/mod_fcgid-cant-apply-process-slot/,
I checked that the directory in which Apache keeps the sockets for fcgid (/var/lib/apache2/fcgid/sock on my Debian Wheezy) had the proper 755 permissions.
It already had these.
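For anyone repeating that check, here is a small sketch that prints mode and ownership in one line per path. The helper name `show_perms` is hypothetical, and it assumes GNU `stat` from coreutils, which is what Debian ships.

```shell
# Print "<octal mode> <owner>:<group> <path>" for each path given.
# Relies on GNU stat (coreutils); the -c format option is not portable to BSD.
show_perms() {
    stat -c '%a %U:%G %n' "$@"
}

# Usage (paths taken from this post):
#   show_perms /var/lib/apache2/fcgid /var/lib/apache2/fcgid/sock
```

Checking the parent directory as well as the sock directory is deliberate, since a socket can only be created if every directory on the way down can be traversed.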

Just in case, I turned off fail2ban and restarted Apache… no change.

Maybe a server limitation problem ? I checked, and my server is VERY FAR from hitting its hardware limits.
The top command shows all is OK; the most heavily used resource is the CPU, at around 20%.
If you want to look at a Top : http://imgur.com/NY0VvPP
Munin graphs show :
the Apache processes limit is 600, with production peaks at 300.
The CPU usage, with a maximum of 600% (6 cores), adding system, user and rare iowait, hardly amounts to 100% at worst, with everything else being Idle.
RAM usage is peaceful, with most of the RAM either unused or used for cache (I’d made plenty of restarts lately because of my tests, it would take 3 or 4 hours to have all spare ram becoming cache instead of unused.)

I thought I would check the system configuration files, to see if there was an issue :

My /etc/apache2/apache2.conf file (whose contents I’ve seen mentioned as httpd.conf in older discussions, I suppose it was before Apache2) mentions :

Timeout 300
KeepAlive On
MaxKeepAliveRequests 200
KeepAliveTimeout 3
(Initially, keepalive was at 100, timeout at 2 : no change)

[code]
StartServers 5
MinSpareServers 5
MaxSpareServers 10
ServerLimit 600
MaxClients 600
MaxRequestsPerChild 1000

StartServers 2
MinSpareThreads 25
MaxSpareThreads 75
ThreadLimit 64
ThreadsPerChild 25
MaxClients 150
MaxRequestsPerChild 0

StartServers 2
MinSpareThreads 25
MaxSpareThreads 75
ThreadLimit 64
ThreadsPerChild 25
MaxClients 150
MaxRequestsPerChild 0
[/code]

My /etc/apache2/mods-available/fcgid.conf mentions :

<IfModule mod_fcgid.c>
  AddHandler fcgid-script .fcgi
  FcgidConnectTimeout 60
  MaxProcessCount 9
  MaxRequestLen 33554432
</IfModule>
(Initially MaxProcessCount was 3 and the timeout was 30 : increasing them made no difference.)

I uploaded to pastebin the php.ini of a site having the problem :
http://pastebin.com/t88GNvN1

Some odd things, also. Coming back from the outside (bringing my kids home from school), I saw a site was finally working, I thought, thanks to the tests I made in its php.ini.
Later on, for more testing, I restarted Apache (“service apache2 restart”).
Guess what ? The site that had started working again was once more no longer being served. Four hours later, still no change.

I carefully, gradually raised the limits, sometimes lowering them, to no change. I either imported virtualmin backups from the previous server, or deleted everything, recreated from nothing, and reuploaded files by SFTP. No difference.

In Virtualmin > Virtual Server > Server Configuration > Website options, I verified that the bug was GONE if I chose mod_php or CGI wrapper as the PHP handler. But, according to http://boomshadow.net/tech/php-handlers/ , those handlers are NOT as secure as Fcgid, sob.
And when I switched back to Fcgid, the problem came back.

Would you know what may have been wrong with virtualmin, webmin, or debian ?

Maybe, for an experienced person, it’s all making sense…

Thank you SO MUCH if you can help !

(Edit : 5 h 30 min after the last Apache restart, I find that one of the dead sites is suddenly working. For the moment. But not the others. And I have no idea why, I didn’t restart the server or Apache.
The only php.ini differences I found are that on the broken site zend.enable_gc = On is commented out, max_execution_time is 30 versus 90 on the resurrected site, and session.entropy_length = 0 is a live setting while it’s commented out on the resurrected site. And I still have no clue what those things mean, save, at least, the zend reference to the ionCube loader needed by one of the working sites on the server, but that isn’t used by anything on the dead sites.)
(Edit2 : I copied the settings of the live site to the dead site’s php.ini - Beyond Compare does marvels to save time -, and restarted apache. Now, the dead site is live again. And the site that was live is now dead again. I give up for the moment, once again, thank you SO MUCH if you’ve got a hint !)

Locutus · January 27, 2014, 11:29pm

A MaxProcessCount of 9 seems very little to me (the default is 1000!), considering it’s the global limit for FCGI processes across all your domains. If you have multiple sites running, you’ll hit that limit very quickly. All further PHP requests will then receive the error you witnessed.

Try commenting out that directive and see if it makes a difference.

Further info about the most relevant directives is here: http://httpd.apache.org/mod_fcgid/mod/mod_fcgid.html#fcgidmaxprocesses

Note that “MaxProcessCount” is an old name for the current “FcgidMaxProcesses”. It does make sense to limit the number of processes that fcgid can create to some degree, to prevent your system from being overloaded (it needs one for each request that it serves concurrently, and will keep a certain number of spare ones around even when they’re idle), but 9 in total is definitely too few, especially on your powerful machine.
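If you want to move the config to the current names while you are in there, something like the following would do the renaming. It is a sketch only: the function name `modernize_fcgid_names` is invented, it covers just the two old-style names that appear in the posted fcgid.conf, and the mapping of MaxRequestLen to FcgidMaxRequestLen is my reading of the mod_fcgid documentation, so check it against the page linked above.

```shell
# Rewrite two legacy mod_fcgid directive names to their Fcgid* equivalents.
# Reads a config on stdin and writes the result to stdout; nothing is
# edited in place, so the output can be reviewed before it is used.
modernize_fcgid_names() {
    sed -e 's/^\([[:space:]]*\)MaxProcessCount\([[:space:]]\)/\1FcgidMaxProcesses\2/' \
        -e 's/^\([[:space:]]*\)MaxRequestLen\([[:space:]]\)/\1FcgidMaxRequestLen\2/'
}

# Usage (path taken from the first post):
#   modernize_fcgid_names < /etc/apache2/mods-available/fcgid.conf
```

Writing to stdout rather than editing the file keeps the original untouched until you have compared the two versions.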

Also note that Virtualmin does not put MaxProcessCount in there by default, so that must be a change that was either made manually or came from the mod_fcgid package of your OS. It’s not present on my Ubuntu systems, where only “FcgidConnectTimeout” is there with a value of 20.

back5150 · February 24, 2014, 8:35pm

I ran into this same ‘can’t apply process slot for /home/directory/fcgi-bin/php5.fcgi’ error. I checked that process/hardware limits were OK, the number of connections was normal for this site, and no new code had been added to the site.

Verified permissions on /var/run/mod_fcgid were set to 755 and owned by apache. My /etc/httpd/conf.d/fcgid.conf contains the following locations:

FcgidIPCDir /var/run/mod_fcgid
FcgidProcessTableFile /var/run/mod_fcgid/fcgid_shm

While /var/run/mod_fcgid is owned by apache, the process table file (fcgid_shm) was owned by root. I changed the owner of that file to apache and the site began to operate correctly. A few minutes later, I started getting the 503 error message and the same ‘can’t apply process slot’ message in the logs. I repeated the chown procedure, and again a few minutes later the 503 message returned. I’ve set up a cron job to run chown -R apache:apache on the /var/run/mod_fcgid folder, but I’d like to get to the root cause of this issue.
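To catch the moment the ownership flips back, a read-only check can be run from cron or by hand instead of blindly re-running chown. A minimal sketch, assuming GNU `stat`; the helper name `owner_is` is hypothetical.

```shell
# Succeed (exit 0) if FILE is owned by USER; otherwise print a note and
# return 1. Returns 2 if the file cannot be inspected at all.
owner_is() {
    file=$1
    want=$2
    have=$(stat -c '%U' "$file") || return 2
    if [ "$have" = "$want" ]; then
        return 0
    fi
    echo "$file is owned by $have, expected $want"
    return 1
}

# Usage (path and user taken from this post):
#   owner_is /var/run/mod_fcgid/fcgid_shm apache
```

Logging the output with a timestamp and comparing it with Apache restart or reload times may help narrow down what resets the owner.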

System Information:

OS: CentOS Linux 6.5 [2x Quad Core CPUs, 8GB RAM]
Kernel & CPU: Linux 2.6.32-431.5.1.el6.x86_64 on x86_64
Webmin version: 1.675
Apache version: 2.2.15
PHP version: 5.3.3

Please let me know what additional information I should add.

yngens · May 15, 2015, 12:15pm

Hitting the same issue: fcgid_shm keeps reverting to root’s ownership. But I am not sure whether it should belong to the apache user and group at all.

geissweb · January 12, 2016, 10:25am

Hi there,

Currently I have the same problem. Did you ever find a solution for this?

When changing to mod_php (run as Apache’s user) in website options, it works (can execute PHP).

I think this could be a permission problem where the fcgi stuff has wrong permissions set. Any recommendations which files should have which permissions?

geissweb · January 12, 2016, 12:35pm

OK I solved it for me. In case someone else has the same problem:

A more detailed look at the logs showed:

/var/log/apache2/error.log:
[Tue Jan 12 13:28:44.788076 2016] [fcgid:error] [pid 17868] (13)Permission denied: mod_fcgid: couldn’t bind unix domain socket /var/lib/apache2/fcgid/sock/

I then gave /var/lib/apache2/fcgid and /var/lib/apache2/fcgid/sock 777 permissions to test. This led to the next error:

suexec policy violation: see suexec log for more details

In suexec.log, I then got:
[2016-01-12 13:29:05]: User username not allowed: Could not open config file /etc/apache2/suexec/username

Solution was finally:
cp /etc/apache2/suexec/www-data /etc/apache2/suexec/username

In hindsight, I think this can happen if you have some hosts using mod_php to execute PHP and some hosts using fcgid.
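To find out which other accounts would trip over the same suexec error, one could list the users that have no per-user file yet. This is a rough sketch: the function name `missing_suexec_configs` and the `SUEXEC_DIR` variable are made up for illustration, and the default directory is simply the one from the suexec.log message above.

```shell
# Print each user name from the arguments that has no per-user suexec
# config file. SUEXEC_DIR (hypothetical variable) overrides the directory.
missing_suexec_configs() {
    dir=${SUEXEC_DIR:-/etc/apache2/suexec}
    for user in "$@"; do
        [ -f "$dir/$user" ] || echo "$user"
    done
}

# Usage (the user names are placeholders):
#   missing_suexec_configs username otheruser
```

It only reports and never copies anything, so you can decide per user whether copying the www-data file is appropriate.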

data1 · December 23, 2016, 8:27am

Hey, I have the same problem. My CPU usage is very high… and the site is loading slowly. I’ve got this error:
“[Fri Dec 23 09:19:20.134730 2016] [fcgid:warn] [pid 13295] [client 191.96.249.53:38860] mod_fcgid: can't apply process slot for /home/data0/fcgi-bin/php5.fcgi”.
It looks like a hacking attempt. The IP 191.96.249.53 appears to be Russian… so I think someone is attacking the site?! Am I wrong? Let me know! And let me know when you have a solution!

Masplus · December 30, 2016, 8:36am

Hello,

I think we can work on the two problems with some tweaks…

can't apply process slot

Have you tried the option that user “Locutus” suggested in post #2 ? It seems to be the solution. Try it and tell us if it solves your problem ( remember to restart Apache so that the server picks up the new values ).

Protecting the site against brute-force attacks

If you think you’re being attacked by some IP, you can try installing the “Fail2Ban” service and configuring it for the type of website you host ( WordPress being the most common example ). But first I recommend reading the error_log and access_log for that particular host, to verify whether the same IP is requesting the same path or URL.
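For that last check, a quick tally of requests per client can be done with standard tools. This is just a sketch: the function name `top_client_ips` is invented, and it assumes the client address is the first field of each line, as in Apache's common and combined log formats.

```shell
# Print the busiest client IPs from an Apache access log read on stdin,
# as "<requests> <ip>" lines, highest count first (default: top 10).
# Assumes the client address is the first whitespace-separated field.
top_client_ips() {
    limit=${1:-10}
    awk 'NF { hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
        sort -k1,1nr -k2,2 |
        head -n "$limit"
}

# Usage (the log path is an assumption, Virtualmin keeps per-domain logs):
#   top_client_ips 20 < /path/to/access_log
```

If a single address accounts for most of the requests, grep the log for that address to see which URLs it is hitting before deciding on a Fail2Ban rule.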