Hi everyone, my Fasthosts dedicated Ryzen 7 3700 Pro server (8 cores, 16 threads) with 64GB of memory and dual 1TB NVMe drives crashed yesterday, apparently due to CPU overload caused by massive spikes in PHP-FPM processes across the sites running FPM, which was all of them.
It is a production web server hosting about 20 websites and none of the sites are particularly busy (my busiest site gets maybe 1.5k visitors per month).
Has anyone else seen this happen recently? I don't have any updates to run and haven't had any issues with this server running Virtualmin in over a year. The Virtualmin dashboard only shows 1-15% CPU usage, which is really strange, as Fasthosts talked me through running htop, which showed some users/sites at 99% CPU usage for a single php-fpm process… and running multiple php-fpm processes at a time.
Does anyone else know what might have caused this? I've attached a few htop and Virtualmin screenshots showing what is going on, but I've had to switch nearly all my virtual servers off PHP-FPM and back to the default PHP mode, which is a lot slower! The snagadmin user is the only one I have left on PHP-FPM, as it is the busiest site and crashed as soon as I turned FPM off!
Compare htop, which explained the crash, with the Virtualmin dashboard CPU monitor, and I'm just baffled by the disconnect. With a very powerful Ryzen gaming processor, my sites have been flying and have barely been touching the sides on this server!
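One possible explanation for that disconnect, assuming the Virtualmin gauge shows CPU usage averaged over the whole machine: htop reports each process's CPU% relative to a single logical CPU, so 99% means one thread is saturated, not the whole chip. The hypothetical helper below is only a sketch that converts between the two views for this 8-core/16-thread CPU.

```python
def whole_machine_percent(process_percent: float, logical_cpus: int) -> float:
    """Convert a per-process CPU% as htop shows it (100% = one logical CPU)
    into a share of the whole machine (100% = every logical CPU busy)."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return process_percent / logical_cpus


if __name__ == "__main__":
    # Ryzen 7 3700 Pro from the post: 8 cores / 16 threads.
    for pct in (99.0, 78.0):
        share = whole_machine_percent(pct, 16)
        print(f"{pct:.0f}% in htop is roughly {share:.1f}% of the whole machine")
```

So several php-fpm workers each pinning one thread can take a site down while a machine-wide average still looks modest.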
Thanks guys. I was pretty much forced to upgrade from CentOS 8, as all my Let's Encrypt SSL certs stopped renewing and were throwing all kinds of errors. To be honest, the Virtualmin control panel works pretty well and the sites have been running really well until this issue, but I can't see that this is a Virtualmin issue, as it seems to relate to PHP-FPM!
Any ideas why or how a single PHP-FPM process (PID) for a WordPress site could possibly use 99% of an 8-physical-core CPU? Apache also sometimes spikes to near 100% CPU usage as well.
Enable slow query logging in your database and check the log while the CPU spike is happening; the cause might be there, so it's worth checking.
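On MySQL or MariaDB this is typically done by setting slow_query_log = 1 and a long_query_time threshold (for example 1 second) in the server config. Once the log has captured a spike, a small script can rank the worst statements. This is a rough sketch assuming the standard `# Query_time:` entry header; the default log path is a guess, so pass your own.

```python
import re
import sys

# Header line that opens each entry in a MySQL/MariaDB slow query log.
QUERY_TIME = re.compile(r"^# Query_time:\s*([\d.]+)")


def _flush(entries, seconds, sql_lines):
    """Store the entry being built, if there is one."""
    if seconds is not None and sql_lines:
        entries.append((seconds, " ".join(sql_lines)))


def slowest_queries(lines, top=10):
    """Return (seconds, sql) pairs from the log, slowest first."""
    entries = []
    seconds, sql_lines = None, []
    for raw in lines:
        line = raw.strip()
        match = QUERY_TIME.match(line)
        if match:
            _flush(entries, seconds, sql_lines)
            seconds, sql_lines = float(match.group(1)), []
        elif not line or line.startswith("#") or line.startswith("SET timestamp="):
            continue  # blank lines, other headers and the per-entry timestamp
        elif seconds is not None:
            sql_lines.append(line)
    _flush(entries, seconds, sql_lines)
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries[:top]


if __name__ == "__main__":
    # Default path is an assumption; use whatever slow_query_log_file points to.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/mysql/slow.log"
    with open(path, encoding="utf-8", errors="replace") as log:
        for secs, sql in slowest_queries(log):
            print(f"{secs:8.2f}s  {sql[:120]}")
```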
And, of course, check the error log and the access log for the problem site to see if there are any obvious problems, and to narrow it down to one (or more) specific pages. I've found some plugins do terrible things on admin pages but aren't so bad on user-facing pages, for example.
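For the access log side, a quick tally of which URLs get hit most (and which return 5xx) can point at the page or plugin endpoint to look at first. This sketch assumes Apache's standard common/combined log format and takes the log path as an argument, since where Virtualmin keeps each site's log may vary by setup.

```python
import re
import sys
from collections import Counter

# Apache common/combined format: host ident user [time] "METHOD PATH PROTOCOL" status ...
LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3})')


def top_paths(lines, top=10, errors_only=False):
    """Count requests per (method, path) with query strings stripped.
    With errors_only=True, only 5xx responses are counted."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # skip lines in an unexpected format
        method, path, status = match.groups()
        if errors_only and not status.startswith("5"):
            continue
        counts[(method, path.split("?", 1)[0])] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        lines = log.readlines()
    for (method, path), hits in top_paths(lines):
        print(f"{hits:6d}  {method} {path}")
    print("5xx responses:")
    for (method, path), hits in top_paths(lines, errors_only=True):
        print(f"{hits:6d}  {method} {path}")
```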
The max children value of 9999 (which represents unlimited) is very resource-unfriendly and should never be used! Keep in mind that each child can use up to 60-70 MiB of RAM. Set this to a reasonable value on the PHP Options page (or just edit the pool file manually and restart the PHP-FPM service), depending on your needs and the resources available.
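To put 9999 in perspective, here is a rough worst-case estimate for a single pool, assuming the 60-70 MiB per child quoted above and that every child could be alive at once. It's back-of-the-envelope arithmetic, not a measurement.

```python
def worst_case_mib(max_children: int, per_child_mib: float) -> float:
    """Upper bound on one pool's RAM if every child is alive at the same time."""
    return max_children * per_child_mib


if __name__ == "__main__":
    ram_mib = 64 * 1024  # the 64GB server from the thread, treated as GiB
    for children in (9999, 20):
        # 70 MiB is the upper end of the 60-70 MiB per child quoted above.
        need = worst_case_mib(children, 70)
        verdict = "fits" if need <= ram_mib else "does NOT fit"
        print(f"pm.max_children={children}: up to {need / 1024:.1f} GiB ({verdict} in 64 GiB)")
```

At 9999 children the theoretical ceiling is around 680 GiB for one pool, so the limit never kicks in and a burst of requests can keep spawning workers until the CPU or RAM gives out.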
The server has 64GB of RAM, which it's nowhere near using, but I have now dropped the value from 9999 to 20 (I agree that's just unnecessary). However, it's the CPU that's still spiking to 80-90% for a single php-fpm process on some occasions, which is what caused my site to crash last week!
I've turned off PHP-FPM on a load of my virtual servers, but this has caused knock-on effects on some, so I've had to re-enable it.
If it makes any difference, on my Hetzner dedicated dev server running CentOS 7 I get an extra PHP script execution mode option, FCGId, which seems a lot more stable than FPM, but this option isn't available on the server above. Is this something I can install now?
I usually avoid the dynamic process manager (pm = dynamic) unless the server is hosting just a site or two. Based on your server specs and the number of WP sites you're hosting, I would recommend switching to static instead. I have servers with 2-4GB of RAM (2-4 cores) handling 25+ WP sites each, and they're quite stable.
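With pm = static, each pool keeps pm.max_children workers running permanently, so the number has to be sized against RAM up front. A minimal sizing sketch, assuming a hypothetical budget of half the 64GB for PHP (the rest left for MySQL, Apache and cache), the ~20 sites in the thread, and ~70 MiB per worker; swap in your own measured numbers.

```python
def static_children_per_pool(php_budget_mib: int, pools: int, per_child_mib: int) -> int:
    """With pm = static every pool keeps pm.max_children workers alive,
    so the budget must cover pools * children * per-child RAM."""
    if pools <= 0 or per_child_mib <= 0:
        raise ValueError("pools and per_child_mib must be positive")
    # Floor division: round down so the total stays within the budget.
    return php_budget_mib // (pools * per_child_mib)


if __name__ == "__main__":
    # Hypothetical split: 32 GiB of the 64GB reserved for PHP-FPM workers.
    budget_mib = 32 * 1024
    print(static_children_per_pool(budget_mib, pools=20, per_child_mib=70))
```

A result of 0 would mean the budget can't hold even one permanent worker per pool at that size, so either the budget, the pool count or the per-worker estimate needs revisiting.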
A single php-fpm process using 78% CPU on an 8-core Ryzen server is weird af tbh.
Thanks for the suggestion, @shillongserver. It sounds like exactly what I need, as for some reason even snaggingcompany.com's dev site on my CentOS 7 server is doing the same, so it's not really a CentOS 8 Stream issue!
I'll try this out and also set the max memory usage per process to 128M, as they are currently set to 512M, which might also be where I'm going wrong.
Having tested it this morning, most sites are fine with static, apart from Snagging Company, which throws 500 errors and crashes as soon as I set it. A few others have been fine on their homepage but crashed with 500 errors on other pages!
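One thing worth ruling out with 500s that appear after changing pool settings is a limit being hit. PHP and PHP-FPM log recognisable messages when memory_limit, max_execution_time, pm.max_children or request_terminate_timeout is reached, so a crude substring count over the site's PHP error log and the PHP-FPM log can confirm or eliminate that. This is a sketch; log locations vary by setup, so the paths are passed in.

```python
import sys
from collections import Counter

# Substrings of messages PHP and PHP-FPM write when a limit is hit.
PATTERNS = {
    "Allowed memory size of": "memory_limit exhausted",
    "Maximum execution time of": "max_execution_time exceeded",
    "server reached pm.max_children setting": "pool hit pm.max_children",
    "execution timed out": "request_terminate_timeout killed a request",
}


def count_limit_hits(lines):
    """Count log lines mentioning each known limit message."""
    hits = Counter()
    for line in lines:
        for needle, label in PATTERNS.items():
            if needle in line:
                hits[label] += 1
    return hits


if __name__ == "__main__":
    # Pass the site's PHP error log and/or the PHP-FPM log as arguments.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            for label, count in count_limit_hits(log).most_common():
                print(f"{path}: {label}: {count}")
```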
I'm now setting max children to 20 and max memory per process to 256MB, with the max time set to 180 seconds.