I apologize in advance if this is not the right section as this might not be a Virtualmin specific problem.
We noticed that recent reboots are taking quite some time (an estimated 20 minutes), and uptime shows a load average of 20 to 30 once the server is rebooted.
We have around 200 virtual servers on PHP 7.4, running on nginx. The server has 32GB RAM, 8GB of swap, and a 4-core CPU.
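For context on those numbers, a load average is usually read relative to the number of cores. Below is a minimal Python sketch of that arithmetic. The helper name `load_per_core` is made up for illustration, and it assumes Linux, where the load average also counts processes stuck in uninterruptible I/O wait, not only processes using CPU.

```python
def load_per_core(load_avg: float, cores: int) -> float:
    """Return the load average divided by the number of CPU cores.

    Values well above 1.0 mean more tasks are runnable (or, on Linux,
    blocked in uninterruptible sleep) than there are cores to serve them.
    """
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return load_avg / cores


# Figures from the post: load of 20 to 30 on a 4-core server.
low = load_per_core(20, 4)
high = load_per_core(30, 4)
print(f"load per core: {low} to {high}")
```

On a live system the inputs could come from `os.getloadavg()` and `os.cpu_count()`. A high ratio alone does not say whether the bottleneck is CPU or disk.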
Most of the sites are running WordPress. Opcache is enabled with opcache.validate_timestamps = 0.
From our checks, it seems each site’s php-fcgi is probably trying to load the opcache into memory and this could be the cause of high CPU. The load will go down drastically if we stop nginx or if we wait for around 30 mins (assuming somehow the cache is warmed up).
Another suspicion is slow disk I/O, as we see high %wa when running top. Checking with ps auxf | grep " D" also shows that most of the processes in uninterruptible sleep are PHP processes.
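Grepping ps output for " D" can also match unrelated text on the line. A slightly more precise check is to read the state field of each process directly. This is a rough Python sketch, assuming a Linux /proc filesystem; the function names `parse_stat` and `uninterruptible_processes` are hypothetical helpers, not part of any tool mentioned here.

```python
import os
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, command name, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so search for the last closing parenthesis.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return (pid, command name) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited or entry unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `uninterruptible_processes()` a few times during the slow period would show whether the same PHP processes stay blocked, which would point at disk I/O rather than CPU.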
Is there a setting in Virtualmin to make the sites load in sequence? We are considering disabling opcache entirely, but are not sure that would help. Increasing the CPU to 8 cores is another option we are considering.
One more thing to add: on a regular day with low load, loading the Virtualmin dashboard will sometimes also cause a CPU spike. This is pretty random, and we have not been able to pinpoint the cause.
Disabling opcache on half the sites, to check whether that cuts the boot time in half, could be a good test…
Or moving to php-fpm with the ondemand pm…
And I think you should increase to 8 cores for 200 sites anyway…
We tried increasing the swap size and also increased the CPU to 8 cores. It seems to help quite a bit.
Though at times we still face some random slowdowns / CPU spike while switching between Virtualmin/Webmin pages. That is probably a separate issue that we will need to check.
If you’re already consuming swap immediately on boot, you don’t have enough memory for everything you’re running. “swap left” isn’t meaningful data, but having data being swapped out during boot is a red flag.
Apologies for leaving out some information. Since the previous update, our RAM stays at 32GB and swap was increased to 16GB.
During reboot swap is not used at all. RAM is also healthy. 5GB used out of 32GB.
The monitoring we did is over 3 to 4 weeks of uptime, where swap is utilized over time. In the above scenario, swap was down to 1GB left.
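To track this over time rather than from occasional snapshots, swap usage can be computed from /proc/meminfo. The sketch below is a minimal Python example, assuming the Linux /proc/meminfo layout; the function names are hypothetical and the sample text only mirrors the figures mentioned in this thread (16GB of swap with 1GB left), it is not real output from the server.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def swap_used_percent(meminfo: dict) -> float:
    """Return the percentage of swap in use, or 0.0 if no swap is configured."""
    total = meminfo.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - meminfo.get("SwapFree", 0)) / total


# Illustrative sample only: 16GB of swap with 1GB free, values in kB.
sample = "SwapTotal:       16777216 kB\nSwapFree:         1048576 kB\n"
print(swap_used_percent(parse_meminfo(sample)))
```

On the server itself the input would be the contents of `/proc/meminfo`. As noted earlier in the thread, the amount of swap left matters less than whether pages are actively being swapped, so this number is only a starting point.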
We restarted some of the php-fcgi processes, then checked with top. RAM shows around 86% usage and swap around 45%. However, things did not improve. We just kept noticing the same few processes listed when running the ps command.
Just wondering what is happening while miniserv.pl is running, in case it can point us to something (e.g. maybe SSL renewal via Let's Encrypt).
But, you’ve only shown us processes that aren’t using any CPU (that ps list is showing 00:00 CPU used time…so, basically nothing).
I can’t even begin to guess what’s happening with the info you’ve given. Why are you just showing processes waiting on something? (D is for uninterruptible sleep, man page says “usually I/O”). All of those processes have used basically no CPU (that 0:00 is how many minutes and seconds of CPU they’ve consumed…none, they are not your high load). At this point I can’t even say what processes are using CPU…because it isn’t the Webmin processes you’ve shown us.
Just look at top. It sorts processes by CPU usage by default.
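If you would rather rank a saved ps listing than watch top, the TIME column can be converted to seconds and sorted. This is a small Python sketch, assuming the usual ps TIME formats (minutes:seconds, hours:minutes:seconds, optionally prefixed with days and a dash). The function names and the sample rows are invented for illustration. Note that it ranks by accumulated CPU time, whereas top ranks by current CPU usage.

```python
def cpu_time_seconds(time_field: str) -> int:
    """Convert a ps TIME value such as '0:00', '12:34' or '1-02:03:04' to seconds."""
    days = 0
    if "-" in time_field:
        day_part, time_field = time_field.split("-", 1)
        days = int(day_part)
    seconds = 0
    for part in time_field.split(":"):
        seconds = seconds * 60 + int(part)
    return days * 86400 + seconds


def busiest(rows, limit=5):
    """Return the `limit` (command, TIME) rows with the most accumulated CPU time."""
    return sorted(rows, key=lambda row: cpu_time_seconds(row[1]), reverse=True)[:limit]


# Invented sample rows: a process showing 0:00 has consumed essentially no CPU.
rows = [("miniserv.pl", "0:00"), ("php-cgi", "12:34"), ("nginx", "1:05")]
print(busiest(rows, limit=2))
```

A process sitting at 0:00 sorts to the bottom here, which is the point being made above: it is waiting on something, not burning CPU.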