High CPU on reboot

darnpunk · November 9, 2022, 5:44am

SYSTEM INFORMATION
OS type and version	CentOS 7
Virtualmin version	7.3
RAM	32GB
Swap	8GB
CPU	Intel Xeon Silver 4210 CPU @ 2.20GHz, 4 cores

Hi,

I apologize in advance if this is not the right section as this might not be a Virtualmin specific problem.

We noticed that recent reboots are taking quite some time (around 20 minutes), and uptime shows a load average of 20 to 30 once the server has rebooted.

We have around 200 virtual servers on PHP 7.4, running on nginx. Server has 32GB RAM, swap disk of 8GB. 4 core CPU.

Most of the sites are running WordPress. Opcache is enabled with opcache.validate_timestamps = 0.

From our checks, it seems each site’s php-fcgi is probably trying to load the opcache into memory and this could be the cause of high CPU. The load will go down drastically if we stop nginx or if we wait for around 30 mins (assuming somehow the cache is warmed up).

Another suspicion is slow disk I/O, as we see high %wa when running top. Checking via ps auxf | grep " D" also shows that most of the matching processes are PHP processes.

Is there a setting in Virtualmin to load the sites in sequence? We are considering disabling opcache entirely, but are not sure that would help. Increasing the CPU to 8 cores is another option.

One more thing to add: on a regular day with low load, loading the Virtualmin dashboard will sometimes also cause a CPU spike. This is pretty random and we have not been able to pinpoint the cause.
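
Editor's sketch: the high %wa seen in top can also be measured directly, which makes it easier to compare before and after a change such as moving to faster disks. The Python below is a minimal sketch that assumes a Linux host exposing /proc/stat with the usual field order (user, nice, system, idle, iowait, irq, softirq, steal). The function names and the 5-second interval are illustrative and not part of Virtualmin.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The aggregate line is labelled exactly 'cpu'; per-core lines are 'cpu0', 'cpu1', ...
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent.

    Assumed field order: user nice system idle iowait irq softirq steal.
    Guest time is already included in user time, so only the first
    eight counters are summed.
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return iowait %."""
    with open(path) as handle:
        first = parse_cpu_line(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        second = parse_cpu_line(handle.read())
    return iowait_percent(first, second)
```

Calling sample_iowait() right after a reboot and again once the load has settled gives two numbers to compare. A large iowait share alongside a high load average would point toward the disk rather than the CPU.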

stefan1959 · November 9, 2022, 9:37am

Wow 200, and I was worried about 10.

dimitrist · November 10, 2022, 1:14pm

disk health? maybe a disk is failing?

disabling opcache in half the sites perhaps to check if that reduces boot time in half, could be a good test…
or moving to php-fpm with ondemand pm…

and i think you should increase to 8 cores for 200 sites anyway…

darnpunk · November 27, 2022, 3:25am

We tried increasing the swap size and also increased the CPU to 8 cores. It seems to help quite a bit.

Though at times we still see random slowdowns / CPU spikes while switching between Virtualmin/Webmin pages. That is probably a separate issue that we will need to check.

Joe · November 27, 2022, 3:43am

Increasing swap cannot improve performance. All it can do is prevent out of memory situations (i.e. OOM killer kicking in).

But, more cores can’t hurt.

But, I think your suspicion about disk performance is probably the most likely explanation.

darnpunk · December 19, 2022, 4:23am

We did some monitoring and it seems the pattern for the slowdowns is pretty similar. The steps are something like this:

1. Check CPU load via uptime and top: normal. Browsing sites and DB queries all look OK.
2. Load Virtualmin / Webmin in the browser.
3. The dashboard loads our default page (i.e. the list of virtual servers). CPU spikes a bit but stays within the normal range.
4. Switch to one of the domains in the Virtualmin list to try to view its configuration.
5. Things then start to slow down and the load average reported by uptime goes as high as 40+. Swap still has space left.

I ran ps command and noticed the following:

ps auxf | grep " D "

root      53752  4.1  3.2 1299452 1054116 ?     D    12:05   0:20  \_ /usr/libexec/webmin/authentic-theme/index.cgi
root      54800  0.5  0.0  79616  4700 ?        D    12:13   0:00  |       \_ /opt/rh/rh-php72/root/usr/bin/php-cgi -h
root      54798  0.5  0.0  79616  4700 ?        D    12:13   0:00  |       \_ /opt/rh/rh-php72/root/usr/bin/php-cgi -h
root      54310  0.0  0.0 261576 32308 ?        D    12:08   0:00  \_ /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.conf
root      54331  0.0  0.0 261576 32272 ?        D    12:08   0:00  \_ /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.conf
root      54480  0.0  0.0 263652 32748 ?        D    12:10   0:00  \_ /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.conf
root      54513  0.0  0.1 263652 32768 ?        D    12:11   0:00  \_ /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.conf
root      54522  0.0  0.0 263652 32720 ?        D    12:11   0:00  \_ /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.conf
root      54784  0.1  0.1 261576 32848 ?        D    12:13   0:00  \_ /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.conf
root      54787  3.2  0.1 283196 49232 ?        D    12:13   0:00  \_ /usr/libexec/webmin/authentic-theme/xhr.cgi

There’s a list of /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.conf and it can sometimes grow longer than the above.

CPU starts to come down as soon as the list above clears. What are those scripts doing? Is there somewhere we need to check or clear some cache maybe?

Joe · December 19, 2022, 4:44am

If you’re already consuming swap immediately on boot, you don’t have enough memory for everything you’re running. “swap left” isn’t meaningful data, but having data being swapped out during boot is a red flag.

Randomz · December 19, 2022, 6:48am

More RAM and switch to SSD if you are using spinning disks.

darnpunk · December 19, 2022, 9:22am

Apologies for leaving out some information. Since the previous update, RAM remains at 32GB and swap was increased to 16GB.

During reboot swap is not used at all. RAM is also healthy. 5GB used out of 32GB.

The monitoring we did is over 3 to 4 weeks of uptime, where swap is utilized over time. In the above scenario, swap was down to 1GB left.

We restarted some of the php-fcgi processes, then checked top. RAM shows around 86% usage and swap 45% usage. However, things did not improve. We just kept seeing the same list of processes when running the ps command.

Just wondering what is happening while miniserv.pl is running, in case it can point us to something (e.g. maybe SSL renewal via Let's Encrypt).

Joe · December 19, 2022, 9:37am

OK, it’s not memory, then.

But, you’ve only shown us processes that aren’t using any CPU (that ps list is showing 0:00 CPU used time…so, basically nothing).

I can’t even begin to guess what’s happening with the info you’ve given. Why are you just showing processes waiting on something? (D is for uninterruptible sleep, man page says “usually I/O”). All of those processes have used basically no CPU (that 0:00 is how many minutes and seconds of CPU they’ve consumed…none, they are not your high load). At this point I can’t even say what processes are using CPU…because it isn’t the Webmin processes you’ve shown us.

Just look at top. It sorts processes by CPU usage by default.
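
Editor's sketch: to check the STAT and TIME columns of a ps listing like the one earlier in the thread without reading it by eye, the Python below groups D-state processes by command and adds up their CPU time. It is a minimal sketch that assumes "ps aux" / "ps auxf" column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) and TIME values in M:SS or H:MM:SS form. The function names are illustrative.

```python
from collections import defaultdict


def cpu_seconds(time_field):
    """Convert a ps TIME value such as '0:20' or '1:02:03' to seconds.

    Assumes colon-separated integers only; a 'DD-HH:MM:SS' form is not handled.
    """
    seconds = 0
    for part in time_field.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def summarize_d_state(ps_text):
    """Group ps lines whose STAT starts with 'D' by command.

    Returns {command: (process_count, total_cpu_seconds)}.
    """
    summary = defaultdict(lambda: [0, 0])
    for line in ps_text.splitlines():
        fields = line.split(None, 10)
        # Skip the header row and anything that does not look like a process line.
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        stat, time_field, command = fields[7], fields[9], fields[10]
        if not stat.startswith("D"):
            continue
        # Drop the tree-drawing characters that 'ps auxf' puts before the command.
        command = command.lstrip("|\\_ ").strip()
        entry = summary[command]
        entry[0] += 1
        entry[1] += cpu_seconds(time_field)
    return {cmd: tuple(values) for cmd, values in summary.items()}
```

Feeding it the output of ps auxf gives one entry per command. Entries with many processes but almost no CPU seconds are waiting (usually on I/O) rather than computing, so the actual CPU consumers have to be found elsewhere, for example in top.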

system · February 17, 2023, 9:37am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.