cpu idle always shows 0%

midol · January 27, 2009, 3:05pm

I am trying to get a handle on the system load imposed by a particular site on the host computer where I’m using VM GPL 3.65. Top and the sysstat logs show 0% cpu idle at all times. This seems implausible. The site is quite low traffic now but will go into high gear in a couple of months. Can anyone suggest another tool I could use to check this? Or is this a known problem?
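One way to cross-check what top and sysstat report is to read the kernel's own counters from /proc/stat. The following is only a minimal sketch in Python, assuming the usual Linux layout in which the first line starts with `cpu` and its fourth number is idle time. The function names and the sample lines are invented for illustration.

```python
def cpu_fields(stat_line):
    """Split the aggregate 'cpu ...' line from /proc/stat into integer tick counts."""
    parts = stat_line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(p) for p in parts[1:]]


def idle_percent(before, after):
    """Percentage of ticks spent idle between two samples of the 'cpu' line."""
    a, b = cpu_fields(before), cpu_fields(after)
    # Sum at most the first eight columns (user .. steal); on newer kernels the
    # later guest columns are already included in user/nice.
    n = min(len(a), len(b), 8)
    deltas = [y - x for x, y in zip(a[:n], b[:n])]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return 100.0 * deltas[3] / total  # index 3 is the idle column


# Invented sample lines: user nice system idle iowait irq softirq
sample_before = "cpu  1000 50 300 8000 100 10 20"
sample_after = "cpu  1100 50 350 8800 130 10 30"
print(round(idle_percent(sample_before, sample_after), 1))
```

On a real machine the two inputs would be the first line of /proc/stat read a few seconds apart.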

Dave

Eric · January 27, 2009, 3:30pm

Hmm… so, what kind of hardware do you have in your server? And is it a VPS, or a dedicated machine?

When you run top, do you see processes there chewing up a lot of resources?

How much free memory do you have (which you can see when typing "free")?

And what is the load you see when typing "uptime"?

Things like having an underpowered server, or not enough memory, can cause problems like what you’re seeing. OTOH, so can a mis-behaving ClamAV daemon.

The above questions will help dig up the source of the issue here. Thanks!
-Eric

midol · January 27, 2009, 6:10pm

The CPU is a 1GHz Pentium and there is 512MB of RAM, the maximum for this setup. The system is real (not virtualized) and dedicated to site hosting, so all system resources are available as far as I know. Top shows that top itself is taking 45% of the CPU, but that is necessarily only while top is running, so I don’t see how to interpret it given that top isn’t normally running. I see rpmq in second place with 42% of the CPU. After waiting a while, rpmq drops off the list and top is then taking up 87% of the CPU. Output of free:

[admin@cserver ~]$ free
             total       used       free     shared    buffers     cached
Mem:        514404     370608     143796          0      13280     145164
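When reading this kind of free output, it can help to remember that buffers and cached memory can generally be reclaimed by the kernel when applications need it. A rough sketch of that arithmetic in Python, assuming the old-style column order shown above (total, used, free, shared, buffers, cached); the function name `reclaimable_kb` is made up for this example:

```python
def reclaimable_kb(free_output):
    """Return free + buffers + cached (in kB) from old-style `free` output."""
    for line in free_output.splitlines():
        if line.strip().startswith("Mem:"):
            # Columns after 'Mem:' are total, used, free, shared, buffers, cached.
            total, used, free, shared, buffers, cached = (
                int(x) for x in line.split()[1:7]
            )
            return free + buffers + cached
    raise ValueError("no 'Mem:' line found")


# The figures posted above.
sample = """total used free shared buffers cached
Mem: 514404 370608 143796 0 13280 145164"""
print(reclaimable_kb(sample))  # 302240
```

By that reading, roughly 300MB of the 512MB could still be made available, so memory does not look like the bottleneck here.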

[admin@cserver ~]$ uptime
14:07:51 up 77 days, 22:38, 3 users, load average: 1.97, 2.30, 2.22

All of this seems normal to me except for top and that seems ambiguous.
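For what it's worth, a load average is easier to judge relative to the number of CPUs: on Linux it counts tasks that are runnable or in uninterruptible sleep, so a sustained value above the CPU count suggests tasks are queuing. A small Python sketch of that comparison, assuming this machine has a single core; the function names are illustrative only:

```python
import re


def load_averages(uptime_line):
    """Extract the 1, 5 and 15 minute load averages from an `uptime` line."""
    m = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if not m:
        raise ValueError("no load average found")
    return tuple(float(x) for x in m.groups())


def load_per_cpu(uptime_line, cpu_count):
    """Divide each load average by the CPU count (assumed, not detected)."""
    return tuple(round(v / cpu_count, 2) for v in load_averages(uptime_line))


sample = "14:07:51 up 77 days, 22:38, 3 users, load average: 1.97, 2.30, 2.22"
print(load_per_cpu(sample, 1))  # cpu_count=1 is an assumption for a 1GHz Pentium
```

Values around 2 per CPU, as here, would mean about two tasks competing for the processor at any moment.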

Your suggestions much appreciated. Any further ideas?

Eric · January 27, 2009, 6:20pm

Yeah, something is chewing up your CPU.

I’m a little suspicious of that rpmq process, though I’m reluctant to suggest killing it before getting more info.

Would you be able to include the output of a "ps auxw" as an attachment to a post in this thread?
-Eric

Joe · January 27, 2009, 6:33pm

There is a buggy version of RPM out there that explodes occasionally. It existed for CentOS/RHEL…I don’t recall whether it was 4 or 5, though. This might be the culprit here. You haven’t identified your OS, but perhaps that’s the cause here.

If these processes continue to eat CPU for twenty or thirty minutes, kill them. You may then need to clean up the lock files in /var/lib/rpm (the files to remove are named __db.00*), or rpm might go straight back into a locked state. I’m unsure on this one: this locking issue may be unrelated to this rpmq bug (I’ve seen a handful of RPM bugs over the years, and they’ve all run together).
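The lock file cleanup described above could be sketched like this in Python. This is only an illustration: the function names are hypothetical, it defaults to a dry run that just lists the files, and it assumes no rpm or yum process is still running when the files are actually removed.

```python
import glob
import os


def find_rpm_db_locks(rpm_dir="/var/lib/rpm"):
    """List the __db.00* files in the RPM database directory."""
    return sorted(glob.glob(os.path.join(rpm_dir, "__db.00*")))


def remove_rpm_db_locks(rpm_dir="/var/lib/rpm", dry_run=True):
    """Return the matching files; delete them only when dry_run is False."""
    paths = find_rpm_db_locks(rpm_dir)
    if not dry_run:
        for path in paths:
            os.remove(path)  # needs root on a real /var/lib/rpm
    return paths
```

Calling `remove_rpm_db_locks()` with the defaults only reports what would be deleted; passing `dry_run=False` as root performs the removal.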

Once you’ve got a sane system again, and yum is working, use it to update the rpm package. If your problem is that buggy RPM version, then an update will fix it. I’ve seen this specific problem several times in the past few months, so it seems a likely culprit.

midol · January 27, 2009, 7:31pm

The OS is CentOS 5.2. I run updates every day, so as far as I can see everything IS up to date. The first few lines of top, run just now, are:

18121 root      25   0  2456 1136  732 R 90.8  0.2 347010:18 top
15820 admin     15   0 61576  27m 5208 S  3.4  5.5   4:08.14 nxagent
17002 admin     15   0  195m  42m  17m S  2.3  8.5   1:51.96 firefox
19489 admin     15   0 41788  10m 7508 S  1.1  2.0   0:08.01 gnome-terminal
26706 admin     15   0  2456 1228  804 R  1.1  0.2   0:00.18 top
15866 nx        15   0  2676  484  464 S  0.2  0.1   0:04.68 nc
16011 admin     15   0 28744 6916 5940 S  0.2  1.3   0:39.72 clock-applet
21484 bvmapcha  15   0 14936 3100 2848 S  0.2  0.6   0:05.52 pam-panel-icon

ps output is attached.

Dave
Attachment: psauxw.txt (size 47395) http://www.virtualmin.com/components/com_fireboard/uploaded/files/psauxw.txt

Eric · January 27, 2009, 7:58pm

Aha!

It looks like that machine may be running Apache and friends, but it’s also running an entire suite of desktop applications. It has X, GNOME, Firefox, and the NoMachine agent on it.

That’s about the quickest way to bring a 1GHz server with 512MB of RAM to its knees.

If you’re interested in using that system as a server, my recommendation is to stop running X and any related desktop applications on it, and to use it purely for server functions.
-Eric

midol · January 27, 2009, 8:11pm

Thanks,

The server setup is premised on remote admin access. The freenx server needs to be there so that I can get access remotely. Normally nothing is done with any of the installed desktop apps because there is not normally anyone using the desktop or X or the other usual personal use bits. But when I log in using the nomachine client from another computer the machine starts up an X session just for me, no doubt skewing the appearances. When I end the login I always end the X session rather than suspending it, specifically so as to not leave useless processes running.

Anyway, the performance of the server with static pages and half a dozen sites is currently ok, this checking is just precautionary on my part.

Something I wonder about, though, is the TIME+ stat in the first line of top, showing results for the top process. The figure given is 347K, orders of magnitude greater than for anything else. I don’t know what to make of that.
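To put the TIME+ figures on a common scale, they can be converted to seconds. This is a small Python sketch, assuming top's TIME+ column is accumulated CPU time written as minutes:seconds with optional hundredths; the helper name `cpu_time_seconds` is made up here:

```python
def cpu_time_seconds(time_plus):
    """Convert a top TIME+ value such as '4:08.14' to seconds.

    Assumes the form minutes:seconds[.hundredths].
    """
    minutes, sep, seconds = time_plus.partition(":")
    if not sep:
        raise ValueError("expected minutes:seconds")
    return int(minutes) * 60 + float(seconds)


# Values taken from the top listing above.
for value in ("347010:18", "4:08.14", "0:00.18"):
    print(value, "->", cpu_time_seconds(value), "seconds")
```

Under that assumption, the first top process has accumulated CPU time measured in months, while everything else on the list is in the range of seconds to minutes.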

dave

Eric · January 27, 2009, 9:08pm

Hrm, where to start

I would highly recommend using SSH for remote admin access, rather than something that requires an X session to be launched. And if you aren’t a command line fellow, Virtualmin can help you manage things there using its web interface.

I don’t think the processes are shutting down when you log out as you’d like them to.

You currently have nearly 500 processes running on your server – about 300 of those are related to X/freenx – many of which have been running since back in 2008.

That’s the source of your CPU issues.

And that “top” process you see running there – that’s been running since last year as well!
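A quick way to get this kind of overview from saved "ps auxw" output is to count processes per user and pick out the ones whose START column shows a year rather than a time or date. The Python sketch below assumes the standard eleven-column ps aux layout, and assumes ps prints a four-digit year for processes started in an earlier year. The sample rows are invented, not taken from the attached file.

```python
from collections import Counter


def parse_ps(ps_output):
    """Yield (user, pid, start, command) for each row of `ps aux` style output."""
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)
        if len(fields) == 11:
            yield fields[0], int(fields[1]), fields[8], fields[10]


def processes_per_user(ps_output):
    """Count processes grouped by the USER column."""
    return Counter(user for user, _, _, _ in parse_ps(ps_output))


def started_in_earlier_year(ps_output):
    """Return (pid, command) pairs whose START column is a four-digit year."""
    return [
        (pid, cmd)
        for _, pid, start, cmd in parse_ps(ps_output)
        if len(start) == 4 and start.isdigit()
    ]


# Invented sample rows for illustration only.
sample = """USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1   2060   644 ?        Ss   Jan20   0:01 init [3]
admin    15820  3.4  5.5  61576 27000 ?        S    12:01   4:08 nxagent -D
root     18121 90.8  0.2   2456  1136 pts/3    R    2008  347010:18 top
admin    17002  2.3  8.5 195000 42000 ?        Sl   12:05   1:51 firefox
"""
print(processes_per_user(sample))
print(started_in_earlier_year(sample))
```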

So again, I’d really recommend using a combination of SSH and Virtualmin for remote administration.

Launching X on a 1GHz machine with 512MB of RAM requires more resources than that system has available, and X not closing out when it’s done isn’t helping matters.
-Eric

Joe · January 27, 2009, 9:17pm

“The server setup is premised on remote admin access.”

Aren’t they all?

“Something I wonder about though is the time+ stat in the first line of top showing results for the top process, the figure given is 347K, orders of magnitude greater than for anything else. I don’t know what to make of that.”

I wouldn’t make anything of it. top isn’t your problem, since it only runs occasionally, so don’t worry about what it’s doing.

midol · June 7, 2009, 12:37pm

Thanks to all who replied. I was concerned about the exported X-session skewing the results and it appears that this was in fact what was happening. I rebooted the server and now get this:

top - 10:32:04 up 5:52, 2 users, load average: 0.03, 0.14, 0.16
Tasks: 121 total, 1 running, 120 sleeping, 0 stopped, 0 zombie
Cpu(s): 15.6%us, 1.1%sy, 0.6%ni, 82.3%id, 0.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 514404k total, 368848k used, 145556k free, 68816k buffers
Swap: 1048568k total, 0k used, 1048568k free, 136940k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
13576 root      15   0  2188  932  712 R  3.9  0.2   0:00.02 top
    1 root      15   0  2060  644  556 S  0.0  0.1   0:01.51 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
    3 root      39  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.01 events/0
    6 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
    7 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
   10 root      10  -5     0    0    0 S  0.0  0.0   0:00.13 kblockd/0
   11 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
   85 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 cqueue/0
   88 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khubd
   90 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
  150 root      25   0     0    0    0 S  0.0  0.0   0:00.00 pdflush
  151 root      15   0     0    0    0 S  0.0  0.0   0:00.64 pdflush
  152 root      10  -5     0    0    0 S  0.0  0.0   0:00.18 kswapd0

So there are far fewer processes and very low processor contention. These results are from using ssh rather than the NoMachine access. It is a little disconcerting to see a single X session responsible for such a large fraction of the machine’s resources, but then that isn’t really the machine’s purpose, and it seems to work fine for serving pages. And now I know more about monitoring. Case closed, I think.
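For anyone who wants to log the idle figure from top's summary rather than read it by eye, the Cpu(s) line can be split into its fields: us (user), sy (system), ni (nice), id (idle), wa (I/O wait), hi and si (hardware and software interrupts), and st (steal). A minimal Python sketch, assuming the line format shown in the output above; the function name is illustrative:

```python
import re


def parse_cpu_line(line):
    """Turn top's 'Cpu(s): 15.6%us, ...' summary line into a dict of floats."""
    return {
        name: float(value)
        for value, name in re.findall(r"([\d.]+)%\s*(\w+)", line)
    }


# The summary line from the top output above.
sample = "Cpu(s): 15.6%us, 1.1%sy, 0.6%ni, 82.3%id, 0.5%wa, 0.0%hi, 0.0%si, 0.0%st"
cpu = parse_cpu_line(sample)
print("idle:", cpu["id"], "user:", cpu["us"], "iowait:", cpu["wa"])
```

Newer versions of top format this line differently, so the regular expression would need adjusting there.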

Dave