Hi!
I’ve been running Virtualmin Pro on a freshly installed Debian 4.0 machine since June, and since then the machine has already run out of memory twice.
Swap usage climbs slowly but steadily over the weeks. Then, at some point, the situation escalates and the swap fills up within a couple of hours, leaving the system basically hung.
I suspect this has something to do with Virtualmin, since I’m not running any software I didn’t already run before, when I had Plesk.
Anyway, the problem is: how can I find out what is causing this?
I can’t afford to let this continue…
How much memory does your box have?
The full stack of Virtualmin applications is quite large…mail processing, in particular, is quite heavy.
You’ll want to have a look at the “Virtualmin on low memory systems” guide found here:
http://www.virtualmin.com/documentation/id,virtualmin_on_low_memory_systems/
I’m unaware of any memory leaks in any of the software Virtualmin manages (or Virtualmin/Webmin itself). Virtualmin.com ran for about a year before its most recent reboot, and it’s been up for 103 days now. Obviously, we’re running Virtualmin on it.
A good starting point might be identifying what that zombie process is and stopping it.
Yeah, but how do I do that?
The only thing I could think of is a script that dumps "ps aux" to a file every hour or so, so that I have some numbers.
Is there a better way?
Hmm… maybe, for some reason, too many Apache and PHP processes are being started? I should check whether there are limits on that…
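The hourly "ps aux" idea can be automated. Below is a minimal sketch, assuming Python 3.7+ is installed (the stock Debian 4.0 Python is older) and using a hypothetical script name and log path. It appends a timestamped "ps aux" listing, sorted by resident memory (RSS), to a log file so the biggest processes at each hour are easy to compare.

```python
import subprocess
import sys
from datetime import datetime


def sort_ps_aux_by_rss(ps_text, limit=20):
    """Return the header line plus the `limit` process lines with the largest RSS."""
    lines = ps_text.strip().splitlines()
    header, rows = lines[0], lines[1:]
    # RSS (resident memory in KiB) is the 6th column of "ps aux" output;
    # split at most 10 times so the COMMAND column stays in one piece.
    rows.sort(key=lambda line: int(line.split(None, 10)[5]), reverse=True)
    return [header] + rows[:limit]


def append_snapshot(log_path, limit=20):
    """Run "ps aux" and append a timestamped, RSS-sorted excerpt to log_path."""
    ps_text = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"=== {stamp} ===\n")
        log.write("\n".join(sort_ps_aux_by_rss(ps_text, limit)) + "\n\n")


# Only runs when invoked as a script with exactly one argument (the log path).
if __name__ == "__main__" and len(sys.argv) == 2:
    append_snapshot(sys.argv[1])
```

A crontab line such as 0 * * * * /usr/bin/python3 /root/ps_snapshot.py /var/log/ps-snapshots.log (both paths are placeholders) would run it hourly. Sorting by RSS puts the heaviest processes at the top of each entry, which makes a slow climb easier to spot than in a raw ps dump.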
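To check whether too many Apache or PHP processes are piling up, it helps to total memory per program rather than per process. A minimal sketch, assuming Python 3.7+ and the procps ps (the function names are hypothetical): it reads "ps -e -o comm= -o rss=" and reports the process count and total resident memory for each command name.

```python
import subprocess
from collections import defaultdict


def rss_by_command(ps_text):
    """Return (command, process_count, total_rss_kib) tuples, largest total first.

    Expects one "<command> <rss>" pair per line, as printed by
    "ps -e -o comm= -o rss=" (the trailing "=" suppresses the header).
    """
    totals = defaultdict(lambda: [0, 0])  # command -> [count, total RSS in KiB]
    for line in ps_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        # The last field is RSS; everything before it is the command name.
        command, rss = " ".join(parts[:-1]), int(parts[-1])
        totals[command][0] += 1
        totals[command][1] += rss
    return sorted(
        ((cmd, count, kib) for cmd, (count, kib) in totals.items()),
        key=lambda item: item[2],
        reverse=True,
    )


def report(limit=15):
    """Print the `limit` command names using the most resident memory."""
    text = subprocess.run(
        ["ps", "-e", "-o", "comm=", "-o", "rss="],
        capture_output=True, text=True, check=True,
    ).stdout
    for cmd, count, kib in rss_by_command(text)[:limit]:
        print(f"{cmd:<20} {count:>4} procs {kib / 1024:8.1f} MB")
```

Calling report() prints the 15 largest consumers. RSS counts shared pages (such as shared libraries) once in every process, so the total for many apache2 children overstates their real footprint; it is best read as a relative indicator.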
Yeah, 2GB should be plenty of RAM.
When you see your swap starting to get used, would you consider running a "ps auxw" and pasting it in here (or better, attaching it as a file)? Perhaps someone will see something that sticks out as a potential source of the problem.
How many domains are on your system, and roughly how many emails are being received each day?
If your hunch about Apache/PHP is right, try restarting the Apache daemon whenever your swap usage begins growing. If that frees up your RAM, Apache (or something running under it) may be the culprit.
-Eric
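Catching the "ps auxw" listing at the right moment can also be scripted. A minimal sketch, assuming Python 3.7+ on Linux; check_swap, the threshold and the output directory are hypothetical. It computes swap in use as SwapTotal minus SwapFree from /proc/meminfo and, once that passes a threshold, saves a "ps auxw" listing to a timestamped file that can be attached here.

```python
import subprocess
from datetime import datetime


def swap_used_mb(meminfo_text):
    """Return swap in use in MB, computed as SwapTotal - SwapFree (both in kB)."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return (fields["SwapTotal"] - fields["SwapFree"]) / 1024


def check_swap(threshold_mb, out_dir):
    """Save a "ps auxw" listing to out_dir if swap use exceeds threshold_mb.

    Returns the path written, or None if swap use is below the threshold.
    """
    with open("/proc/meminfo") as f:
        used = swap_used_mb(f.read())
    if used <= threshold_mb:
        return None
    path = f"{out_dir}/ps-{datetime.now():%Y%m%d-%H%M%S}.txt"
    listing = subprocess.run(
        ["ps", "auxw"], capture_output=True, text=True, check=True
    ).stdout
    with open(path, "w") as out:
        out.write(f"swap used: {used:.0f} MB\n{listing}")
    return path
```

For example, check_swap(800, "/root/swap-reports") run every few minutes from cron (the directory must already exist) would capture the process list as swap usage climbs past 800 MB, before the machine becomes unresponsive.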
From Wikipedia:
Zombies can be identified in the output from the Unix ps command by the presence of a “Z” in the STAT column.
Zombies that exist for more than a short period of time typically indicate a bug in the parent program.
As with other leaks, the presence of a few zombies isn’t worrisome in itself, but may indicate a problem that would grow serious under heavier loads.
So: run ps and note the zombie’s PID.
Then find its parent process. In Webmin you can do this under Running Processes: click the PID, then use Files and Connections or Trace Process.
A bug in the parent program might be what is filling your swap.
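The steps above (spot zombies by the Z in the STAT column, then look up their parent) can be sketched in Python. A minimal version, assuming Python 3.7+ and the procps ps; the function names are hypothetical. Keep in mind that a zombie has already exited, so killing it does nothing: it disappears only when its parent reaps it or the parent itself exits, which is why the parent is the process to investigate.

```python
import subprocess


def find_zombies(ps_text):
    """Return (pid, ppid, parent_command) for every zombie process.

    Expects lines of "pid ppid stat comm", as printed by
    "ps -e -o pid= -o ppid= -o stat= -o comm=".
    """
    procs = {}
    for line in ps_text.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4 or not parts[0].isdigit():
            continue  # skip a header or malformed line
        pid, ppid, stat, comm = int(parts[0]), int(parts[1]), parts[2], parts[3].strip()
        procs[pid] = (ppid, stat, comm)
    zombies = []
    for pid, (ppid, stat, _comm) in procs.items():
        if stat.startswith("Z"):
            # The parent's name tells us which program fails to reap its children.
            parent_comm = procs.get(ppid, (None, None, "?"))[2]
            zombies.append((pid, ppid, parent_comm))
    return zombies


def list_zombies():
    """Print each zombie with its parent's PID and command name."""
    text = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "stat=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, ppid, parent in find_zombies(text):
        print(f"zombie {pid} <- parent {ppid} ({parent})")
```

Running list_zombies() from cron alongside the other checks would show whether the same parent keeps leaving zombies behind.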
If you’re running mod_fcgi, you can limit the number of child processes to, say, 2.
Otherwise, check httpd.conf and adjust the settings there.
The zombie process does seem to indicate, though, that a script with a bug is running on your server. That is the first place I would look.
Thanks for your valuable replies.
OK, here is some of the info you requested:
- The server has about 70 domains, plus several subdomains
- Mail volume: judging from the graph, about 1 message every two minutes (roughly 720 a day). However, I’ve got greylisting, SPF and some other measures that kick in first and eliminate lots of messages.
I noticed that when I upgrade Virtualmin/Webmin, which restarts them, several hundred MB of swap get freed.
Since this morning I haven’t noticed much difference; memory usage has actually dropped a little, so things are fine… for now. I will keep monitoring, though.
Speaking of email, I used to run standalone SpamAssassin and clamscan. After my first memory fill-up I switched from clamscan to clamdscan. I don’t know if it’s a coincidence, but the second fill-up came later. I will try switching to spamc/spamd today.
You made some interesting points. I will watch what top/ps have to say over the coming days. And yes, I am using mod_fcgi, and I notice that PHP is actually using a lot of RAM as well…
Where can I set the limit on the number of fcgi child processes?
The child process limit is in the server template, under Apache website.
For existing domains you’ll need to edit /home/*/fcgi-bin/fcgi.php
This can be done with a tool like sed.
Doesn’t look good, does it?
I’ve killed those processes now, upgraded the Ruby packages to the ones from etch-backports, and updated all gem packages, and Redmine of course.
I’m not starting that Mongrel thing yet, since I’ll try to install Redmine through Virtualmin in the hosted domain that needs it.
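Instead of a sed one-liner, here is a minimal Python sketch of the same bulk edit. It assumes (unverified, so check one of your files first) that each wrapper sets the child count on a line such as PHP_FCGI_CHILDREN=4; the function names are hypothetical. It defaults to a dry run that only reports matches.

```python
import glob
import re

# Assumption: the wrapper sets the count as "PHP_FCGI_CHILDREN=<n>",
# optionally prefixed with "export ".
CHILDREN_RE = re.compile(r"^(\s*(?:export\s+)?PHP_FCGI_CHILDREN=)\d+", re.MULTILINE)


def set_fcgi_children(text, children):
    """Return (new_text, match_count) with every PHP_FCGI_CHILDREN=<n> set to `children`."""
    # A function replacement avoids "\1" + digits being read as a group number.
    return CHILDREN_RE.subn(lambda m: m.group(1) + str(children), text)


def update_wrappers(pattern, children, dry_run=True):
    """Apply set_fcgi_children to each file matching `pattern`; write only if dry_run is False."""
    for path in glob.glob(pattern):
        with open(path) as f:
            original = f.read()
        updated, count = set_fcgi_children(original, children)
        print(f"{path}: {count} matching line(s)")
        if count and not dry_run:
            # Rewriting in place keeps the file's existing owner and permissions.
            with open(path, "w") as f:
                f.write(updated)
```

For example, update_wrappers("/home/*/fcgi-bin/fcgi.php", 2) shows which files match, and passing dry_run=False writes them. Apache would most likely need a restart before the new limit applies to running PHP processes.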
Well, my machine is an Opteron 1212HE with 2 GB of RAM; the swap partition is 2 GB as well.
Right now the situation looks like this:
Yesterday afternoon I had a look at top and it said that about 900 MB of swap was in use. Then, within a couple of hours, usage climbed until the whole swap partition was full and things fell apart.
Any ideas on how I can trace the problem down to the app causing the high memory usage?
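One way to trace which program is growing: take two snapshots a few hours apart and compare them per PID. A minimal sketch, assuming Python 3 and the output of "ps -e -o pid= -o vsz= -o comm=" saved at two points in time; the function names are hypothetical. It compares virtual size (VSZ) rather than RSS, because pages of a leaking process that get pushed out to swap drop out of its RSS but still count toward its VSZ.

```python
def parse_pid_vsz(ps_text):
    """Map pid -> (command, vsz_kib) from "pid vsz comm" lines."""
    procs = {}
    for line in ps_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit():
            procs[int(parts[0])] = (parts[2].strip(), int(parts[1]))
    return procs


def biggest_growth(before_text, after_text, limit=10):
    """Return (pid, command, growth_kib) for processes in both snapshots, largest growth first."""
    before = parse_pid_vsz(before_text)
    after = parse_pid_vsz(after_text)
    growth = [
        (pid, cmd, vsz - before[pid][1])
        for pid, (cmd, vsz) in after.items()
        # Same PID and same command name, so a reused PID is not mistaken for growth.
        if pid in before and before[pid][0] == cmd
    ]
    growth.sort(key=lambda item: item[2], reverse=True)
    return growth[:limit]
```

Processes that appear only in the second snapshot are skipped. Whatever sits at the top of the list after several hours of steady swap growth is a good candidate for the leak.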