Problem with too many virtual sub-servers for a single parent

nostrich · May 2, 2013, 6:19am

Hi,

We have about 2000 virtual servers on our physical Virtualmin server, each of which is a sub-server of a single parent virtual server. Each time I try to create a new sub-server under that parent, it stalls at the “Updating Webmin user” stage and stays there for 10 minutes. During this time I can see (using htop from the CLI) that the domain_setup.cgi process is using 100% of one, and sometimes both, of the CPU cores. It does eventually complete successfully, and everything works as expected.

If I create a new parent server, or a sub-server under a different parent, everything completes in the expected amount of time (20-30 seconds). I get the same result when using the API’s create-domain.pl script.

My hunch is that it has something to do with the number of sub-servers using the same parent server’s account. Ideally I would like to resolve this problem and keep using the same parent server, though, worst case, I can create a new dedicated parent and start adding sub-servers to that one.

Any assistance or brainstorming ideas would be much appreciated.

Thanks

Eric · May 2, 2013, 9:07pm

Howdy,

I explained your problem to Jamie… he asked if it might be possible for him to log into your system, and troubleshoot the issue to determine what the bottleneck is.

Is that something that would be possible?

If so, you can email root login details to eric@virtualmin.com.

Also, do you have an example of a Sub-Server that you’re trying to add? If you could include the details of some new domains you’d like added, that would help in the troubleshooting process.

Thanks!

-Eric

nostrich · May 2, 2013, 11:51pm

Hi Eric,

Thanks for your quick response.

This is a production server that is very important to us. Unfortunately I am unable to give out root login details to unauthorised parties. Sorry, company policy.

I am, however, fairly technically competent, so I can carry out any troubleshooting steps that might provide more information about the problem.

Dan

Eric · May 3, 2013, 3:15am

Howdy,

Well, he was hoping to do some code profiling on your system, which is a more involved process than we can really walk you through here.

Without a close look, there likely isn’t an easy fix for the problem you’re seeing. It’s probably a code problem, where something isn’t working as efficiently as it should.

What Jamie needs to do in order to fix it is replicate a setup like yours, trigger the problem, and then determine what is running slowly. You have an above-average number of domains there; systems with that many don’t get much testing.

I’ve explained the issue you’re seeing to him though, and we’ll see if there’s anything we can figure out.

-Eric

nostrich · May 6, 2013, 1:33am

I ended up creating a new parent virtual server and changed my script to add new sub-servers to that one. Everything is working as expected again, except that services occasionally crash because too many files are open. This is because each virtual server has two log files open.
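To put numbers on this: if roughly 2000 virtual servers each keep two log files open, whatever process holds them needs about 2000 × 2 = 4000 descriptors for logs alone, well above the 1024 soft limit that many Linux systems use by default. Here is a minimal Python sketch for checking that, assuming Linux and its /proc filesystem; the helper names `estimated_log_fds` and `open_fd_count` are hypothetical, and the two-logs-per-server figure comes from the post above.

```python
import os
import resource
import sys


def estimated_log_fds(num_vhosts: int, logs_per_vhost: int = 2) -> int:
    """Descriptors needed if every virtual server keeps its own log files open."""
    return num_vhosts * logs_per_vhost


def open_fd_count(pid: int) -> int:
    """Count a process's open file descriptors via /proc (Linux only).

    Reading another user's process usually requires root. When pid is this
    script's own PID, the count may include the descriptor used for the listing.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    # Optional PID argument, e.g. the Apache parent process; defaults to this script.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = estimated_log_fds(2000)  # ~2000 vhosts x 2 logs, per the post
    print(f"descriptors needed for logs alone: {needed}")
    print(f"open-files limit for this script: soft={soft}, hard={hard}")
    if soft != resource.RLIM_INFINITY and needed > soft:
        print("the log files alone would exceed this soft limit")
    print(f"descriptors open in PID {pid}: {open_fd_count(pid)}")
```

Note that the limit printed is this script's own; the limit that matters is the one in effect for the service that actually holds the files.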

I am in the process of writing a script to change the log output of each sub-server so that it writes to a single file for the parent.
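One common Apache pattern for this is to have every virtual host write to one shared log whose first field is the virtual host name (the `%v` LogFormat directive), then split it per domain only when needed; Apache ships a `split-logfile` support script that does this. The Python sketch below assumes that log layout, and `split_by_vhost` is a hypothetical helper, not part of Virtualmin.

```python
from collections import defaultdict


def split_by_vhost(lines):
    """Group log lines by their first field, assumed to be the vhost name (%v).

    The vhost prefix is removed so each group reads like an ordinary
    per-domain access log. Names are lowercased so case variants merge.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        vhost, _, rest = line.partition(" ")
        groups[vhost.lower()].append(rest)
    return dict(groups)


if __name__ == "__main__":
    import sys

    # Read the combined log once, then write one domain at a time,
    # so only a single output file is open at any moment.
    with open(sys.argv[1]) as combined:
        groups = split_by_vhost(combined)
    for vhost, entries in groups.items():
        with open(f"{vhost}.log", "a") as out:
            out.write("\n".join(entries) + "\n")
```

This keeps the whole log in memory, which is fine for a daily file but worth streaming for very large ones.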

Eric · May 6, 2013, 3:31am

Howdy,

This is due to each virtual server having two log files open.

Are you currently using the writelogs program for handling logging? Or is Apache writing directly to the logfiles in /var/log/virtualmin?

It should be possible to prevent that “too many open files” error; I’m just trying to determine how best to tackle it.

-Eric

nostrich · May 8, 2013, 1:57am

Apache is currently writing directly to the log files in /var/virtualmin, with each virtual server having its own log files.

Most days at around the same time (8.20am), BIND, Telnet and Webmin/Virtualmin fall over, while Apache keeps running. It seems that a cron job is causing this problem.

This is what I see in /var/log/syslog just prior to the crash:

May 8 08:17:01 CRON[7339]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 8 08:20:01 CRON[7377]: (root) CMD (/home/bp/bin/ps.sh)
May 8 08:20:01 CRON[7379]: (root) CMD (/etc/webmin/status/monitor.pl)
May 8 08:20:01 CRON[7380]: (www-data) CMD ([ -x /usr/lib/cgi-bin/awstats.pl -a -f /etc/awstats/awstats.conf -a -r /var/log/apache2/access.log ] && /usr/lib/cgi-bin/awstats.pl -config=awstats -update >/dev/null)

I got an alert this morning at exactly 8.20am telling me that BIND, Telnet and Webmin/Virtualmin services were down. I restarted these services and everything has been stable since.
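To see which cron jobs fire at the minute the services fall over, you can filter syslog for CRON entries at that time. Here is a minimal Python sketch, assuming the line layout shown in the excerpt above (with or without a hostname field); `cron_jobs_at` is a hypothetical helper, and reading /var/log/syslog on Ubuntu usually needs root or membership in the adm group.

```python
import re

# Matches lines like: "May 8 08:20:01 [host] CRON[7377]: (root) CMD (/path/to/job)"
CRON_RE = re.compile(
    r"^\w{3}\s+\d+\s+(?P<time>\d{2}:\d{2}):\d{2}\s+"
    r"(?:\S+\s+)?CRON\[\d+\]:\s+"
    r"\((?P<user>[^)]+)\)\s+CMD\s+\((?P<cmd>.*)\)\s*$"
)


def cron_jobs_at(lines, hhmm):
    """Return (user, command) pairs for CRON entries logged at HH:MM."""
    jobs = []
    for line in lines:
        m = CRON_RE.match(line)
        if m and m.group("time") == hhmm:
            jobs.append((m.group("user"), m.group("cmd").strip()))
    return jobs


if __name__ == "__main__":
    import sys

    hhmm = sys.argv[1] if len(sys.argv) > 1 else "08:20"
    with open("/var/log/syslog", errors="replace") as log:
        for user, cmd in cron_jobs_at(log, hhmm):
            print(f"{user}: {cmd}")
```

Running it over several days' rotated logs makes it easier to spot the job that appears every time the crash does.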

Eric · May 8, 2013, 3:29am

Which Linux distribution is it that you’re using there?

-Eric

nostrich · May 8, 2013, 4:36am

Ubuntu 10.04 LTS
Virtualmin 3.99 GPL

Eric · May 8, 2013, 1:07pm

Try editing this file:

/etc/cron.d/awstats

Comment out everything in it.

You don’t need any of that for Virtualmin; Virtualmin sets up its own cron jobs for awstats. That should help cut down on open files.
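If you’d rather script the change than edit the file by hand, here is a minimal Python sketch that prefixes `#` to every active line and leaves existing comments and blank lines alone, so running it twice is harmless. The helper name `comment_out` and the backup location are hypothetical choices; the backup is kept outside /etc/cron.d so the copy can’t be mistaken for a job file.

```python
import shutil


def comment_out(text: str) -> str:
    """Prefix '#' to every non-blank line that is not already a comment."""
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            out.append("#" + line)
        else:
            out.append(line)  # keep comments and blank lines unchanged
    return "".join(out)


if __name__ == "__main__":
    path = "/etc/cron.d/awstats"
    backup = "/root/awstats.cron.bak"  # hypothetical backup location
    shutil.copy2(path, backup)  # keep a copy before editing
    with open(path) as f:
        original = f.read()
    with open(path, "w") as f:
        f.write(comment_out(original))
```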

-Eric