collectinfo.pl using 100% memory after update

Hi Friendly Virtualminders,

After years of near-perfect uptime with Virtualmin, I have a little hiccup. I just did a couple of things at once: deleted a few large virtual domains and updated to the latest Virtualmin via apt-get (Ubuntu 14.x LTS).

Unfortunately, when the webmin service starts, it kicks off a collectinfo.pl job that uses 100% CPU and slowly (over about 1-2 minutes) eats 100% of memory. Then the kernel OOM killer starts munching things like apache and mysql. A quick workaround is to disable webmin (service webmin stop).

Is there a verbose / trace / debug flag I can give the script to see what it’s hanging on? I didn’t see anything obvious in the script itself.

Thanks,
-m

Any ideas? Unfortunately, I have had to disable webmin in the meantime and would like to restore it.

Thanks,
-m

I have one more clue: going by the kernel log snippet below, the Linux OOM killer seems to be killing ‘lookup-domain.pl’.
[406668.960658] /usr/share/webm invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[407449.740591] lookup-domain.p invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[407564.540507] lookup-domain.p invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
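
For anyone reading a longer log, here is a minimal Python sketch that tallies which processes show up in "invoked oom-killer" lines like the three above. The function name and the regular expression are illustrative assumptions, not part of Webmin or the kernel tooling; only the line format is taken from the snippet.

```python
import re
from collections import Counter

# Matches kernel log lines of the form shown above, e.g.
# [407449.740591] lookup-domain.p invoked oom-killer: gfp_mask=0x201da, ...
OOM_LINE = re.compile(r"^\[\s*\d+\.\d+\]\s+(?P<comm>.+?) invoked oom-killer:")


def count_oom_invokers(lines):
    """Count how often each (truncated) process name invoked the OOM killer."""
    counts = Counter()
    for line in lines:
        match = OOM_LINE.match(line.strip())
        if match:
            counts[match.group("comm")] += 1
    return counts


# The three lines quoted in this post.
sample = [
    "[406668.960658] /usr/share/webm invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0",
    "[407449.740591] lookup-domain.p invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0",
    "[407564.540507] lookup-domain.p invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0",
]

for comm, n in count_oom_invokers(sample).most_common():
    print(n, comm)
```

One caveat: the process named in an "invoked oom-killer" line is the one whose allocation triggered the OOM killer, which is not necessarily the one that got killed. The victim normally appears in separate "Kill process" / "Killed process" lines further down in the same log.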

Ideas?

Getting desperate here – my letsencrypt certificates are now not renewing because they are installed through virtualmin.

Thanks,
-m

OK, I found the problem, but could use some help root-causing it for others.
First I saw this post: https://www.virtualmin.com/node/40877 from @andreychek, which implied a possible problem with one of the files in /etc/webmin/virtual-server/domains/*
I didn’t see any obviously bad ones, but by process of elimination I found the file that was causing the problem.
It belongs to a subdomain that was ported over from cpanel to virtualmin many years ago. The subdomain was disabled, so that is probably what was causing the problem.
I made a little script to compare a good file with the bad one; here are the differences in the keys present in each domains/* file.

Bad missing backup_encpass
Bad missing bw_notify
Bad missing bw_usage_mail
Bad missing bw_usage_only_mail
Bad missing bw_usage_only_web
Bad missing bw_usage_web
Bad missing cgi_bin_correct
Bad missing db
Bad missing ftp
Bad missing limit_virtualmin-awstats
Bad missing limit_virtualmin-dav
Bad missing logrotate
Bad missing mysql
Bad missing mysql_enc_pass
Bad missing mysql_user
Bad missing no_mysql_db
Bad missing postgres
Bad missing spam
Bad missing stats_pass
Bad missing virtalready
Bad missing virtualmin-awstats
Bad missing virtualmin-dav
Bad missing virtualmin-mailman
Bad missing virus
Bad missing webalizer
Bad missing webmin
Good missing backup_parent_dom
Good missing backup_subdom_dom
Good missing disabled
Good missing disabled_oldpass
Good missing disabled_reason
Good missing disabled_time
Good missing disabled_why
Good missing dns_ip
Good missing dns_submode
Good missing reseller
Good missing ssl_cert
Good missing ssl_key
Good missing subdom
Good missing subprefix
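
In case it helps anyone else, a comparison like the one above can be sketched in Python roughly as follows. This is not the original script. It assumes each domains/* file is plain text with one key=value pair per line, and the function names and the sample contents are made up for illustration.

```python
def read_keys(text):
    """Return the set of keys found in key=value lines of a config file's text."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0])
    return keys


def diff_keys(good_text, bad_text):
    """List keys present in only one file, in the 'X missing key' style used above."""
    good, bad = read_keys(good_text), read_keys(bad_text)
    report = [f"Bad missing {key}" for key in sorted(good - bad)]
    report += [f"Good missing {key}" for key in sorted(bad - good)]
    return report


# Hypothetical file contents, only to show the output format.
good_example = "id=1\nmysql=1\nwebmin=1\n"
bad_example = "id=2\ndisabled=mysql\nsubdom=1\n"

for entry in diff_keys(good_example, bad_example):
    print(entry)
```

To run it against real files, read each one into a string first (for example with pathlib.Path(...).read_text()) and pass the two strings to diff_keys. It only compares which keys exist, not their values, so a key that is present in both files with different values will not show up.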

Any ideas?
Thanks,
-m

Howdy,

Do you just have that one bad file? That is, if you temporarily remove it, do you no longer have problems with collectinfo running?

You could also go into System Settings -> Virtualmin Config -> Status Collection, where you can increase the time between status collection runs or disable status collection altogether.

-Eric