Usermin randomly stops working across all Ubuntu 20.04 servers

@vander.host,

This is invalid information. You should NOT be updating Webmin from the Webmin repo when using Virtualmin. The versions you indicated as being outdated are also NOT outdated compared with the versions available in the Virtualmin repo.

The Virtualmin team does extra testing to ensure Webmin and Usermin work properly with the current version of Virtualmin.


You are up to date with what is in the Virtualmin repos (intentionally). Where are you seeing “outdated”?

Please do not add other repos (including the Webmin repos) for Virtualmin updates unless one of us (staff) explicitly tells you to.

That’s not true.

Services stopping “randomly” is almost always the OOM killer kicking in. Usermin doesn’t crash, so something is stopping it.

How much free memory do you have?

That is because of me and my post, but I was on the wrong track, because the topic is about Usermin.

Afterwards, it seems the Virtualmin install instructions say not to update Usermin/Webmin separately, because that can break things.

So it was a misunderstanding, sorry for that one. :wink:

But I also mentioned that if it is a bug or known problem, he can probably find it by reading the changelogs or the issues.

Thanks guys for the incredible amount of tips and advice. Hopefully I can post enough information here to solve this problem.

What’s interesting: this morning it happened again, and when it happens I notice that the service appears to be running:

# service usermin status
● usermin.service - LSB: web-based account administration interface for Unix systems
     Loaded: loaded (/etc/init.d/usermin; generated)
     Active: active (exited) since Thu 2021-11-18 08:56:46 UTC; 1 months 10 days ago
       Docs: man:systemd-sysv-generator(8)
      Tasks: 0 (limit: 2278)
     Memory: 0B
     CGroup: /system.slice/usermin.service

Warning: journal has been rotated since unit was started, output may be incomplete.

However, the port was closed the moment I tried to access it:

root@buspage:~# telnet localhost 20000
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused

Then, when I restart it, everything is fine again:

service usermin stop
service usermin start
telnet localhost 20000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^]
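The status output above is less conclusive than it looks. The unit is generated from the /etc/init.d/usermin LSB script by systemd-sysv-generator, and such units stay "active (exited)" after the script finishes; "Tasks: 0" and "Memory: 0B" show that no process is running inside the unit at all. Probing the port, as the telnet test does, is the more reliable check. A minimal sketch of the same probe without telnet, assuming bash's built-in /dev/tcp redirection and GNU coreutils timeout (port_open is an illustrative name, not part of Usermin):

```shell
#!/usr/bin/env bash
# Sketch: test whether anything accepts TCP connections on HOST:PORT.
# Assumes bash with /dev/tcp support and GNU coreutils `timeout`.
port_open() {
  local host="$1" port="$2"
  # Redirecting to /dev/tcp/HOST/PORT makes bash attempt a TCP connect;
  # `timeout` caps the wait at 3 seconds if packets are silently dropped.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usermin's default port is 20000, as in the telnet test above.
if port_open 127.0.0.1 20000; then
  echo "usermin: port 20000 accepting connections"
else
  echo "usermin: port 20000 refused or timed out"
fi
```

The exit status is what matters: 0 when the connect succeeds, non-zero on "Connection refused" or timeout, so the function can be reused by any monitoring script.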

Services stopping “randomly” is almost always the OOM killer kicking in

I don’t think it’s memory, but for reference I’ve pasted the memory consumption here. I have many different servers and all of them have quite a bit of RAM; this one has the lowest, though:

Real memory 792.92 MiB used / 980.54 MiB cached / 1.94 GiB total
Virtual memory 1.15 GiB used / 3.99 GiB total
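In figures like these, "cached" is largely page cache that the kernel can reclaim when processes need memory, so it is not necessarily unavailable. The kernel's own estimate of memory available to new processes without swapping is the MemAvailable line in /proc/meminfo (Linux 3.14 and later). A small sketch to read it in MiB; mem_available_mib is an illustrative name:

```shell
#!/usr/bin/env bash
# Sketch: print the kernel's MemAvailable estimate in MiB.
# Reads /proc/meminfo by default; a saved copy can be passed as $1.
mem_available_mib() {
  local src="${1:-/proc/meminfo}"
  # /proc/meminfo reports values in kB (KiB); integer-divide by 1024 for MiB.
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "$src"
}

echo "MemAvailable: $(mem_available_mib) MiB"
```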

Also, I have to point out that the service doesn’t stop “randomly”. It stops the moment any user tries to access port 20000. Then it works for a long time, but stops again when a user tries to access it.

@calport thanks for the monitoring advice. I use PRTG extensively, as well as Spatie’s network monitoring tools, which have a Linux service checker. The problem, as just mentioned, is that the service appears to keep running, but checking port 20000 always fails and then PRTG sends a push notification.

@vander.host,

Are you running a full stack on that system?

  • Web Server
  • Mail Server
  • DNS Server
  • ClamAV (antivirus)
  • SpamAssassin (antispam)
  • MySQL
  • etc

Yes, it’s a stock/default Virtualmin install without any customizations whatsoever. All my installations are the same: stock installs without customizations, around 6 of them, all on Ubuntu 20.04. All of them have the same issue. If I can figure out a watchdog or monit test that checks whether port 20000 is open, I can implement a temporary workaround.
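A watchdog can be as simple as a script run from cron every minute that probes the port and restarts the service if nothing answers. A minimal sketch, assuming bash with /dev/tcp support, GNU timeout, systemd, and the default port 20000; the --run flag, the USERMIN_* variables, and the usermin-watchdog syslog tag are hypothetical names chosen for this example:

```shell
#!/usr/bin/env bash
# Hypothetical cron watchdog: restart Usermin when nothing answers on its port.
# Defaults can be overridden with the (made-up) USERMIN_* environment variables.
HOST="${USERMIN_HOST:-127.0.0.1}"
PORT="${USERMIN_PORT:-20000}"
RESTART_CMD="${USERMIN_RESTART_CMD:-systemctl restart usermin}"

port_open() {
  # bash attempts a TCP connect when redirecting to /dev/tcp/HOST/PORT.
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

watchdog_check() {
  if port_open "$HOST" "$PORT"; then
    return 0
  fi
  # Leave a trace in syslog; ignore errors if logger or syslog is unavailable.
  logger -t usermin-watchdog "port $PORT on $HOST not answering; running: $RESTART_CMD" 2>/dev/null || true
  # Deliberately unquoted so the command string splits into words.
  $RESTART_CMD
}

# Only act when called with --run, so the file can be sourced safely.
if [ "${1:-}" = "--run" ]; then
  watchdog_check
fi
```

Saved as, say, /usr/local/sbin/usermin-watchdog and made executable, a line such as "* * * * * root /usr/local/sbin/usermin-watchdog --run" in a file under /etc/cron.d/ would run it once a minute. It only papers over the symptom; the syslog entries at least record when each restart happened.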

@vander.host,

I have found that 2GB RAM is pushing things a bit if you are running a full stack. Email software generally eats up a lot of resources on its own.

Most of our systems have a minimum of 4GB, and even then I rarely put a full stack on the same machine, in order to optimize and maximize resources (though I don’t expect everyone to follow suit on this front, as it does require some extra work to maintain).

Mostly ClamAV. Everything else is quite small. SpamAssassin is the next biggest part of the mail stack, but it’s minuscule compared to clamd (which is over 1GB, by itself, and continues to grow faster every year). If ever you suspect memory is an issue, disable virus scanning and shut down clamd, and see how things look after. I’ve been tempted to stop installing ClamAV by default, as it’s just unrealistic to run it on most of the servers people are running Virtualmin on, and it is not intuitive that such a tiny part of the work the system does requires so much of the resources (even though the setup wizard is very clear about ClamAV being very large, I don’t think people always read it).


@Joe,

My primary point was that when you try to cram everything onto a single “small” VPS, you are not giving everything enough resources to run “properly”. I see people placing full stacks on 1GB or 2GB systems, then loading the same system with a bunch of busy (or unoptimized) WordPress sites, and they start to see issues with overall performance. A quick look at “htop” shows they’re spiking their resources regularly, and they wonder why services start to shut down… :slight_smile:

*** I’ve also noticed lately that MySQL tends to chew through a ton of CPU/memory when it is not set up correctly or when you start loading lots of database-heavy scripts. ***

It happened again on another server; this one has 16 GB RAM. I found good instructions on how to monitor it using monit:

apt install monit
vi /etc/monit/conf-available/usermin

Add the following contents to the new usermin file you are editing:

check host usermin with address 127.0.0.1
start program = "/bin/systemctl start usermin"
stop program = "/bin/systemctl stop usermin"
if failed port 20000 then restart
if 5 restarts within 5 cycles then timeout

Link the file:

ln -s /etc/monit/conf-available/usermin /etc/monit/conf-enabled/

Check the syntax and reload monit:

monit -t
systemctl reload monit

Typical log event:

[SAST Dec 30 08:13:53] error : 'usermin' failed protocol test [DEFAULT] at [127.0.0.1]:20000 [TCP/IP] -- Connection refused
[SAST Dec 30 08:13:53] info : 'usermin' trying to restart
[SAST Dec 30 08:13:53] info : 'usermin' stop: '/bin/systemctl stop usermin'
[SAST Dec 30 08:13:54] info : 'usermin' start: '/bin/systemctl start usermin'
[SAST Dec 30 08:15:56] info : 'usermin' connection succeeded to [127.0.0.1]:20000 [TCP/IP]

I had the same problem, and it did turn out to be out of memory issues. Until I could deal with that, I used Webmin’s System and Server Status to check and restart the usermin service.

Create a new monitor.

Commands to run > If monitor goes down, run command:

systemctl restart usermin

Monitored service options > Command to run:

nc -z -v localhost 20000

Monitored service options > Exit status check:

Fail monitor if command fails


Yes, I have the same problem.

Sorry, but I didn’t have time to check…
For now I’ve worked around it with System and Server Status, as keenmouse suggests.

I have this problem on Debian 10 / AlmaLinux 8.

Thank you

Y’all need to look at the Usermin miniserv.error log and the kernel log (for OOM killer messages) to find out why it’s exiting. Usermin does not crash. So, something is killing it.

Restarting it is just masking whatever problem your system has.

@Joe I can now semi-reliably make it stop working.

kernel log (for OOM killer messages)

Any clues? I tried this on Ubuntu:

cat /var/log/syslog | grep oom

I also carefully looked at syslog, but I don’t see anything related to memory or Usermin.

So I need some help with how to detect out-of-memory events on Ubuntu Linux, to see whether this is what causes Usermin to stop working.

Just to be clear about how I make it “crash”:

  • I wait a few days.
  • I try to access Usermin.
  • It stops immediately on first access.
  • I start it again. It works fine for a while.

Well, I found this:

grep oom /var/log/*
grep total_vm /var/log/*

I’m not seeing anything, though. Are we 100% sure that a Usermin OOM kill would be logged, and to which file?
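When the kernel's OOM killer fires, it logs kernel messages such as "... invoked oom-killer" and "Out of memory: Killed process <pid> (<name>)". On Ubuntu 20.04 these reach the systemd journal and, via rsyslog, /var/log/kern.log and /var/log/syslog; rotated copies are gzip-compressed, which a plain grep over /var/log/* does not search. A sketch that checks both the journal and the kern.log files, assuming journalctl and gzip's zcat are available and that it runs as root (the function names are illustrative):

```shell
#!/usr/bin/env bash
# Sketch: list kernel OOM-killer messages from the journal and kern.log files.
# Assumes journalctl (systemd) and gzip's zcat; both are optional at runtime.

oom_filter() {
  # Typical kernel wording: "invoked oom-killer", "oom-kill:constraint=...",
  # "Out of memory: Killed process 1234 (name)".
  grep -Ei 'oom-kill|out of memory|killed process'
}

find_oom_events() {
  {
    if command -v journalctl >/dev/null 2>&1; then
      # -k shows kernel messages for the current boot only.
      journalctl -k --no-pager 2>/dev/null
    fi
    for f in /var/log/kern.log*; do
      # zcat -f decompresses rotated .gz files and passes plain files through.
      [ -e "$f" ] && zcat -f -- "$f" 2>/dev/null
    done
  } | oom_filter
}

find_oom_events || echo "No OOM-killer messages found."
```

Because journalctl -k is limited to the current boot, older events come from the kern.log files (or from journalctl -k -b -1 if the journal is persistent). The same event may appear twice when both sources hold it.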

I can reliably reproduce this issue now and would love to get it fixed. Clients rely on webmail at critical times, and even though monit is helping, there is still a delay of up to a minute during which the client starts losing trust.

I’m not sure it’s related, but I’m using the csf firewall on my Virtualmin servers.

One week ago I tried changing this:

nano /etc/csf/csf.pignore
cmd:/usr/bin/perl /usr/libexec/usermin/miniserv.pl /etc/usermin/miniserv.conf

Then restart csf and Usermin.

For now Usermin works fine (on both Debian and AlmaLinux), but maybe it was an update that solved the issue…

Thank you

@vander.host what are the RAM and CPU of that server (no swap, please, just the real RAM value)?

@unborn RAM + CPU

16 GB RAM
7 vCPUs
Note: 300+ domains with lots of mailboxes, probably > 1000