This is invalid information. You should NOT be updating Webmin from the Webmin repo when using Virtualmin. The versions you indicated as being outdated are also NOT outdated relative to the versions made available in the Virtualmin repo.
The Virtualmin team does extra testing to ensure Webmin/Usermin work properly with the current version of Virtualmin.
Thanks guys for the incredible amount of tips and advice. Hopefully I can post enough information here to solve this problem.
Interestingly, it happened again this morning, and when it happens I notice that the service appears to be running:
# service usermin status
● usermin.service - LSB: web-based account administration interface for Unix systems
Loaded: loaded (/etc/init.d/usermin; generated)
Active: active (exited) since Thu 2021-11-18 08:56:46 UTC; 1 months 10 days ago
Docs: man:systemd-sysv-generator(8)
Tasks: 0 (limit: 2278)
Memory: 0B
CGroup: /system.slice/usermin.service
Warning: journal has been rotated since unit was started, output may be incomplete.
However the port closed the moment I tried accessing it:
root@buspage:~# telnet localhost 20000
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
Then when I restart everything is fine again:
service usermin stop
service usermin start
telnet localhost 20000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^]
Services stopping “randomly” is almost always the OOM killer kicking in
I don’t think it’s memory, but for reference I’ve pasted the memory consumption here. Also, I have many different servers and all of them have quite a bit of RAM; this one has the lowest though:
Real memory 792.92 MiB used / 980.54 MiB cached / 1.94 GiB total Virtual memory 1.15 GiB used / 3.99 GiB total
Also I have to point out the service doesn’t stop “randomly”. It stops the moment any user tries to access port 20000. Then it works for a long time, but then stops again when a user tries to access it.
@calport thanks for the monitoring advice. I use PRTG extensively, as well as Spatie’s network monitoring tools, which have a Linux service checker. The problem, of course, as just mentioned, is that the service appears to just keep on running, but checking port 20000 always fails and then PRTG sends a push notification.
Yes, it’s a stock / default Virtualmin install without any customizations whatsoever. All my installations are the same, stock installs without customizations. Around 6 of them, Ubuntu 20.04. All of them have the same issue. If I can figure out a watchdog or monit test that checks whether port 20000 is open, I can implement a temporary workaround.
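For anyone who wants the same check as the telnet test in script form, here is a minimal sketch in Python. The function name `port_is_open` and the 3-second timeout are illustrative assumptions, not part of Usermin or Virtualmin; it only reports whether something is accepting TCP connections on the port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs a full TCP connect, much like
        # `telnet host port` does before printing "Connected to ...".
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and unreachable hosts.
        return False
```

Calling `port_is_open("127.0.0.1", 20000)` from a cron job or watchdog script would return False in the "Connection refused" state shown earlier, even while the service status still reports active.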
I have found that 2GB RAM is pushing things a bit if you are running a full stack. Email software generally eats up a lot of resources on its own.
Most of our systems have a minimum of 4GB and even then I rarely put a full stack on the same machine in order to optimize and maximize resources (though I don’t expect everyone to follow suit on this front as it does require some extra work to maintain).
Mostly ClamAV. Everything else is quite small. SpamAssassin is the next biggest part of the mail stack, but it’s minuscule compared to clamd (which is over 1GB, by itself, and continues to grow faster every year). If ever you suspect memory is an issue, disable virus scanning and shut down clamd, and see how things look after. I’ve been tempted to stop installing ClamAV by default, as it’s just unrealistic to run it on most of the servers people are running Virtualmin on, and it is not intuitive that such a tiny part of the work the system does requires so much of the resources (even though the setup wizard is very clear about ClamAV being very large, I don’t think people always read it).
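To see which processes are actually holding the memory before and after shutting down clamd, a rough sketch like the following can help. It assumes a Linux `ps` that accepts `-o rss= -o comm=` (resident set size in KiB plus command name, no header line); the function names are made up for illustration.

```python
import subprocess
from typing import List, Tuple


def parse_ps_rss(ps_output: str) -> List[Tuple[str, int]]:
    """Parse `ps -e -o rss= -o comm=` output into (command, rss_kib) pairs,
    sorted with the largest resident set size first."""
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip anything that is not "<number> <command>".
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        rows.append((parts[1].strip(), int(parts[0])))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def top_memory_processes(n: int = 5) -> List[Tuple[str, int]]:
    """Run ps and return the n processes with the largest RSS."""
    result = subprocess.run(
        ["ps", "-e", "-o", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_rss(result.stdout)[:n]
```

Running `top_memory_processes()` once with clamd up and once after stopping it gives a quick before/after comparison without having to watch htop.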
My primary point was that when you try to cram everything onto a single “small” VPS, you are not giving everything enough resources to run “properly”. I see people placing full stacks on 1GB or 2GB systems, then loading the same system with a bunch of busy (or unoptimized) WordPress sites and starting to see issues with overall performance. A quick look at “htop” shows they’re spiking their resources regularly, and they wonder why services start to shut down…
*** I’ve also noticed lately that MySQL tends to chew through a ton of CPU/memory when it is not set up correctly or when you start loading lots of database-heavy scripts. ***
It happened again on another server; this one has 16 GB RAM. I found good instructions on how to monitor it using monit:
apt install monit
vi /etc/monit/conf-available/usermin
Add these contents to the new usermin file you are editing:
check host usermin with address 127.0.0.1
    start program = "/bin/systemctl start usermin"
    stop program = "/bin/systemctl stop usermin"
    if failed port 20000 then restart
    if 5 restarts within 5 cycles then timeout
[SAST Dec 30 08:13:53] error : 'usermin' failed protocol test [DEFAULT] at [127.0.0.1]:20000 [TCP/IP] -- Connection refused
[SAST Dec 30 08:13:53] info : 'usermin' trying to restart
[SAST Dec 30 08:13:53] info : 'usermin' stop: '/bin/systemctl stop usermin'
[SAST Dec 30 08:13:54] info : 'usermin' start: '/bin/systemctl start usermin'
[SAST Dec 30 08:15:56] info : 'usermin' connection succeeded to [127.0.0.1]:20000 [TCP/IP]
I had the same problem, and it did turn out to be an out-of-memory issue. Until I could deal with that, I used Webmin’s System and Server Status to check and restart the usermin service.
Create a new monitor.
Commands to run > If monitor goes down, run command:
Y’all need to look at the Usermin miniserv.error log and the kernel log (for OOM killer messages) to find out why it’s exiting. Usermin does not crash. So, something is killing it.
Restarting it is just masking whatever problem your system has.
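As a rough sketch of the kernel log check, the snippet below scans log lines for OOM killer entries and returns the PID and process name of each victim. It assumes the kernel writes lines containing "Out of memory: Killed process <pid> (<name>)" (older kernels use "Kill process"); the exact wording varies by kernel version, and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches both "Kill process" (older kernels) and "Killed process" (newer ones).
# IGNORECASE also catches "Memory cgroup out of memory: Killed process ...".
OOM_KILL_RE = re.compile(
    r"out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE
)


def find_oom_kills(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, process_name) for every OOM kill found in the given lines."""
    kills = []
    for line in lines:
        match = OOM_KILL_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

The lines can come from wherever the kernel log ends up on the system, for example the output of `journalctl -k` or `dmesg`, or a file such as `/var/log/kern.log` if syslog is writing one. An empty result means no kill matching this pattern was found in those lines; it does not by itself rule out other causes.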
Not seeing anything though. Are we 100% sure Usermin OOMs would be logged and to which file?
I can reliably reproduce this issue now and would love to get it fixed. Clients rely on Webmail at critical times, and even though monit is helping, there is still a delay of up to a minute during which the client starts losing trust.