My primary point was that when you try to cram everything onto a single “small” VPS, you are not giving everything enough resources to run “properly”. I see people placing full stacks on 1GB or 2GB systems, then they load the same system with a bunch of busy (or unoptimized) WordPress sites and start to see issues with overall performance. A quick look at “htop” shows they’re spiking their resources regularly, and they wonder why services start to shut down…
*** I’ve also noticed lately that MySQL tends to chew through a ton of CPU/memory when it is not set up correctly or when you start loading lots of database-heavy scripts. ***
Happened again on another server, this one has 16 GB RAM. Found good instructions on how to monitor Usermin using monit:
apt install monit
vi /etc/monit/conf-available/usermin
Add these contents to the new usermin file you are editing:
check host usermin with address 127.0.0.1
    start program = "/bin/systemctl start usermin"
    stop program = "/bin/systemctl stop usermin"
    if failed port 20000 then restart
    if 5 restarts within 5 cycles then timeout
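One step the instructions above don't spell out: on a Debian/Ubuntu-style layout (an assumption here, based on the apt install and the conf-available path), a file in conf-available is normally only picked up once it is linked into conf-enabled. A minimal sketch of that step follows. The helper name enable_monit_check and the MONIT_DIR variable are made up for illustration and are not part of monit.

```shell
# Hypothetical helper: enable a monit check by symlinking it from
# conf-available into conf-enabled.
# MONIT_DIR defaults to /etc/monit (Debian/Ubuntu layout, assumed).
enable_monit_check() {
    emc_name="$1"
    emc_dir="${MONIT_DIR:-/etc/monit}"
    # Refuse to create a dangling link if the check file does not exist.
    if [ ! -f "$emc_dir/conf-available/$emc_name" ]; then
        echo "no such check: $emc_name" >&2
        return 1
    fi
    # Relative link, so it resolves from inside conf-enabled.
    ln -sf "../conf-available/$emc_name" "$emc_dir/conf-enabled/$emc_name"
}

# Usage (as root):
#   enable_monit_check usermin
#   monit -t               # syntax-check the control files
#   monit reload           # make the running daemon re-read its config
#   monit status usermin   # confirm the check is being monitored
```

If your monitrc includes a different directory, adjust MONIT_DIR to match. Once the check is active, the monit log should show entries like these: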
[SAST Dec 30 08:13:53] error : 'usermin' failed protocol test [DEFAULT] at [127.0.0.1]:20000 [TCP/IP] -- Connection refused
[SAST Dec 30 08:13:53] info : 'usermin' trying to restart
[SAST Dec 30 08:13:53] info : 'usermin' stop: '/bin/systemctl stop usermin'
[SAST Dec 30 08:13:54] info : 'usermin' start: '/bin/systemctl start usermin'
[SAST Dec 30 08:15:56] info : 'usermin' connection succeeded to [127.0.0.1]:20000 [TCP/IP]
I had the same problem, and it did turn out to be an out-of-memory issue. Until I could deal with that, I used Webmin’s System and Server Status to check and restart the usermin service.
Create a new monitor.
Commands to run > If monitor goes down, run command:
Y’all need to look at the Usermin miniserv.error log and the kernel log (for OOM killer messages) to find out why it’s exiting. Usermin does not crash. So, something is killing it.
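For anyone unsure where to look, here is a rough sketch of how the kernel log could be searched for OOM killer activity. The function name oom_lines is made up for illustration, the patterns are common fragments of kernel OOM messages (exact wording varies by kernel version), and /var/usermin/miniserv.error is assumed to be the default Usermin log location, so adjust it if your install differs.

```shell
# Hypothetical helper: filter a log stream (stdin) for lines the kernel
# typically writes when the OOM killer acts. Case-insensitive match on
# common message fragments; wording differs between kernel versions.
oom_lines() {
    grep -i -E 'out of memory|oom-kill|oom_reaper|killed process'
}

# Usage (as root):
#   journalctl -k --since "2 days ago" | oom_lines
#   dmesg -T | oom_lines
#   tail -n 50 /var/usermin/miniserv.error   # assumed default log path
```

If the kernel did kill the process, the matching lines normally name the victim process and its PID, which you can compare against the times Usermin went down.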
Restarting it is just masking whatever problem your system has.
Not seeing anything in either log though. Are we 100% sure Usermin OOMs would be logged, and if so, to which file?
I can reliably reproduce this issue now and would love to get it fixed. Clients rely on Webmail at critical times, and even though monit is helping, there is still a delay of up to a minute during which the client starts losing trust.