Thanks guys for the incredible amount of tips and advice. Hopefully I can post enough information here to solve this problem.
Interestingly, it happened again this morning, and when it happens the service still appears to be running:
# service usermin status
● usermin.service - LSB: web-based account administration interface for Unix systems
Loaded: loaded (/etc/init.d/usermin; generated)
Active: active (exited) since Thu 2021-11-18 08:56:46 UTC; 1 months 10 days ago
Tasks: 0 (limit: 2278)
Warning: journal has been rotated since unit was started, output may be incomplete.
However, the port closed the moment I tried accessing it:
root@buspage:~# telnet localhost 20000
telnet: Unable to connect to remote host: Connection refused
Then when I restart the service, everything is fine again:
service usermin stop
service usermin start
telnet localhost 20000
Connected to localhost.
Escape character is '^]'.
Services stopping “randomly” is almost always the OOM killer kicking in
I don’t think it’s memory, but for reference I’m pasting the memory consumption here. I also have many different servers and all of them have quite a bit of RAM, although this one has the least:
Real memory 792.92 MiB used / 980.54 MiB cached / 1.94 GiB total
Virtual memory 1.15 GiB used / 3.99 GiB total
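To rule the OOM killer in or out rather than guess, the kernel log can be searched for its messages. Below is a rough Python sketch of that idea. It assumes OOM kills are logged with the usual "Out of memory" or "oom-killer" wording, which can vary between kernel versions, and the function name `find_oom_lines` is just something made up for illustration.

```python
import re

# Assumption: the kernel logs OOM activity with phrases like
# "Out of memory: Killed process ..." or "... invoked oom-killer".
# The exact wording differs between kernel versions.
OOM_PATTERN = re.compile(r"out of memory|oom[-_]kill", re.IGNORECASE)


def find_oom_lines(log_lines):
    """Return the log lines that look like OOM killer activity."""
    return [line.rstrip("\n") for line in log_lines if OOM_PATTERN.search(line)]
```

The lines could come from something like `dmesg` or `journalctl -k` output split into a list. An empty result would suggest, though not prove, that memory is not the cause, since old kernel messages may already have been rotated away.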
Also I have to point out the service doesn’t stop “randomly”. It stops the moment any user tries to access port 20000. After a restart it works for a long time, but then stops again when a user tries to access it.
@calport thanks for the monitoring advice. I use PRTG extensively, along with Spatie’s network monitoring tools, which have a Linux service checker. The problem, as just mentioned, is that the service appears to keep running, so a service check sees nothing wrong. Checking port 20000 is what fails, and then PRTG sends a push notification.
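Since the service status can't be trusted here, the only check that reflects reality is whether the port actually accepts a connection, which is what the telnet test above does by hand. A minimal Python sketch of the same test, in case it helps anyone scripting this. The function name `port_is_open` and the 3 second timeout are my own choices, not anything from Usermin or PRTG.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError
        # or a timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open("localhost", 20000)` would return False in the failed state shown above and True after a restart. Keep in mind that in my case the connection attempt itself seems to be what triggers the failure, so a probe like this may also be what knocks the service over.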