System and server status: reset counter option?

SYSTEM INFORMATION
OS type and version Ubuntu Linux 22.04.5
Webmin version 2.621

I use “System and server status” to reboot the server automatically when nothing else seems to help. “Failures before reporting” is set to 10, and scheduled monitoring runs every 5 minutes (I have a lot of status checks, and this is generally a good interval).

So after 10 failures (10 × 5 minutes = 50 minutes), the server is rebooted. But if the problem persists after the reboot, the system keeps rebooting every 5 minutes, because the counter is never reset. It has even happened that the server received a new reboot command before the previous reboot had completed, which results in a big mess.

Is it possible to add an option to reset the counter once the command has been executed? Or is my approach fundamentally wrong? My intention is to keep the system alive when no one is available to intervene.

If restarting a service doesn’t bring it back, I can’t imagine a reboot is going to do anything good.

Generally speaking, things are either working or they’re not, and if they’re not, no amount of restarting is going to fix it.

I believe the counter resets when the service that triggered it comes back up. So rebooting, which almost certainly doesn’t fix the problem, doesn’t reset the counter either. You could probably write a script, called by the monitor, that resets the state before rebooting… but given the service is almost certainly still going to be down after the reboot, the server would then reboot every 50 minutes forever. That seems worse than the problem it’s trying to solve.
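Along the lines of the script idea above, here is a minimal sketch of a reboot guard, assuming the monitor’s failure action is changed to run a command instead of rebooting directly. It does not touch Webmin’s own failure counter (that state is internal and not assumed here); it only makes repeated triggers harmless: it skips the reboot if the machine booted less than an hour ago (so a new reboot can’t land on top of one still in progress) and stops after three automatic reboots in 24 hours. The thresholds, the state-file path, and the `--reboot` flag are all hypothetical choices, and without `--reboot` it only reports what it would do.

```python
"""Hypothetical reboot guard for a monitor's 'run command on failure' hook.

Refuses to reboot if the machine booted recently or if the daily budget
of automatic reboots is already used up. Dry run unless --reboot is given.
"""
from __future__ import annotations

import argparse
import json
import subprocess
import time
from pathlib import Path

DAY = 24 * 60 * 60
MIN_UPTIME = 60 * 60        # assumption: skip if booted less than 1 h ago
MAX_REBOOTS_PER_DAY = 3     # assumption: give up after 3 automatic reboots in 24 h
STATE_FILE = Path("/var/lib/reboot-guard/history.json")  # hypothetical path


def read_uptime(path: str = "/proc/uptime") -> float:
    """Seconds since boot, taken from the first field of /proc/uptime (Linux)."""
    with open(path) as f:
        return float(f.read().split()[0])


def load_history(state_file: Path) -> list[float]:
    """Timestamps of earlier guarded reboots; empty if missing or unreadable."""
    try:
        return [float(t) for t in json.loads(state_file.read_text())]
    except (OSError, ValueError, TypeError):
        return []


def should_reboot(uptime: float, history: list[float], now: float,
                  min_uptime: float = MIN_UPTIME,
                  max_per_day: int = MAX_REBOOTS_PER_DAY) -> bool:
    """True only if the box has been up long enough and the daily budget isn't spent."""
    if uptime < min_uptime:
        return False  # a previous reboot happened recently
    recent = [t for t in history if now - t < DAY]
    return len(recent) < max_per_day


def main(argv: list[str] | None = None) -> None:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--reboot", action="store_true",
                        help="actually reboot; without it, only report the decision")
    args = parser.parse_args(argv)

    now = time.time()
    history = load_history(STATE_FILE)
    if not should_reboot(read_uptime(), history, now):
        print("reboot-guard: skipping reboot")
        return
    if not args.reboot:
        print("reboot-guard: would reboot (dry run)")
        return
    # Record this reboot before issuing it, keeping only the last 24 h.
    history = [t for t in history if now - t < DAY] + [now]
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps(history))
    subprocess.run(["shutdown", "-r", "now"], check=False)


if __name__ == "__main__":
    main()
```

The monitor would then call something like `python3 /usr/local/sbin/reboot_guard.py --reboot` (path hypothetical). As noted above, this only limits the damage; if the service is still down after the allowed reboots, the server simply stays up and a human still has to look at it.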

Usually, I send notifications when a service is down, rather than trying to fix it by restarting things in increasingly dramatic fashion, which itself causes service outages.
