MariaDB server is down

As I said, it is not the start that’s the problem, it’s the killing.

Restarting Mariadb in the Webmin module is not killing it. It is sending it a signal to stop, allowing it to stop itself, and then starting it back up.

The OOM killer literally rips it out of the system, immediately. It has to. The system is out of memory, which is catastrophic. Something has to go, and it picks a process, usually one of the biggest memory users, to free up the resources it needs to allocate.
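For anyone curious which process the kernel would likely pick, Linux exposes a per-process `oom_score` under `/proc`, and higher scores are more likely to be killed. Here is a minimal sketch that lists the top candidates. The function name and the `proc_root` parameter are my own, only there so it can be pointed at a test directory.

```python
from pathlib import Path


def rank_oom_candidates(proc_root="/proc", top=5):
    """Return up to `top` (score, pid, name) tuples, highest oom_score first."""
    candidates = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            score = int((entry / "oom_score").read_text().strip())
            name = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited in the meantime, or file unreadable
        candidates.append((score, int(entry.name), name))
    # Highest score first; ties broken by lowest pid for a stable order.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return candidates[:top]


if __name__ == "__main__":
    for score, pid, name in rank_oom_candidates():
        print(f"{score:>6}  {pid:>7}  {name}")
```

On a box where the database holds most of the RAM, I would expect it to show up at or near the top of that list.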

A database being killed is dangerous.

Restarting it normally is not.

But, if you start it, only to have it killed by the OOM killer again, you have doubled your chance of data loss or corruption. If it happens again, you’ve tripled your risk. Again, 4x. etc. Every time you ignore the problem and just start the database up again, you ask for another spin of the data loss/corruption wheel.

So if the system monitor sees the service down, it will restart it, if it is configured to do so.


So does that make “if monitor goes down, run command” nonsensical?

Yes, in a worst-case scenario that can happen, and since it is a worst-case scenario there will also be an electrical short circuit which will cause a fire, and your backups will go up in flames.

But in the real world, when a database server crashes, it causes the release of memory and the freeing up of the CPU. So when it is restarted automatically, it is highly likely that the situation which caused the database server to crash has resolved itself.

Actually Joe, it would quadruple the risk in that iteration, not triple it.

But your point is taken.

so remove the option from the module to ensure nobody hits this issue

Joe, pls delete the Webmin project. We will edit the config files with a hex editor.

Is that relevant? Looks like you’re in the wrong thread.

Why do you assume OOM killer is the only case where a monitor would go down? And, why would starting the service be the only command one might run in the case a monitor goes down? You might also alert someone to the problem with that command, or perform some other mitigating action.

Also note there are many kinds of monitor, and “down” can mean all sorts of things. It’s a general purpose tool. It’s not for Mariadb, it’s for anything. You could, for example, create monitors for available memory and disk space and warn if those are getting low, maybe send yourself a text or an email, so you can do something about it.
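As a rough illustration of that kind of check outside of Webmin, here is a minimal sketch that warns when available memory or free disk space drops below a threshold. The function names and the threshold values are assumptions picked for the example, not anything Webmin uses, and the `print` at the end stands in for whatever alert (email, text) you would actually send.

```python
import shutil


def mem_available_kib(meminfo_text):
    """Extract MemAvailable (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def check_resources(meminfo_text, disk_free_bytes,
                    min_mem_kib=512 * 1024, min_disk_bytes=2 * 1024**3):
    """Return a list of warning strings; an empty list means nothing is low.

    The default thresholds (512 MiB RAM, 2 GiB disk) are arbitrary examples.
    """
    warnings = []
    mem = mem_available_kib(meminfo_text)
    if mem < min_mem_kib:
        warnings.append(f"low memory: {mem} KiB available")
    if disk_free_bytes < min_disk_bytes:
        warnings.append(f"low disk: {disk_free_bytes} bytes free")
    return warnings


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        text = f.read()
    free = shutil.disk_usage("/").free
    for warning in check_resources(text, free):
        print(warning)  # placeholder: swap in your own email/SMS alert
```

Run from cron every minute or so, something like this gives you a warning before the kernel has to step in, which is the point of monitoring memory rather than the service.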

Of course, memory can get used up very quickly if something is going wrong, so this may not be sufficient to keep a system up and running, and maybe you weigh your options and decide to restart Mariadb anyway… but if you already know the problem is memory, as in this case, you should solve that rather than making a monitor to restart the service after it has been killed.


This must be some of that new math.

1 failure=1x
2 failure=2x
3 failure=4x, somehow? I don’t follow.
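To put rough numbers on the multipliers being argued about, here is a small sketch under two loud assumptions: every kill has the same chance `p` of causing data loss or corruption, and kills are independent of each other. Neither is guaranteed in real life, and the value of `p` below is made up purely for illustration.

```python
def cumulative_risk(p, kills):
    """Chance of at least one bad outcome across `kills` independent kills,
    each with probability p of causing data loss or corruption."""
    return 1.0 - (1.0 - p) ** kills


if __name__ == "__main__":
    p = 0.01  # made-up per-kill probability, for illustration only
    base = cumulative_risk(p, 1)
    for n in range(1, 5):
        risk = cumulative_risk(p, n)
        print(f"{n} kill(s): risk {risk:.4f}, "
              f"{risk / base:.2f}x the single-kill risk")
```

Under those assumptions the cumulative risk is 1 - (1 - p)^n, which for small `p` grows almost linearly with the number of kills: close to 2x after two and 3x after three. A different risk model would give different multipliers.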

So test whether the OOM killer caused the issue and, if it did, do something else. I would guess most people just want the service back up and running. That said, as you quite rightly say, the command may do something else.
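A check along those lines could look something like the sketch below, which scans kernel log text for OOM kills of the database process before deciding what to do. The message pattern is an assumption: the exact wording differs between kernel versions. The process names `mariadbd` and `mysqld` are also assumptions about how the server shows up on your system. Reading `dmesg` may need root.

```python
import re
import subprocess

# Assumed message shape: "... Killed process 1234 (mariadbd) ..."
# The surrounding wording varies between kernel versions.
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_kills(log_text, names=("mariadbd", "mysqld")):
    """Return (pid, name) for each logged OOM kill of a process in `names`."""
    hits = []
    for match in KILLED.finditer(log_text):
        pid, name = int(match.group(1)), match.group(2)
        if name in names:
            hits.append((pid, name))
    return hits


if __name__ == "__main__":
    # dmesg may require root depending on kernel settings.
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    kills = oom_kills(log)
    if kills:
        print(f"OOM kill(s) found: {kills}; alert someone instead of restarting")
    else:
        print("no OOM kill of the database found in the kernel log")
```

A monitor command could use the result to choose between a plain restart and an alert, so a memory problem gets a human looking at it instead of another blind restart.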

Yes, if Mariadb is dying, the immediate goal may be to get it back online, but the priority should be to figure out why it’s dying and solve that root cause. Just starting it back up and ignoring the root cause is asking for data loss.

Can I recommend taking a backup and storing it somewhere safe?

Make sure you have automatic database backups, and keep more than just last night’s, in case that one is broken.
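As a sketch of the "more than just last night's" part, here is a minimal retention helper that keeps the newest few dumps and reports the rest as candidates for deletion. It assumes dump files carry an ISO date in the name (for example `db-2024-01-31.sql.gz`), so sorting by name is the same as sorting by date. The naming scheme, the directory path, and the default of seven are all hypothetical.

```python
from pathlib import Path


def backups_to_delete(backup_dir, keep=7, pattern="db-*.sql.gz"):
    """Return backup files beyond the newest `keep`, newest of those first.

    Assumes ISO-dated file names, so lexical order equals date order.
    """
    files = sorted(Path(backup_dir).glob(pattern),
                   key=lambda p: p.name, reverse=True)
    return files[keep:]


if __name__ == "__main__":
    # Hypothetical backup location; only prints, never deletes.
    for old in backups_to_delete("/var/backups/mariadb"):
        print(f"would delete {old}")
```

It only prints what it would remove. Actually deleting is left out on purpose, so a mistake in the pattern cannot eat your only good backup.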

I would guess this is about the only issue you would get that would cause the server to shut down. If it runs out of file handles, MariaDB does not shut down; it just drops the handles and throws an error when your code tries to connect. That is not a Webmin problem, so what else drops a server?

If you do that, there is a possibility the data will be out of date. This is more a matter of restoring the data; however, you could do that on the fly rather than from an old backup.