As I said, it is not the start that’s the problem, it’s the killing.
Restarting MariaDB in the Webmin module is not killing it. It is sending it a signal to stop, allowing it to stop itself, and then starting it back up.
The OOM killer literally rips it out of the system, immediately. It has to. The system is out of memory, which is catastrophic. Something has to go, and the kernel picks a victim by scoring processes, mostly on how much memory they use, so a big process like the database is a likely target.
But, if you start it, only to have it killed by the OOM killer again, you have doubled your chance of data loss or corruption. If it happens again, you’ve tripled your risk. Again, 4x. etc. Every time you ignore the problem and just start the database up again, you ask for another spin of the data loss/corruption wheel.
Yes, in a worst-case scenario that can happen, and since it is a worst-case scenario there will also be an electrical short circuit which will cause a fire, and your backups will go up in flames.
But in the real world, when a database server crashes, it causes the release of memory and the freeing up of the CPU. So when it is restarted automatically, it is highly likely that the situation which caused the database server to crash has resolved itself.
Why do you assume the OOM killer is the only case where a monitor would go down? And why would starting the service be the only command one might run when a monitor goes down? You might also alert someone to the problem with that command, or perform some other mitigating action.
Also note there are many kinds of monitor, and “down” can mean all sorts of things. It’s a general purpose tool. It’s not for MariaDB, it’s for anything. You could, for example, create monitors for available memory and disk space and warn if those are getting low, maybe send yourself a text or an email, so you can do something about it.
Of course, memory can get used up very quickly if something is going wrong, so this may not be sufficient to keep a system up and running, and maybe you weigh your options and decide to restart MariaDB anyway…but, if you already know the problem is memory, as in this case, you should solve that rather than making a monitor to restart the service after it gets killed.
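For what it's worth, the memory and disk space checks mentioned above could be sketched roughly like this in Python. This is only an illustration, not how Webmin's own monitors work: the function names and the 10% thresholds are made up for the example, and it assumes a Linux `/proc/meminfo` that has a `MemAvailable` line.

```python
import shutil


def mem_available_percent(meminfo_text):
    """Return MemAvailable as a percentage of MemTotal, given /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # first field is the number (kB)
    return 100.0 * values["MemAvailable"] / values["MemTotal"]


def disk_free_percent(path="/"):
    """Return free space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def check(meminfo_text, path="/", min_mem=10.0, min_disk=10.0):
    """Return a list of warning strings; thresholds are hypothetical defaults."""
    warnings = []
    mem = mem_available_percent(meminfo_text)
    if mem < min_mem:
        warnings.append(f"memory low: {mem:.1f}% available")
    disk = disk_free_percent(path)
    if disk < min_disk:
        warnings.append(f"disk low on {path}: {disk:.1f}% free")
    return warnings
```

You would feed `check()` the contents of `/proc/meminfo` from a cron job or a monitor command and email or text yourself whatever it returns.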
So, therefore, test whether the OOM killer caused the issue and then do something else. I would guess most people would just want the service back up and running. That said, as you quite rightly say, the command may do something else.
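A rough sketch of that test, in Python: scan kernel log text (for example the output of `dmesg` or `journalctl -k`) for the "Killed process" line the OOM killer leaves behind. The exact wording of that line varies between kernel versions, so the pattern here is an assumption, and the process names `mariadbd` and `mysqld` are just the ones I would expect to see.

```python
import re

# Assumed log wording: "... Killed process 1234 (mariadbd) total-vm:..."
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_victims(kernel_log_text):
    """Return (pid, process name) pairs for every OOM kill found in the log text."""
    return [(int(pid), name) for pid, name in OOM_KILL.findall(kernel_log_text)]


def was_oom_killed(kernel_log_text, names=("mariadbd", "mysqld")):
    """True if any of the given process names shows up as an OOM victim."""
    return any(name in names for _, name in oom_victims(kernel_log_text))
```

A monitor command could call `was_oom_killed()` first and send an alert instead of (or as well as) blindly starting the service again.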
Yes, if MariaDB is dying, the immediate goal may be to get it back online, but the priority should be to figure out why it’s dying and solve that root cause. Just starting it back up and ignoring the root cause is asking for data loss.
I would guess this is about the only issue you would get that would cause the server to shut down. If it runs out of file handles, MariaDB does not shut down; it just drops the handles and throws an error when your code tries to connect. That is not a Webmin problem, so what else drops a server?
If you do that, there is a possibility the data will be out of date. This is more a matter of restoring the data; however, you could do this on the fly rather than from an old backup.
Back to the main topic:
I only have experience with MariaDB together with WordPress, and indeed sometimes the OOM killer comes for poor Maria.
From my experience I know this mostly happens due to an online attack on the webserver/WordPress site. It is usually caused by a flood of HTTP requests:
trying to guess the wp-admin password (via XML-RPC), or
a generic attempt at finding a security vulnerability by trying 100s of URLs per minute to see which one works.
PHP-FPM, as it should, spawns new child processes to handle the new load, and eventually this can push memory use high enough that the OOM killer sacrifices MariaDB (or something else).
One solution is to limit the number of children FPM spawns, but I currently have to configure this by hand according to the system’s capacity. I have yet to find out how to tell FPM not to spawn new children if RAM is more than x% full.
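As a back-of-the-envelope way to size that limit (FPM's `pm.max_children` setting), you can divide the RAM you are willing to give PHP by the typical size of one FPM child. A small Python sketch of the arithmetic follows; the figures in the usage note are invented, so measure your own average child size and decide how much to reserve for MariaDB and the rest of the system.

```python
def fpm_max_children(total_ram_mb, reserved_mb, avg_child_mb):
    """Estimate a pm.max_children value.

    total_ram_mb: RAM in the machine
    reserved_mb:  RAM kept back for MariaDB, the OS and everything else (your call)
    avg_child_mb: average resident memory of one PHP-FPM child (measure it)
    """
    if avg_child_mb <= 0:
        raise ValueError("avg_child_mb must be positive")
    budget_mb = total_ram_mb - reserved_mb
    # Never return less than 1, or FPM could not serve anything at all.
    return max(1, int(budget_mb // avg_child_mb))
```

For example, with 4096 MB of RAM, 1536 MB reserved and children averaging 64 MB, `fpm_max_children(4096, 1536, 64)` gives 40. It is still a static limit, not the "stop spawning at x% RAM" behaviour asked about above.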
Another solution is a web application firewall (WAF). Incidentally, Cloudflare offers a free plan that helps a bit with this (not well enough, in my opinion).
I switched to MySQL Community Server 8.0, and to date there have been no OOM errors with WordPress sites. I can only conclude that WordPress and MariaDB do not play nicely together for some reason. To be fair, though, this really is ‘out of scope’ for Webmin/Virtualmin, as they only manage packages and give you an interface to edit settings, etc. I would guess there must be some resource somewhere that identifies the problem, but where, I don’t know.
While it is simply a crude security-by-obscurity attempt, on our WordPress sites we have been able to cut down a little bit on hackers by changing the default /wp-admin/ URL to something different, using one of the many plugins that offer that tweak.
Of course, then half the time our clients forget what it’s set to.
1. Go to Networking ⇾ Fail2Ban Intrusion Detector: Log Filters page;
2. Click Add a new log filter button;
3. Fill the following fields:
3.1. Filter name: wordpress;
3.2. Regular expressions to match:
<HOST>.*POST.*(wp-login\.php|xmlrpc\.php|account\/signin).* 200
3.3. Click Create button;
4. Go to Networking ⇾ Fail2Ban Intrusion Detector: Filter Action Jails page;
5. Click Add a new jail button;
6. Fill the following fields:
6.1. Jail name: wordpress-domain-com;
6.2. Filter to search log for: wordpress;
6.3. Currently enabled? set to Yes;
6.4. Log file paths:
/var/log/virtualmin/domain.com_access_log
6.5. Click Create button;
7. Enjoy
Note: A backend may need to be manually defined as described in this comment.
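If you want to sanity-check the filter's regular expression before relying on it, here is a small Python sketch that tries it against a few sample access log lines. Two assumptions: Fail2Ban expands `<HOST>` into its own host-matching pattern, which I replace here with a simplified `(?P<host>\S+)`, and the sample lines are made-up entries in the usual combined log format with the client address first. Fail2Ban's own `fail2ban-regex` tool is the proper way to test against your real log.

```python
import re

# The filter expression from step 3.2, unchanged.
FAILREGEX = r"<HOST>.*POST.*(wp-login\.php|xmlrpc\.php|account\/signin).* 200"

# Simplified stand-in for Fail2Ban's <HOST> expansion (assumption for this sketch).
PATTERN = re.compile(FAILREGEX.replace("<HOST>", r"(?P<host>\S+)"))


def matched_host(line):
    """Return the client address if the line would count as a hit, else None."""
    m = PATTERN.match(line)
    return m.group("host") if m else None


# Made-up sample lines, for illustration only.
SAMPLES = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "POST /wp-login.php HTTP/1.1" 200 1234 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2023:13:55:37 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2023:13:55:38 +0000] "POST /wp-login.php HTTP/1.1" 302 0 "-" "Mozilla/5.0"',
]

for sample in SAMPLES:
    print(matched_host(sample))
```

Only the first sample is a hit: the second is a GET, and the third is a POST that did not return status 200.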