MariaDB server is down

Joe · July 8, 2024, 4:27pm

As I said, it is not the start that’s the problem, it’s the killing.

Restarting Mariadb in the Webmin module is not killing it. It is sending it a signal to stop, allowing it to stop itself, and then starting it back up.

The OOM killer literally rips it out of the system, immediately. It has to. The system is out of memory, which is catastrophic. Something has to go, and it randomly picks something big enough to free up the resources it needs to allocate.

Joe · July 8, 2024, 4:29pm

A database being killed is dangerous.

Restarting it normally is not.

But, if you start it, only to have it killed by the OOM killer again, you have doubled your chance of data loss or corruption. If it happens again, you’ve tripled your risk. Again, 4x. etc. Every time you ignore the problem and just start the database up again, you ask for another spin of the data loss/corruption wheel.

jimr1 · July 8, 2024, 4:31pm

So if the system monitor sees the service down, it will restart it, if configured to do so.

So does that make “if monitor goes down run command” nonsensical?

calport · July 8, 2024, 4:31pm

Yes, in a worst-case scenario that can happen, and since it is a worst-case scenario there will be an electrical short circuit which will cause a fire and your backups will go up in flames.

But in the real world, when a database server crashes, it causes the release of memory and the freeing up of the CPU. So when it is restarted automatically, it is highly likely that the situation which caused the database server to crash has resolved itself.

calport · July 8, 2024, 4:33pm

Actually Joe, it would quadruple the risk in that iteration, not triple it.

But your point is taken.

jimr1 · July 8, 2024, 4:36pm

so remove the option from the module to ensure nobody hits this issue

calport · July 8, 2024, 4:38pm

Joe, pls delete the Webmin project. We will edit the config files with a hex editor.

jimr1 · July 8, 2024, 4:39pm

Is that relevant? Looks like you’re in the wrong thread.

Joe · July 8, 2024, 4:40pm

Why do you assume OOM killer is the only case where a monitor would go down? And, why would starting the service be the only command one might run in the case a monitor goes down? You might also alert someone to the problem with that command, or perform some other mitigating action.

Also note there are many kinds of monitor, and “down” can mean all sorts of things. It’s a general purpose tool. It’s not for Mariadb, it’s for anything. You could, for example, create monitors for available memory and disk space and warn if those are getting low, maybe send yourself a text or an email, so you can do something about it.
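A minimal sketch of the kind of check such a memory and disk space monitor could run, assuming a Linux host where `/proc/meminfo` reports `MemAvailable`. The function names and the thresholds are hypothetical and would need tuning per system; the sending of a text or email is left out.

```python
import shutil


def parse_mem_available_kb(meminfo_text):
    """Return the MemAvailable value (in kB) from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None


def check_resources(meminfo_text, disk_free_bytes,
                    min_mem_kb=200_000, min_disk_bytes=1_000_000_000):
    """Return a list of warning strings; an empty list means nothing is low.

    The default thresholds are placeholders, not recommendations.
    """
    warnings = []
    mem_kb = parse_mem_available_kb(meminfo_text)
    if mem_kb is not None and mem_kb < min_mem_kb:
        warnings.append(f"low memory: {mem_kb} kB available")
    if disk_free_bytes < min_disk_bytes:
        warnings.append(f"low disk: {disk_free_bytes} bytes free")
    return warnings


def run_once(mount_point="/"):
    """Read live values from the system and return any warnings."""
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    return check_resources(meminfo_text, shutil.disk_usage(mount_point).free)
```

Calling `run_once()` from a scheduled job and mailing any returned warnings would give the early notice described above, before the OOM killer gets involved.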

Of course, memory can get used up very quickly if something is going wrong, so this may not be sufficient to keep a system up and running, and maybe you weigh your options and decide to restart Mariadb anyway…but, if you already know the problem is memory, as in this case, you should solve that rather than making a monitor to restart the service after killing.

Joe · July 8, 2024, 4:42pm

This must be some of that new math.

1 failure=1x
2 failure=2x
3 failure=4x, somehow? I don’t follow.
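For reference, a small sketch of the arithmetic under one simple model. It assumes every kill independently carries the same probability `p` of causing corruption, which is an assumption made here for illustration, not something stated in the thread. Under that model the chance of at least one corruption after `n` kills is `1 - (1 - p)**n`, which for small `p` grows roughly linearly with `n`.

```python
def cumulative_risk(p, kills):
    """Probability of at least one corruption after `kills` independent kills.

    `p` is the assumed per-kill corruption probability (hypothetical value).
    """
    return 1 - (1 - p) ** kills


if __name__ == "__main__":
    p = 0.01  # illustrative only
    base = cumulative_risk(p, 1)
    for n in (1, 2, 3, 4):
        risk = cumulative_risk(p, n)
        print(f"{n} kill(s): risk {risk:.4f}, about {risk / base:.2f}x the single-kill risk")
```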

jimr1 · July 8, 2024, 4:44pm

So, therefore, test whether OOM has caused the issue and, if so, do something else. I would guess most people would just want the service back up and running. That said, as you quite rightly say, the command may do something else.
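A rough sketch of such a test, assuming the kernel log text has already been collected (for example from `dmesg` or `journalctl -k`) and that the kernel reports the kill with a line containing `Killed process <pid> (<name>)`. The function names, the process names checked, and the "alert"/"restart" actions are hypothetical.

```python
import re

# Matches the "Killed process 1234 (name)" part of the kernel's OOM message.
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_victims(log_text):
    """Return a list of (pid, process_name) for every OOM kill found in the log text."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_KILL_RE.finditer(log_text)]


def was_oom_killed(log_text, names=("mysqld", "mariadbd")):
    """True if any OOM victim in the log has one of the given process names."""
    return any(name in names for _, name in oom_victims(log_text))


def choose_action(log_text):
    """Alert a human if the database was OOM-killed; otherwise a plain restart."""
    return "alert" if was_oom_killed(log_text) else "restart"
```

The point of the split is that an OOM kill calls for fixing the memory problem first, while other kinds of "down" may be fine to restart automatically.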

Joe · July 8, 2024, 4:48pm

Yes, if Mariadb is dying, the immediate goal may be to get it back online, but the priority should be to figure out why it’s dying and solve that root cause. Just starting it back up and ignoring the root cause is asking for data loss.

shoulders · July 8, 2024, 4:54pm

Can I recommend that you take a backup and store it somewhere safe?

Make sure you have automatic database backups, and keep more than just last night’s, just in case that one is broken.
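A minimal sketch of the "keep more than one" part, assuming the dumps are written with date-stamped file names that sort chronologically. The naming scheme and the retention count of 7 are assumptions for illustration.

```python
def backups_to_delete(names, keep=7):
    """Return the backup names that fall outside the newest `keep` backups.

    Assumes names sort chronologically, e.g. "db-2024-07-08.sql.gz".
    """
    newest_first = sorted(names, reverse=True)
    return newest_first[keep:]


if __name__ == "__main__":
    names = [f"db-2024-07-{day:02d}.sql.gz" for day in range(1, 10)]
    print(backups_to_delete(names))
```

Only the names returned by `backups_to_delete` would be removed after each nightly dump, so a broken latest backup still leaves older ones to fall back on.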

jimr1 · July 8, 2024, 4:55pm

I would guess this is about the only issue that would cause the server to shut down. If it runs out of file handles, Maria does not shut down; it just drops the handles and throws an error when your code tries to connect. That is not a Webmin problem, so what else drops a server?

jimr1 · July 8, 2024, 4:58pm

If you do that, there is a possibility the data will be out of date. This is more a matter of restoring the data; however, you could do this on the fly rather than from an old backup.

vending_makina · July 26, 2024, 5:56am

Back to the main topic:
I only have experience with MariaDB together with WordPress, and indeed sometimes the OOM killer comes for poor Maria.

From my experience I know this mostly happens due to an online attack on the webserver/WordPress site. It is usually caused by a flood of HTTP requests:

trying to guess the wp-admin password (via XML-RPC), or
a generic attempt at finding a security vulnerability by trying hundreds of URLs per minute to see which one works.

PHP-FPM, as it should, spawns new child processes to handle the new load, and eventually this could cause a high memory load that sacrifices MariaDB (or something else).

One solution is to limit the number of children FPM spawns, but I currently have to configure this according to the system’s capacity. I’m yet to find out how to tell FPM to not spawn new children if RAM is more than x% full.
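A back-of-the-envelope sketch for sizing that limit (PHP-FPM's `pm.max_children` setting), assuming you know roughly how much RAM to reserve for MariaDB and the rest of the system and how much one FPM child uses on average. The function name and the figures in the example are hypothetical, and this only produces a static value; it does not make FPM react to RAM usage at runtime.

```python
def estimate_max_children(total_ram_mb, reserved_mb, avg_child_mb):
    """Estimate a pm.max_children value from a memory budget.

    total_ram_mb: total system RAM
    reserved_mb:  RAM set aside for MariaDB, the OS and other services (assumed)
    avg_child_mb: measured average memory of one PHP-FPM child (assumed)
    """
    budget = total_ram_mb - reserved_mb
    if budget <= 0 or avg_child_mb <= 0:
        return 1  # no sensible budget; fall back to a single child
    return max(1, budget // avg_child_mb)


if __name__ == "__main__":
    # Example: 4 GB machine, 2 GB reserved, about 64 MB per child.
    print(estimate_max_children(4096, 2048, 64))
```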

Another solution is a web application firewall (WAF). Incidentally, Cloudflare offers a free plan that helps a bit with this (not well enough, in my opinion).

jimr1 · July 26, 2024, 8:20am

I switched to MySQL Community Server 8.0, and to date there have been no OOM errors with WordPress sites. I can only conclude that WordPress and Maria do not play nicely together for some reason. But to be fair, this really is ‘out of scope’ for Webmin/Virtualmin, as they only manage packages and give you an interface to edit settings etc. I would guess somewhere there must be some resource that identifies the problem, but where, IDK.

Stegan · July 26, 2024, 9:15am

Most of those kiddies can be jailed quite easily with:
Blocking WordPress scanners with fail2ban

I have to add that I am most definitely NOT a WordPress user so this might not work so well if you host WP

verne · July 26, 2024, 1:06pm

While it is simply a crude security-by-obscurity attempt, on our WordPress sites we have been able to cut down a little bit on hackers by changing the default /wp-admin/ URL to something different, using any of a number of plugins offering that tweak.

Of course then 1/2 the time our clients forget what it’s set to

popmay · July 26, 2024, 1:47pm

Here is the regex for the Fail2Ban jail that Ilia made. I also use the recidive jail to set long bans on all ports for repeat offenders.

Ilia (Virtualmin Staff) · Apr 30

1. Go to Networking ⇾ Fail2Ban Intrusion Detector: Log Filters page;
2. Click Add a new log filter button;
3. Fill the following fields:
3.1. Filter name: wordpress;
3.2. Regular expressions to match:

<HOST>.*POST.*(wp-login\.php|xmlrpc\.php|account\/signin).* 200

3.3. Click Create button;
4. Go to Networking ⇾ Fail2Ban Intrusion Detector: Filter Action Jails page;
5. Click Add a new jail button;
6. Fill the following fields:
6.1. Jail name: wordpress-domain-com;
6.2. Filter to search log for: wordpress;
6.3. Currently enabled? set to Yes;
6.4. Log file paths:

/var/log/virtualmin/domain.com_access_log

6.5. Click Create button;
7. Enjoy

Note: A backend may need to be manually defined as described in this comment.
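A quick way to sanity-check the regular expression above outside Fail2Ban is sketched below. It assumes a combined-format access log line and swaps Fail2Ban's `<HOST>` tag for a simplified capture group, which is only an approximation of what Fail2Ban actually substitutes. The sample log lines and function names are made up for illustration; Fail2Ban's own `fail2ban-regex` tool remains the proper check against a real log.

```python
import re

# The filter expression from the steps above, unchanged.
FAILREGEX = r"<HOST>.*POST.*(wp-login\.php|xmlrpc\.php|account\/signin).* 200"


def compile_failregex(failregex):
    """Replace <HOST> with a simplified stand-in group and compile the pattern."""
    return re.compile(failregex.replace("<HOST>", r"(?P<host>\S+)"))


def offending_host(line, pattern):
    """Return the captured host if the line matches the filter, else None."""
    m = pattern.search(line)
    return m.group("host") if m else None


if __name__ == "__main__":
    pattern = compile_failregex(FAILREGEX)
    line = ('203.0.113.5 - - [26/Jul/2024:13:00:00 +0000] '
            '"POST /wp-login.php HTTP/1.1" 200 1234 "-" "Mozilla/5.0"')
    print(offending_host(line, pattern))
```

Note that the expression matches POSTs answered with status 200, which is what a failed WordPress login typically returns, so ordinary page views are not counted.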