I am running the most recent version of Virtualmin on a server hosting several websites. Earlier, all the websites on the server would sometimes go offline because all memory was in use, which got MariaDB killed; once I restarted it, everything worked again.
Recently, something similar has been happening: all my websites go offline, but when I check the control panel, all services are running properly, and the websites only come back up if I reboot the entire server.
I have checked for errors in DMESG, but there are no errors showing up at all (when I ran out of memory earlier which killed MariaDB I could always see it in DMESG).
Anyone have any suggestions and ideas for where I should start looking to find out what is causing the server and all the websites to go offline?
Do you mean that Fail2Ban might block all traffic to all websites on the server for some reason, and this resets as I reboot the server? Is that a realistic option? Wouldn’t the websites remain unreachable after rebooting the server if it is a Fail2Ban error?
Hello,
I have a 2GB swap file, 16GB memory. But once again, the server shows no sign of overload, nor do I see any error messages at least in the DMESG command.
Maybe the server is still being overloaded in some way, but I don’t understand exactly why, where, or by what. I see continuous attempts to reach the mail server, but if that were causing the trouble, I would expect to see an error message in DMESG, as I did when memory ran out.
PS: I don’t host any mail servers myself. The only reason I keep Postfix running is so that the websites I host on my servers are able to send information from their contact forms to different email addresses.
As I wrote earlier, the problem is resolved by rebooting the server; in other words, rebooting works. My question is: how can I find out what causes the websites to go down when a reboot is what fixes it?
And again, all processes are running properly, and I do not see any error messages with DMESG when this occurs (it has happened almost weekly now in the last 3-4 weeks).
I don’t know about DMESG; I just check the log files in Webmin/Virtualmin. But it is best to set up a monitor that sends you an email when the site goes down, so you can get onto it straight away.
Hello,
I have several monitors running, so I normally get the websites up and running quite quickly. But it is still annoying: if it happens while I am asleep or otherwise unavailable, it can take hours. So I would love to understand what is actually causing the problem. I have looked at auth.log, kern.log, syslog, and some other files, but I still haven’t found out exactly what is happening.
Try to figure out if it concerns only you or if the sites are down for the rest of the world too. There are several ways to do it. Checking the Apache logs is a good first move: if they keep filling up, it is a sign that there is still traffic and that you might be the only one affected. There are also the different “down or just me” services on the web that might be useful. You could also try to connect from a different IP (using a VPN or simply your phone). If you are the only one affected, this might very well be fail2ban producing a false positive. You can unban yourself using Webmin’s module, and don’t forget to check fail2ban’s logs: search for your own IP to understand what happened.
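A minimal sketch of the "do the logs keep filling up" check, assuming a Python 3 interpreter on the server. The function name and the log path in the usage note are illustrative assumptions; Virtualmin setups often keep per-domain logs elsewhere.

```python
import os
import time


def log_is_growing(path, wait_seconds=10.0):
    """Return True if the file at `path` got larger during the wait.

    A growing access log suggests requests are still reaching Apache,
    so the outage may only affect the person checking.
    """
    size_before = os.path.getsize(path)
    time.sleep(wait_seconds)
    return os.path.getsize(path) > size_before
```

For example, `log_is_growing("/var/log/apache2/access.log", 30)` would watch the Debian/Ubuntu default access log for 30 seconds (that path is an assumption, adjust it to your own log file). A quiet site can legitimately return False, so treat the result as a hint rather than proof.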
If your sites are down for everyone (and assuming that Virtualmin/Webmin is still accessible), there is most likely a service down or malfunctioning (even if it is not immediately obvious on the dashboard). Check their status on the command line: systemctl status bind9, systemctl status apache2 and systemctl status mysql. You can also check their logs and try to restart them one by one to see if it changes anything.
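If you would rather check all three at once, here is a rough sketch that wraps `systemctl is-active` in Python. The unit names are the ones mentioned above; on some systems the database unit is named differently, so treat the list as an assumption. The helper names are made up for illustration.

```python
import subprocess

# Unit names taken from the post above; adjust to match your system.
SERVICES = ["bind9", "apache2", "mysql"]


def service_state(name, run=subprocess.run):
    """Return the state word printed by `systemctl is-active NAME`."""
    result = run(["systemctl", "is-active", name],
                 capture_output=True, text=True)
    # Fall back to "unknown" if systemctl printed nothing at all.
    return result.stdout.strip() or "unknown"


def service_report(services=SERVICES, run=subprocess.run):
    """Map each service name to its reported state."""
    return {name: service_state(name, run) for name in services}
```

Calling `service_report()` returns a dictionary such as `{"apache2": "active", ...}`. Keep in mind that "active" only means systemd considers the unit running, not that the service is actually answering requests, so the logs are still worth a look.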
Thanks for your tips here. The sites are down for the entire world, not only me. I will try checking the status of the different services as described the next time it takes place!
I had a similar issue with a server going offline. I traced the web traffic at crash time (I use Linode, which automatically rebooted the server after the crash) and found about 1,000 different IPs attacking my OpenCart app within a few minutes. I’ve since implemented changes and that has stopped the problem. Check your web server log files from just before the event. Good luck.
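A rough sketch of that kind of check: count requests per client IP in the minutes before the outage. It assumes the access log uses Apache's common/combined format (client IP first, timestamp in square brackets); the function name and the window size are illustrative choices, not anything specific to Virtualmin.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Matches the client IP and the [timestamp] of a common/combined log line.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def hits_per_ip(lines, end, minutes=5):
    """Count requests per IP in the `minutes` leading up to `end`.

    `end` must be a timezone-aware datetime, because the log
    timestamps carry a UTC offset.
    """
    start = end - timedelta(minutes=minutes)
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        try:
            stamp = datetime.strptime(match.group(2), TIME_FORMAT)
        except ValueError:
            continue  # skip lines with an unparseable timestamp
        if start <= stamp <= end:
            counts[match.group(1)] += 1
    return counts
```

With the log opened as `f`, `hits_per_ip(f, crash_time)` gives a `Counter`; `len(counts)` is the number of distinct IPs in the window and `counts.most_common(10)` lists the busiest ones. Many distinct IPs hitting the same URL in a few minutes would point towards the kind of attack described above.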
Just happened again and I tried to find out exactly which service caused the trouble and it is Apache2. So by restarting Apache, the websites became available once again. I should mention that Apache was running all the time (it didn’t crash and it showed as running in the Virtualmin panel), but the websites still became unavailable to me and everyone else online.
Now I am trying to find out exactly what has happened and what caused this to happen, but so far with no luck.
The warnings that appear before the crash show up in the error log all the time, even after restarting the server, at a rate of approximately one per second. I am not sure what they are or whether they are causing or contributing to the problem, so if anyone can help, I would be grateful.
The graceful shutdown coming after the errors is me restarting the apache service.
Anywhere else I should look to find out what’s going on or what caused the error?
In auth.log I see there are lots of login attempts, but this seems to be quite constant… is this what might be causing the problem?
I’m not convinced. Too many @praguepraha.com (CloudFlare) entries; why are they not being picked up by fail2ban? You said that you have no mail server running, so why even bother with a Postfix server?
You are suggesting an Apache problem; some conflict in the configuration, perhaps.
Postfix is running because (based on my experience), if I disable it, the contact forms on the websites running on the server stop sending emails when people fill in the forms. Once I enable Postfix, the emails are sent, and since the contact forms are one of the most important parts of the business, I cannot risk those emails not arriving. If I am wrong about this and it should work even with Postfix disabled, I would be really grateful to hear about that.
Disable it on a per-domain basis; shouldn’t that allow the forms to still work? And maybe firewall the incoming mail port. But I can’t see how Postfix would affect your websites.
You say Apache is still running and not crashing. What’s the error when you try to connect to the website in the browser?
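If the browser message is hard to capture during an outage, a small probe can record roughly the same information. This is only a sketch using Python's standard library; the function name and the result strings are made up for illustration. As a rule of thumb, a refused connection usually means nothing is listening on the port, while a timeout usually means something accepted the connection but never answered.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Fetch `url` once and return a short description of the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"ok {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return f"http error {err.code}"
    except urllib.error.URLError as err:
        if isinstance(err.reason, ConnectionRefusedError):
            return "refused"
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"failed: {err.reason}"
    except (socket.timeout, TimeoutError):
        # Connected, but the response did not arrive in time.
        return "timeout"
```

Running `probe("https://your-site.example/")` (a placeholder URL) from another machine while the sites are down, and again after restarting Apache, gives two results to compare.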
Is this on a VPS or on actual hardware? If it is actual hardware, do a disk check as well.
So the program on the website is sending with an account on the server, which means the server’s account is publicly exposed by that program? Is that where all the “attacks” originate? I would be looking into the code of the contact form. Why does it even need an email exposed client-side? A contact form presumably has only four fields:
the client’s email/ phone/ reply to;
the type of contact being made;
the contact message;
(hidden) an encrypted code to assure validity of the form.
All of this should be handled server-side as appropriate, with no server user email exposed.
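A minimal sketch of the hidden validity code, assuming a Python backend. Note that it uses a signed token (HMAC) rather than an encrypted one, which is a common way to get the same effect; the function names, the one-hour lifetime and the in-memory secret are all illustrative assumptions.

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical secret; a real site would load a persistent value
# from server-side configuration instead of generating one per start.
SECRET_KEY = secrets.token_bytes(32)


def make_form_token(now=None):
    """Create the value for the hidden field: issue time plus signature."""
    issued = str(int(time.time() if now is None else now))
    signature = hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()
    return f"{issued}:{signature}"


def token_is_valid(token, max_age_seconds=3600, now=None):
    """Check that the token was signed by this server and is not too old."""
    issued, _, signature = token.partition(":")
    if not (issued.isascii() and issued.isdigit()):
        return False
    expected = hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return 0 <= current - int(issued) <= max_age_seconds
```

The server puts `make_form_token()` into the hidden field when it renders the form and calls `token_is_valid()` on submission before sending any mail. The recipient address stays in server-side configuration and never appears in the page.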