Postfix is shut off and can’t start up

| SYSTEM INFORMATION | |
|------------------------------|-------------------------------|
| OS type and version | CentOS Linux 6.10 |
| Virtualmin version | 6.09 |

Hello, everyone!

My Postfix Server has shut down and it just hangs (the moving-left-to-right red bar) when attempting to start. Is this a common issue?

I have no idea how to troubleshoot this in Virtualmin. Let me know where to start and I will provide as much info as possible.

Thank you.

Kenneth

It is not a common issue.

What happens when you try to restart the postfix service on the command line (in a real ssh session, not the popup terminal, which is not yet a real interactive terminal)?

Greetings, Joe. Thank you for the fast reply.

It did certainly seem weird.
However, it appears fixed now. It’s running anyway.

The approach that I took was to initiate a restart via Virtualmin, as detailed above. I walked away from the terminal, hoping that the unattended server would eventually restart. My check just now says that this may have been the case.

Test emails that I sent last night (both before and after the restart) arrived in the recipient inbox about three or four hours later.

I just sent another test email, which has not yet arrived.

I have also (starting ten minutes ago) attempted to open the Postfix server panel in Virtualmin (edit: I mean Webmin) and it is also hanging in the same way the restart did last night (see above).

Maybe there is a hanging process that eventually times out?

Thoughts?

Kenneth

That tells me you’re low on memory, probably. There is no reason for most Webmin pages to take more than a couple seconds to load, unless the server is overloaded or out of memory and in swap hell.

Log in to a shell, so the UI isn’t a factor in performance, and look at top to see the state of memory and CPU usage, etc. Also check the kernel log (dmesg) for OOM killer messages (it’ll say something like “out of memory” and “process reaped” or “process killed” or something, I dunno…).
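A rough sketch of that check, assuming a root shell on the box. The helper name oom_events is made up for illustration, and the patterns are only the kernel's usual OOM wording, so treat a clean result as a hint rather than proof.

```shell
# Filter kernel log text (read from stdin) for OOM-killer activity.
# The patterns cover the usual kernel wording ("invoked oom-killer",
# "Out of memory: Kill process ...", "Killed process ...").
oom_events() {
    grep -iE 'out of memory|oom-killer|killed process'
}

# Usage on the server itself:
#   free -m                      # memory and swap, in MiB
#   top -b -n 1 | head -n 20     # one batch-mode snapshot of the busiest processes
#   dmesg | oom_events           # prints nothing if the OOM killer has not fired
```

If free shows swap mostly used and dmesg has OOM lines, the slow Webmin pages are a symptom, not the problem.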

centos 6 doesn’t help

Oh lordt. I didn’t even notice.

You can’t run CentOS 6 on a world-facing server. It’s not safe. And, we no longer support CentOS 6. It has been unsupported upstream for two years. It’s crazy to have anything running that on a public IP.

To be clear, the reasonable assumption here is that your system has been compromised and is overloaded sending spam and running bots. Maybe a crypto miner or two, as well. That’ll make everything else unresponsive.

Hi Joe and Stefan,

Sorry for the slow reply. Partly this delay was due to my catching the flu; partly it was due to the real reason for the neglect this server has seen. To wit:

This client hasn’t paid me in over two years. It’s partly understandable as their company was hit very hard by the pandemic, and they are truly struggling. I have transferred the server fee to him, so I’m not directly out-of-pocket—but I’m too nice of a guy to drop my care of the code completely. Charity case. What can I say?

I have a systems analysis email which sends me an update once a week (a software analysis, nothing to do with the server). Once I restarted the Postfix server, these began coming in, but they appear to be grossly delayed. I received one this morning that was sent around August 1st, and one this afternoon that was sent around August 10th. I think this more than supports the suspicions you voiced in your last reply.

What steps do you recommend? Is this as simple as running a command-line upgrade on CentOS?

Note that the service provider is proudly vocal about not supporting spam; so I should be able to leverage that in my favour.

Any advice is appreciated. Thank you.

Not that I am aware of. That feature only appeared in 7 or 8, I believe.

But, if the system is compromised, you can’t trust it, even if you upgrade it. That’s closing the barn doors after the horses are already gone.

If the system has been rooted, the only way to be reasonably sure you’ve fixed it is to format and reinstall the OS (or a new OS in this case, but you need to plan some extra down time when going from 6 to 7 or 8, as your web apps will likely need attention…the longer they’ve been unmaintained, the more attention they’ll need). BIOS malware, which would make the entire machine suspect even with a fresh OS, is also possible, but pretty unlikely, I think.

You also need to plan to figure out how it was compromised. If the initial entry point was one of your web apps, you need to fix that, too.

I don’t have an easy answer for you, even if the system is not compromised, but given the circumstances of a system being unmaintained for two years and being on a public IP, I think it’s better than even odds it is compromised. Best case scenario in that case would be a user-level compromise, in which case you should be able to see which processes are running and figure out what they’re doing. That can be cleaned up. But, if they got root, you can’t reasonably clean it up (it’d take longer than formatting and reinstalling or starting over on a fresh server).


Joe,

Gotcha.

Of course I understood much of this (even though I’m merely a software guy!) but you articulate it in a manner that makes it clear what the next steps are.

I am going to approach the hosting company. As I mentioned earlier, they are keen to be spam-free. Given this, I think that they will be amenable to my rebuilding the server on a fresh drive with the newest OS.

This sounds like something I would really like to do. I can research this, of course, but I’d like to ask you (and this forum), especially given the local expertise in Virtualmin:

If you were in my shoes—and wanted to track down the origin of the breach—what would be your first step? Or if not the origin, then how might I see a list of active processes to help me identify the bots/scripts that are generating spam.

Thanks again, for advising on this.

Kenneth

I’d look at the process list (using ps), and the biggest CPU/memory users (using top). If something is using a lot of CPU, see what user is doing it. That could be the compromised user (if a user is compromised, rather than root).
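As a sketch of the "see what user is doing it" step: the helper below (cpu_by_user is a hypothetical name) totals the %CPU column per user from ps output, so a single noisy account stands out.

```shell
# Sum %CPU per user from "ps -eo user,pcpu" style input on stdin.
# The first line is assumed to be the ps header and is skipped.
cpu_by_user() {
    awk 'NR > 1 { cpu[$1] += $2 }
         END { for (u in cpu) printf "%.1f %s\n", cpu[u], u }' | sort -rn
}

# Usage on the server:
#   ps -eo user,pcpu | cpu_by_user | head
#   ps -eo user,pid,pcpu,pmem,args --sort=-pcpu | head -n 15   # the individual culprits
```

Keep in mind that if root is compromised, ps itself may be lying to you.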

I’d check the maillog. A lot of hackers just want to be able to send spam. Maybe they do it through the local mail server (but maybe they do it some other way, which won’t show up in the maillog).
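One way to skim the maillog, as a sketch: count envelope senders on the Postfix qmgr lines. The helper name top_senders is made up here, and /var/log/maillog is the usual CentOS location but is still an assumption about your setup.

```shell
# Count envelope senders seen by the Postfix queue manager.
# Reads maillog text on stdin, prints "count from=<address>" with the busiest first.
top_senders() {
    grep 'postfix/qmgr' | grep -o 'from=<[^>]*>' | sort | uniq -c | sort -rn
}

# Usage on the server:
#   top_senders < /var/log/maillog | head
#   postqueue -p | tail -n 1      # summary line showing how big the queue is
```

A queue with thousands of requests would also explain legitimate mail arriving days late.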

And I’d check the error_log and access_log for the compromised user. If the goal was to serve something from your webserver (like malware payloads), you might see it here.
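For the access_log, a quick sketch is to count which paths receive POST requests, since uploaded web shells tend to be hit that way. It assumes the standard Apache combined log format; post_targets is a hypothetical helper name and the log path in the usage line is a placeholder.

```shell
# List the most frequently POSTed paths from an Apache combined-format log on stdin.
# In that format field 6 is the quoted method ("POST) and field 7 is the request path.
post_targets() {
    awk '$6 == "\"POST" { print $7 }' | sort | uniq -c | sort -rn
}

# Usage on the server (adjust the path to the domain's actual log file):
#   post_targets < /home/domainname/logs/access_log | head
```

Anything being POSTed to that you don’t recognize as part of the app deserves a closer look.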

Then, I might check network traffic for things that don’t belong using tshark. Malware often self-replicates, so in addition to programs to hide itself and reinstall itself if it gets deleted/stopped, they often include a scanner to look for other old servers that can be compromised and added to the botnet.
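A minimal sketch of the tshark step: capture a sample of TCP packets and tally destination address and port pairs. The interface name eth0 and the helper name top_destinations are assumptions for illustration.

```shell
# Tally "address:port" destinations from two-column input (IP, port) on stdin.
# Lines without both fields (for example non-IPv4 packets) are skipped.
top_destinations() {
    awk 'NF == 2 { print $1 ":" $2 }' | sort | uniq -c | sort -rn
}

# Usage on the server, as root (eth0 is an assumed interface name):
#   tshark -n -i eth0 -c 1000 -f 'tcp and not port 22' \
#       -T fields -e ip.dst -e tcp.dstport | top_destinations | head
```

Lots of outbound connections to port 25 on random hosts, or scans across many addresses, would be the kind of thing that doesn’t belong.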

I’d look for hidden files and directories (files that start with .) in the compromised home directories. Something like find /home/domainname -name ".*" can do that (the starting directory goes before the -name test, and the pattern needs to be quoted so the shell doesn’t expand it first).
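The hidden-file search could be wrapped up like this, as a sketch. Note that find wants the starting directory first and the -name test after it. The function name hidden_entries is hypothetical, and /home/domainname is the placeholder from above.

```shell
# List dot-files and dot-directories below a starting directory.
# -mindepth 1 keeps the starting directory itself out of the results.
hidden_entries() {
    find "$1" -mindepth 1 -name '.*'
}

# Usage on the server:
#   hidden_entries /home/domainname
#   find /home/domainname -name '.*' -mtime -30 -ls   # only those modified in the last 30 days
```

Expect some noise from legitimate entries such as .htaccess and .bashrc.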

If you have not identified a compromised user after all of this, but you still think you’ve been compromised, it is almost certainly a root-level exploit, and they may be able to completely hide themselves from you. The only way to see a root-level exploit with certainty is an outside observer. Either you boot from read-only media (like a rescue disk), and then mount up the system filesystem(s) read-only and poke around, or you sniff traffic from outside the server itself, or both. Attackers want to do something with your system, so if you shut off all of the expected services but still have traffic, you may be able to at least prove it’s been compromised, but may not be able to determine how or to what extent (again, if they have root, you can’t trust the system).

I might, as a hail Mary, run rpm -Va >/root/package-verify.txt and look through that for evidence of changed system files. Rootkits change system files so they can hide themselves, and they usually don’t do it with packages. So, an otherwise invisibly rooted system might still show you some clues because system files like passwd, ps, top, etc. don’t match what’s in the package. If you did see that, it would confirm a root-level exploit, if a sloppy one.
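The verify output is long, so here is a sketch of one way to narrow it down: keep only lines where the file digest changed (a 5 in the third flag column) and the file is not marked as a config or doc file. The helper name changed_non_config is made up for illustration.

```shell
# Filter "rpm -Va" output on stdin down to non-config files whose digest changed.
# Field 1 is the flag string (third character is 5 when the digest differs).
# Config/doc files carry a one-letter marker (c, d, ...) as field 2, so requiring
# field 2 to be a path drops them.
changed_non_config() {
    awk '$1 ~ /^..5/ && $2 ~ /^\//'
}

# Usage on the server, as root:
#   rpm -Va > /root/package-verify.txt
#   changed_non_config < /root/package-verify.txt
```

Changed config files are normal on any working server. Changed binaries like ps or top are not.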

You can try tools like chkrootkit. But, if the attacker has root, running it won’t prove you don’t have a rootkit, because they can hide themselves from tools that detect rootkits. You have to run those kinds of tools from read-only clean boot media (like a rescue disk) in order to have any faith in their results.

There are also tools for detecting WordPress malware, and the like. I have no experience with those.

That’s high level. You need to get deeper than that, though, and it’s well past being a Virtualmin issue.

Hi, Joe.

I just wanted to drop a line to let you know that I have been working through this with the host. I’ll definitely drop a line here with the resolution, once completed.

Thanks again for the troubleshooting list above. It’s extensive, detailed, and most of all it’s been very helpful to me in this situation. I think that any novice or intermediate (or beyond) Web tech should—and likely will—favourite this post.

Talk soon.

Kenneth
