load average very high with peaks, server far too slow

lex · November 15, 2010, 12:33pm

Hi, I don’t know where to look anymore. Sometimes the load on the server shoots up; just now it went to 40 and stayed there for a while. Then it normally drops back to its usual level (2 or 3 or so).

I ‘did a netstat’ and I’ll attach it here.

It looks like there are a lot of attempts to connect to dovecot.

If I ‘disable’ a certain ‘server’ (domain) on the server, then the load goes back to normal.

At least it seems it does, it’s hard to tell really as it’s not very predictable when the load will go up or down, so you never know for sure what caused the change.

I’ve blocked a few more ip addresses via iptables, but I’m not convinced that does the trick really.

My question is, what is the best way to troubleshoot this and make sure the server responds in a normal way, always?

Thanks,

lex

Locutus · November 15, 2010, 1:26pm

What “load” exactly are we talking about here? CPU? Network traffic? Connections per second? What unit exactly is “40” and “2 to 3”?

Eric · November 15, 2010, 3:13pm

I suspect the units he’s referring to are the output of “uptime”… so 40 is pretty high in that case.

When the load spikes, try running “top”… are there some processes that just seem to sit there at the top of the list, consuming a lot of resources?

Also, review the output of “mailq” – are there a lot of emails in the queue? A lot of emails sitting there could mean someone is making heavy use of a newsletter. It could also mean a spammer figured out how to send email through someone’s account, so you’d want to make sure the mail in the queue seems legitimate.
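If you want to keep an eye on the queue size over time, something like the following Python sketch could help. It assumes Postfix's “mailq”, whose output ends with a summary line such as “-- 812 Kbytes in 803 Requests.” (or “Mail queue is empty”). The function names are made up for illustration.

```python
import re
import subprocess

# Summary line printed at the end of Postfix's mailq output.
SUMMARY = re.compile(r"^-- \d+ Kbytes in (\d+) Requests?\.", re.MULTILINE)


def queued_message_count(mailq_output):
    """Return the number of queued messages reported in mailq output."""
    if "Mail queue is empty" in mailq_output:
        return 0
    match = SUMMARY.search(mailq_output)
    if match is None:
        raise ValueError("no mailq summary line found")
    return int(match.group(1))


def current_queue_size():
    """Run mailq on this machine and return the queue size (needs Postfix installed)."""
    result = subprocess.run(["mailq"], capture_output=True, text=True, check=True)
    return queued_message_count(result.stdout)
```

Calling `current_queue_size()` from cron every few minutes and logging the result would show whether the queue grows at the same moments the load spikes.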

What also might be helpful is to run “ps auxw”, and attach that output here. That will show a list of running processes, and we might be able to figure out if any seem out of the ordinary.
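If the “ps auxw” listing is long, a rough way to see which program dominates is to add up the %CPU and %MEM columns per command. This is only a sketch: it assumes the usual “ps aux” column layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND), and the function name is hypothetical.

```python
from collections import defaultdict


def summarize_ps(ps_output):
    """Sum process count, %CPU and %MEM per command from `ps aux`-style text."""
    totals = defaultdict(lambda: {"count": 0, "cpu": 0.0, "mem": 0.0})
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split(None, 10)  # COMMAND (11th field) may contain spaces
        if len(fields) < 11:
            continue
        command = fields[10].split()[0]  # program path without its arguments
        entry = totals[command]
        entry["count"] += 1
        entry["cpu"] += float(fields[2])
        entry["mem"] += float(fields[3])
    return dict(totals)
```

Feeding it the text saved with `ps auxw > ps.txt` during a peak and sorting the result by `cpu` should make it obvious whether Apache, Dovecot or something else is responsible.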

-Eric

Locutus · November 15, 2010, 4:37pm

Oh hell, yeah, I just read up on the meaning of the “load average” output. As far as I understand it, a value of “40” kinda means that the system was overloaded by a factor of about 40. (On a single CPU, “1” would mean that the CPU was busy working on one process all the time, while “2” means that while the CPU was loaded, another process was waiting to get CPU cycles all the time, and so on.)
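To put a number like 40 into context, it helps to divide the load average by the number of CPU cores. A minimal Python sketch, assuming a Unix-like system where `os.getloadavg()` is available; the helper name `load_per_core` is made up for illustration:

```python
import os


def load_per_core(load_average, cpu_count):
    """Return the load average divided by the number of CPU cores."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages.
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"1-minute load per core: {load_per_core(one_min, cores):.2f}")
```

A result that stays well above 1.0 per core means processes are queuing up. Note that on Linux the load average also counts processes stuck waiting for disk I/O, so a high value does not necessarily mean the CPU itself is the bottleneck.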

And I agree, check the process list during “overload” times. atop is a nice alternative to top too, which has a somewhat better-arranged output, since it can be configured to only show processes that produce a notable load.

Some hints about atop: Press “i” to set a new refresh interval. “a” toggles between “show all processes” and “show only those with load”. “t” triggers refresh manually. “?” shows a help screen.

lex · November 17, 2010, 11:18am

Hi and thanks for all your answers!

It’s with ‘top’ that I see that the load is high, and mostly a lot of ‘apache2’ processes show up. More than normal, so to speak.

I did as suggested, and will attach both ps auxw and mailq during a high peak just now. (however, I did so as well in my first post in this thread with netstat, but only see the file when I edit that message, not here in the thread… Might be me, but just wondering if you people can see the files I attach here.)

lex · November 17, 2010, 11:22am

I now see the attachments here, but not the one in the thread starter (but I do see it when I click the ‘edit’ button). So I’ll attach the netstat one here too.

Lots of imap and dovecot

If it’s a spam thing, how do I find out where it all happens and how to stop it?

These ‘peaks’ normally last a few minutes and then it all goes back to normal somehow.

Eric · November 17, 2010, 2:31pm

Okay, so it looks like you have 800+ emails in your mail queue. I’d classify that as quite a bit.

The next step is to determine if they’re legitimate or not. To do that, you can log into Virtualmin, and go into Webmin -> Servers -> Postfix -> Mail Queue, and click on some of the messages.

Does the message appear to be a newsletter or mailing from one of your users? Or does it look “spammy”?

-Eric

Locutus · November 17, 2010, 2:33pm

There’s a lot of connections to your web server. You might check your Apache logs for details, which URLs have been requested.
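A quick way to see who is requesting what is to count client addresses and URLs in the access log. The sketch below assumes Apache's common/combined log format; the function name is hypothetical, and the log location differs per setup, so the path in the usage note is only a placeholder.

```python
import re
from collections import Counter

# client ident user [timestamp] "METHOD /path PROTOCOL" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*"')


def top_requests(log_lines, limit=10):
    """Return the most frequent client addresses and requested paths."""
    clients = Counter()
    paths = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        clients[match.group(1)] += 1
        paths[match.group(3)] += 1
    return clients.most_common(limit), paths.most_common(limit)
```

Usage would be along the lines of `with open("/path/to/access_log") as f: print(top_requests(f))`. A single address or a single script showing up far more often than everything else is usually the thing to look at first.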

ronald · November 17, 2010, 4:20pm

Reading the thread, I conclude that a user sends a newsletter and the recipients visit his/her website.
Whether that conclusion is right remains to be seen.

If I send out a newsletter to 4500+ recipients, it is not even noticeable…
What specs does the server have?

Locutus · November 17, 2010, 8:23pm

Indeed, a mailserver should not cause such a noticeable CPU load when sending out mails. The ps output implies that it’s Apache causing the high CPU and memory load. It seems like a lot of people are requesting web pages at the same time, which would also explain the high number of connections to port 80.

Hint: The atop application, in mode “p”, will show all resources used by one application accumulated over all processes.

Interesting is the high count of Dovecot processes doing “pop3-login” and “imap-login”. How many users does the server have that might concurrently be reading their mail?

lex · November 17, 2010, 11:36pm

mail: there are only a few sites on the server and they don’t use mail as such at all. Well, newsletters are sent out daily by 3 of the 5 sites: at 9:00 in the morning (2) and one at 15:00. But that’s all. But the site owners don’t use the domain for receiving mail. So yes, those pop3 logins and imap logins are worrying.

The server is old(ish), and I’m getting a new one. However, I don’t think it’s (just) that, as it’s not newsletter related. If I send out a newsletter from my own domain on the server to a bit over 10,000 people (all personalized), then the load goes up, but not as much as I’m showing here in these examples.

I’ll check the apache logs

I’ll check the mail too

Thanks people, your help is appreciated a lot!

Locutus · November 18, 2010, 1:04am

Actually, your netstat shows only three direct connections to IMAP, and none to POP3. Are you maybe running webmail via your Apache? That might explain the large number of Apache connections in combination with the Dovecot processes.
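For reference, counting connections per local port from a saved netstat listing can be done with a few lines of Python. This is a sketch that assumes numeric output as produced by `netstat -tn` (or `-tan`); the function name is made up.

```python
from collections import Counter


def connections_per_local_port(netstat_output):
    """Count TCP connections per local port in `netstat -tn`-style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] == "LISTEN":
            continue  # listening sockets are not connections
        port = fields[3].rsplit(":", 1)[-1]
        if port.isdigit():  # service names appear when -n was not used
            counts[int(port)] += 1
    return counts
```

Port 80 is HTTP, 110 is POP3 and 143 is IMAP, so the result shows at a glance whether clients are really talking to Dovecot directly or whether nearly everything arrives via the web server.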

If the problems with mail logins persist, you might turn on the Linux firewall and have it record a log of connections, in addition to what the other servers themselves log.

lex · November 18, 2010, 10:39am

Thanks for your post. I’ll check if I can find info on how to see if webmail is up and running and how to enable a linux firewall.

lex · November 18, 2010, 11:09am

I’ve got these servers up and running:

Apache Webserver
BIND DNS Server
MySQL Database Server
Postfix Mail Server
ProFTPD Server
Procmail Mail Filter
Read User Mail
SSH Server
SpamAssassin Mail Filter
Virtualmin Virtual Servers (GPL)
Webalizer Logfile Analysis

Didn’t see anything in postfix about some kind of webmail, so looking at this, I’d say I have no webmail up and running. But then, I might be totally wrong.

At SMTP Client Restriction postfix is ‘allowing all clients’. can that have anything to do with this?

About Linux Firewall: I’ve got iptables up and running. I’ll check about logging.

Locutus · November 18, 2010, 1:16pm

“Didn’t see anything in postfix about some kind of webmail”

Webmail isn’t something you configure (or see) in Postfix actually. It’s basically web code - based on PHP or Ruby or similar - that connects, as if it was a normal mail client, to the mail server via IMAP.

“At SMTP Client Restriction postfix is ‘allowing all clients’. can that have anything to do with this?”

Should not. That setting can be used to early-reject clients e.g. by IP range, or by DNS Blacklist or similar.

“About Linux Firewall: I’ve got iptables up and running. I’ll check about logging.”

Goodies. Check the Webmin iptables module, you can quite easily add a rule to log stuff there. Make sure to log only packets with the TCP SYN flag set, otherwise it will log ALL traffic (as in all packets), and not just those that initiate a new connection, which will quite probably flood your logfile.
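If you would rather do it from the shell than through Webmin, the rule boils down to one iptables call with the `--syn` match and the `LOG` target. The Python sketch below only builds and prints that command so it can be reviewed before running it as root; the prefix `NEWCONN: ` and the function name are my own assumptions, not something Webmin generates.

```python
import shlex


def syn_log_rule(prefix="NEWCONN: ", chain="INPUT"):
    """Build an iptables command that logs only TCP packets opening a new connection."""
    return [
        "iptables", "-I", chain,   # insert at the top so it runs before ACCEPT rules
        "-p", "tcp", "--syn",      # match only packets with the SYN flag set
        "-j", "LOG", "--log-prefix", prefix,
    ]


if __name__ == "__main__":
    # Print the command instead of executing it, so it can be checked first.
    print(shlex.join(syn_log_rule()))
```

The LOG target does not accept or drop anything, it only writes a line to the kernel log, so the existing rules keep working as before.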

lex · November 18, 2010, 3:42pm

Just a quicky:

If I didn’t install the iptables module in Webmin, but just installed iptables on the server (using ssh), can I add the module to Webmin, and will it pick up what’s on the server already or will it mess things up?

Eric · November 18, 2010, 4:09pm

You should be able to set up firewalling in Webmin without an additional module… just take a peek in Webmin -> Networking -> Linux Firewall.

As an aside, if you haven’t already, I’d definitely take a look at some of those messages in your mail queue… if those are spam of some sort, you’d want to clear them out. You can do that from Webmin -> Servers -> Postfix -> Mail Queue.

If they’re spam, that may mean that a spammer is sending those through a security vulnerability in one of your web apps. And in that case, a firewall isn’t likely to help your issue, since all a firewall would see is incoming web traffic.

The key would be to determine what web app is the culprit, and fix the security issue.

When looking at messages in your mail queue, click the “View all headers” option on the right, and look for the “Received” header at the top. It should say what userid it was received from, if it was generated from your server. That should help you track down the culprit.
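If there are too many messages to click through, the same check can be scripted. Postfix marks locally submitted mail with a header like “Received: by host (Postfix, from userid 33)”, so a sketch along these lines (function name hypothetical) pulls the user IDs out of a raw message:

```python
import re
from email import message_from_string

USERID = re.compile(r"from userid (\d+)")


def local_sender_uids(raw_message):
    """Return the numeric user IDs named in the message's Received headers."""
    message = message_from_string(raw_message)
    uids = []
    for header in message.get_all("Received", []):
        flattened = " ".join(header.split())  # undo header line folding
        match = USERID.search(flattened)
        if match:
            uids.append(int(match.group(1)))
    return uids
```

A numeric ID can be turned into a user name with `pwd.getpwuid(uid).pw_name`. An empty result means the message came in over SMTP from outside rather than from a script on the server.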

-Eric

Locutus · November 18, 2010, 5:12pm

I agree with Eric there… Make sure no spammer can abuse your server. Such a thing can result in quite some trouble if it persists.

The firewall suggestion was rather meant to log connections to syslog (which is one of iptables’ features) and find out which IP addresses connect to where, to have something to work with.
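Once such log lines exist, they can be summarized per source address and destination port. The sketch below assumes the usual format of iptables LOG entries, which contain fields like `SRC=198.51.100.7` and `DPT=143`, and it assumes the rule was given the log prefix `NEWCONN: `, which is just an example name.

```python
import re
from collections import Counter

SRC = re.compile(r"\bSRC=(\S+)")
DPT = re.compile(r"\bDPT=(\d+)")


def connections_by_source_and_port(log_lines, prefix="NEWCONN: "):
    """Count logged connection attempts per (source address, destination port)."""
    counts = Counter()
    for line in log_lines:
        if prefix not in line:
            continue  # not one of our firewall log entries
        src = SRC.search(line)
        dpt = DPT.search(line)
        if src and dpt:
            counts[(src.group(1), int(dpt.group(1)))] += 1
    return counts
```

`connections_by_source_and_port(open(logfile)).most_common(20)` then lists the busiest sources, where `logfile` is wherever your syslog writes kernel messages.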

lex · November 19, 2010, 12:36am

Hmm, let’s see if i can explain what I’ve just seen.

So, a lot of these messages are messages of my server, saying

"This is the mail system at host server2.penghost.co.uk.

I’m sorry to have to inform you that your message could not
be delivered to one or more recipients. It’s attached below"

The original (attached) message is a, wait, example:

Received: from SQANHQY (unknown [221.207.145.66]) by server2.penghost.co.uk (Postfix) with ESMTP id 6083644F2 for ooi@deining.org; Wed, 17 Nov 2010 05:29:20 +0000 (GMT)
Received: from [221.207.145.66] (port=2479 helo=039) by smtp.secureserver.net with asmtp id 78851A-0009E4-01 for ooi@deining.org; Wed, 17 Nov 2010 13:18:43 +0800
Message-ID: 1173CF2BB0AC47DA8C780D89A765B3D8@039
From: “Millie Proctor” currentsv70@stopbeingatool.com
To: ooi@deining.org

Now, ‘deining.org’ is only a parked domain really, pointing people to some group site somewhere (just like ning). Anyway, I checked the site and there’s nothing there to abuse really. (It’s not in virtualmin yet, maybe I should import it into virtua

In fact: mail isn’t used at all for that domain so I should ‘just’ switch it off somewhere really.

Others are like this:

"This is the mail system at host server2.penghost.co.uk.

I’m sorry to have to inform you that your message could not
be delivered to one or more recipients. It’s attached below.

For further assistance, please send mail to postmaster."

attached e-mail:
Received: by server2.penghost.co.uk (Postfix) id E1295429F; Wed, 17 Nov 2010 10:55:40 +0000 (GMT)
Delivered-To: gci@server2.penghost.co.uk
Received: from pc200912031808 (125-230-124-118.dynamic.hinet.net [125.230.124.118]) by server2.penghost.co.uk (Postfix) with SMTP id 3A6F74279 for lex@gran-canaria-info.com; Wed, 17 Nov 2010 10:55:39 +0000 (GMT)
Received: (qmail 9564 by uid 564); Wed, 17 Nov 2010 18:44:39 -0800
From: “Free ViagraAndCialis” nagoya3P956@wixgame.com
To: lex@gran-canaria-info.com

The ‘to’ one, that would be me.

So I guess, if I could switch off the server2.penghost.co.uk sending those “your message could not be delivered” messages, that would help already a bit, no?

Locutus · November 19, 2010, 10:05am

So those mails in your queue are basically “undeliverable” replies from the mail-daemon due to nonexistent local email addresses? The “From” lines in the bounces look like it was an attempt to send spam to them.

It’s a bit odd. Delivery attempts to unknown local addresses should be denied with a 550 error code during delivery, and not trigger a bounce-mail. You might want to check your /var/log/mail.log at the time when such a mail comes in, there might be hints there why that happens.
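To find the relevant entries quickly, you could filter the log for deliveries that ended in a bounce. A rough Python sketch, assuming the standard Postfix delivery log lines that contain `to=<address>` and `status=bounced`; the function name is made up:

```python
import re
from collections import Counter

# Postfix delivery lines look like: "... to=<user@domain>, relay=..., status=bounced (reason)"
BOUNCED = re.compile(r"to=<([^>]*)>.*\bstatus=bounced\b")


def bounced_recipients(log_lines):
    """Count bounced deliveries per recipient address."""
    counts = Counter()
    for line in log_lines:
        match = BOUNCED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Running `bounced_recipients(open("/var/log/mail.log")).most_common(20)` shows which addresses produce the bounces. If they are all nonexistent local users, the mail is being accepted first and rejected later, which is exactly what fills the queue with undeliverable notifications.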