Postfix - Slow mail queue processing

chris1262 · June 15, 2010, 11:09am

Hello,

I have finally managed to set up my postfix mail server using Virtualmin on CentOS 5.5 Running on a Xen Virtual machine.

Postfix does appear to process mail correctly but messages take a long time (3-6 minutes) to process through the queue. If I try to flush the mail queue I receive the error message;

postqueue: fatal: Cannot flush mail queue - mail system is down

If I run the command postfix reload then the mail queue appears to clear and mail is delivered to the relevant inbox.

Below is the output of the postfix log when sending a test email and trying to flush the queue manually. I am new to postfix but I think these are the relevant sections of the maillog.

Jun 15 11:49:23 server postfix/smtpd[6645]: connect from mail-ww0-f47.google.com[74.125.82.47]
Jun 15 11:49:23 server postfix/smtpd[6645]: 1C07F1E3B4: client=mail-ww0-f47.google.com[74.125.82.47]
Jun 15 11:49:23 server postfix/cleanup[6649]: 1C07F1E3B4: message-id=
Jun 15 10:49:37 server postfix/postqueue[6707]: fatal: Cannot flush mail queue - mail system is down
Jun 15 11:49:54 server postfix/smtpd[6645]: disconnect from mail-ww0-f47.google.com[74.125.82.47]
Jun 15 10:52:45 server postfix/qmgr[17625]: 1C07F1E3B4: from=, size=4530, nrcpt=1 (queue active)
Jun 15 11:52:45 server postfix/local[9231]: 1C07F1E3B4: to=, orig_to=, relay=local, delay=202, delays=202/0.01/0/0.13, dsn=2.0.0, status=sent (delivered to command: /usr/bin/procmail-wrapper -o -a $DOMAIN -d $LOGNAME)
Jun 15 11:52:45 server postfix/cleanup[9232]: 4E64E1E3DB: message-id=
Jun 15 11:52:45 server postfix/local[9231]: 1C07F1E3B4: to=, orig_to=, relay=local, delay=202, delays=202/0.01/0/0.14, dsn=2.0.0, status=sent (forwarded as 4E64E1E3DB)
Jun 15 10:52:45 server postfix/qmgr[17625]: 1C07F1E3B4: removed
Jun 15 11:53:14 server postfix/anvil[6647]: statistics: max connection rate 1/60s for (smtp:74.125.82.47) at Jun 15 11:49:23
Jun 15 11:53:14 server postfix/anvil[6647]: statistics: max connection count 1 for (smtp:74.125.82.47) at Jun 15 11:49:23
Jun 15 11:53:14 server postfix/anvil[6647]: statistics: max cache size 1 at Jun 15 11:49:23

I am not running spam or virus filtering, could the delay be postfix trying to run the emails through ClamAV / Spamassasin and the requests timing out because the respective servers aren’t running?

Or is it more likely a DNS problem? however mail is reaching the server almost instantly, its just held up in the queue for a long time.

Is it normal for the time entries in the maillog to not appear in chronological order? I assume this is because the time is taken from different servers, i.e. remote and my server?

Thanks in advance for any insight into the above, please let me know if you need me to post any further logs or configuration files or need further information.

Kind regards,

Chris.

Eric · June 15, 2010, 2:09pm

Do you have the Spam and Virus Scanning features disables in Virtualmin?

That is, although the daemons aren’t running, is Virtualmin setup to not attempt to use them?

You can verify that in System Settings -> Features and Plugins.

Also, I’d be curious if any errors show up in the mail logs if you were to log in over SSH, and restart Postfix with “/etc/init.d/postfix restart”. Sometimes errors or warnings show up when it starts up rather than while it’s receiving email.

-Eric

chris1262 · June 15, 2010, 3:42pm

Thanks for your response Eric.

Spam and Virus filtering is disabled in Features & Plugins.
I did try (unsuccessfully) to offload virus scanning to another server by following these instructions;
http://www.virtualmin.com/book/export/html/31
I couldnt get this to work so disabled virus and spam scanning altogether (in System Settings -> Features and Plugins.)

Here are the log entries produced when restarting postfix with “/etc/init.d/postfix restart”

Jun 15 16:34:39 server postfix/postfix-script: stopping the Postfix mail system
Jun 15 16:34:39 server postfix/master[17623]: terminating on signal 15
Jun 15 16:34:39 server postfix/postfix-script: starting the Postfix mail system
Jun 15 16:34:39 server postfix/master[14533]: daemon started -- version 2.3.3, configuration /etc/postfix

Looks normal to my untrained eye.

Would posting my postfix main config file help?

Thanks,

Chris.

Eric · June 15, 2010, 3:45pm

Yeah, I don’t see anything wrong there.

How much RAM does your system have (you can see this by running “free -m”)? Also, what do you get when typing “uptime”?

-Eric

chris1262 · June 15, 2010, 4:01pm

It only has 512MB RAM hence I had issues running ClamAV and Spam assassin daemons

 free -m
             total       used       free     shared    buffers     cached
Mem:           501        427         74          0         15        115
-/+ buffers/cache:        296        205
Swap:            0          0          0

uptime
 16:54:49 up 49 days, 12:41,  1 user,  load average: 0.04, 0.03, 0.00

Load averages never run high so I dont think its due to an overloaded server.
After a restart the mail queue is still slow.
I have lived with this problem for some time now as you may have guessed by the uptime

Thanks for your continued efforts.

chris1262 · June 15, 2010, 4:03pm

I noticed the following in my maillog

Jun 15 11:32:58 server postfix/anvil[1245]: statistics: max connection rate 1/60s for (smtp:74.125.82.47) at Jun 15 11:29:08
Jun 15 11:32:58 server postfix/anvil[1245]: statistics: max connection count 1 for (smtp:74.125.82.47) at Jun 15 11:29:08
Jun 15 11:32:58 server postfix/anvil[1245]: statistics: max cache size 1 at Jun 15 11:29:08

It looks to me as some form of throttling, just thought it might be relevant?

chris1262 · July 16, 2010, 11:35am

I’m still having issues with slow mail processing.
I’m trying to intemperate the maillog to see where the delays are happening. I keep coming across the following in the logs:

delays=216/0.01/2.2/1

Can someone explain what each number means here, I assume its delays involved in each stage of processing mail but I dont understand what stage they relate to.

Thanks,

Chris.

chris1262 · July 16, 2010, 11:48am

Found the answer;

The format of the “delays=a/b/c/d” logging is as follows:

* a = time from message arrival to last active queue entry
* b = time from last active queue entry to connection setup
* c = time in connection setup, including DNS, EHLO and STARTTLS
* d = time in message transmission

Now to figure out what “last active queue entry” means…

Eric · July 16, 2010, 1:49pm

Howdy,

What does “postconf -n” show on your system?

Also, how many messages are in your mail queue at any given time? You can see how many are in there now by running “mailq | tail -1”.

-Eric

chris1262 · July 16, 2010, 2:23pm

Hi Eric,

Thanks for your ongoing support.

The problem was in the master.cf file I have managed to solve it by copying the master.cf from my backup server and configuring things from scratch.
Im not 100% sure where the error was but the problem is gone and now mail processing is lightning fast

Sorry I cant provide more details that may help someone in the same situation.

Thanks again,

Chris.