Postfix - 30s delay in sending emails

Dibs · June 9, 2020, 5:22pm

On my new host, if I send an email using Roundcube, the “Sending message” pop-up stays for 30s and then says “Message sent successfully”. This is consistent.

Even using telnet to port 25 - it takes 30s or so to present the SMTP banner.
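To put a number on the banner delay instead of watching a telnet session, something like this rough Python sketch could be used. The function name time_to_banner and its defaults are made up for illustration, and the figure it returns covers both the TCP connect and the wait for the 220 line.

```python
import socket
import time


def time_to_banner(host, port=25, timeout=60.0):
    """Return (seconds_elapsed, first_greeting_line) for an SMTP server."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        data = b""
        # Keep reading until at least one complete line has arrived.
        while not data.endswith(b"\n"):
            chunk = sock.recv(1024)
            if not chunk:
                break
            data += chunk
    elapsed = time.monotonic() - start
    text = data.decode("ascii", errors="replace")
    first_line = text.splitlines()[0] if text else ""
    return elapsed, first_line
```

Calling time_to_banner("mail.example.com") (a placeholder hostname) on the old and new hosts should show the difference directly.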

Operating system- Ubuntu Linux 18.04.4
Perl version - 5.026001
BIND version - 9.11 - disabled and not running
Postfix version - 3.3.0
Apache version - 2.4.29
PHP versions - 7.2.24
MySQL version - 5.7.30-0ubuntu0.18.04.1
Roundcube - 1.4.5

Going thru the Postfix configuration between this host and the old one (which sends instantly and presents the SMTP banner almost instantly) - nothing jumps out that’s different.

Both hosts use spamc instead of the standalone spamassassin.

Any thoughts or pointers?

Thanks in advance.

Dibs

Joe · June 9, 2020, 6:23pm

Sounds like one of your DNS servers is timing out. You didn’t restart postfix after stopping local BIND and removing it from resolv.conf.

Dibs · June 9, 2020, 6:34pm

@joe - thanks for the reply.

I’ve reloaded Postfix & restarted it a few times since altering resolv.conf (removing 127.0.0.1 and adding external ones in - same as my other working host).

Even telnet’ing to the FQDN or the IP on port 25 - still takes about 30s for the 220 banner to appear.

I do wonder if the 30s for the banner to appear and the 30s it takes for Roundcube to send are related.

Any thoughts\suggestions?

Thanks

Dibs

Joe · June 9, 2020, 6:36pm

They’re the same problem.

One of your DNS servers is slow to respond to reverse (PTR) record requests (or you don’t have correct PTR records for your client and server).
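A quick way to test the slow-PTR theory is to time a reverse lookup from the server itself. This is only a sketch: time_reverse_lookup is a hypothetical helper, and it goes through the system resolver of whoever runs it, which is not necessarily the resolv.conf copy that Postfix sees.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (seconds_elapsed, hostname_or_None) for a PTR lookup of ip."""
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        hostname = None
    return time.monotonic() - start, hostname
```

Run it against the connecting client's IP; a result close to 30 seconds, or None after a long wait, would point at the resolver.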

Dibs · June 9, 2020, 6:50pm

Server & client (broadband connection at home) both have the correct PTR records and revolve both ways. Checked it with nslookup.

DNS servers in resolv.conf - they are the same as the working host (which has no issues and telnet’ing to the FQDN\IP on port 25, the banner 220 response is instant). Don’t really know what else to look at.

Thanks

Dibs

Joe · June 9, 2020, 7:29pm

I dunno.

Sounds like a DNS timeout, to me. I can’t think of any other reasons for a 30 second timeout.

Dibs · June 9, 2020, 7:43pm

@joe - thanks for the reply.

I’ve gone looking round in a few places - /etc/resolv.conf & in /var/spool/postfix/etc/resolv.conf - and they just had the 127.0.0.1 entry, not the multiple nameserver entries that the working server has and the ones I’d entered in Webmin >> Network Configuration >> Hostname & DNS Client.

So altered the pair to be what I think they should be (the same as the working server), saved them and restarted Postfix.

Now I constantly get a

postfix/postfix-script[16785]: warning: /var/spool/postfix/etc/resolv.conf and /etc/resolv.conf differ

message in /var/log/mail.log even tho both files are identical. Any suggestions?
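One way to double-check "identical" is to compare the two files byte for byte and also see where each path really points, since either one could be a symlink. A minimal sketch, with a made-up function name and the two paths from the warning as defaults:

```python
import filecmp
import os


def compare_resolv_conf(system_path="/etc/resolv.conf",
                        chroot_path="/var/spool/postfix/etc/resolv.conf"):
    """Report where each path really points and whether the contents match."""
    return {
        "system_target": os.path.realpath(system_path),
        "chroot_target": os.path.realpath(chroot_path),
        # shallow=False forces a byte-by-byte comparison.
        "identical": filecmp.cmp(system_path, chroot_path, shallow=False),
    }
```

If "identical" comes back False, a trailing newline or comment line is a likely culprit; if "system_target" is not /etc/resolv.conf, the edits may have gone to a different file than expected.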

Thanks

Dibs

Joe · June 9, 2020, 7:59pm

You’re running postfix chrooted. I don’t remember how to make it update its chroot, as I don’t run it in a chroot (and I seem to recall the author not recommending doing so), and we don’t normally set it up that way unless it’s already chrooted when the install runs.

Dibs · June 9, 2020, 8:04pm

@joe - thanks for the reply.

In all fairness - I just ran the installer for Virtualmin as in the docs. Will have to Google about the chroot. Thanks for the suggestion on where to look.

Thanks

Dibs

Joe · June 9, 2020, 8:12pm

It’s possible that the default package option is to chroot it, but I don’t think that’s the case. I feel like it requires being told to do so. But, if postfix was already pre-installed on your VM or something it could have been chrooted. I think Virtualmin/Webmin can work with it in a chroot, but it’s unnecessary complexity for questionable benefits (chroot theoretically improves security, but it potentially opens up new attack vectors, so it’s usually a wash…and the added complexity is a cost, too).

Dibs · June 9, 2020, 8:26pm

@joe - thanks for the reply.

At the moment I’m just trying to get /etc/resolv.conf to stick. By that I mean, add the entries for the external name servers, removing the 127.0.0.53 entry and have it persist over a reboot.


The VPS is a Linode one and I’ve disabled their Network Helper which modifies that file.

The changes aren’t persisting no matter what I do (at the moment).

Thanks

Dibs

Dibs · June 9, 2020, 8:57pm

Just realised on Ubuntu 18.04 /etc/resolv.conf isn’t “used” or updated. Netplan seems to be used.

So a step back in trying to figure out why I’m getting a 30s delay in emails sending and the 220 banner response when telnet’ing into port 25 on the server.

Dibs

Dibs · June 9, 2020, 9:13pm

It’s chroot’d by default. A quick look in /etc/postfix/master.cf (on new problematic server and the old fine one) shows smtp with a “y” under the Chroot column.
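For a fuller list than eyeballing the file, the chroot column can be pulled out with a few lines of Python. This is a sketch only: chrooted_services is a made-up name, and a "-" in the column means "use the built-in default", which this does not try to resolve.

```python
def chrooted_services(master_cf_text):
    """Return (service, type) pairs whose chroot column is 'y'."""
    found = []
    for line in master_cf_text.splitlines():
        # Skip blanks, comments, and continuation lines (leading whitespace).
        if not line.strip() or line.startswith("#") or line[0].isspace():
            continue
        fields = line.split()
        # Columns: service type private unpriv chroot wakeup maxproc command
        if len(fields) >= 8 and fields[4] == "y":
            found.append((fields[0], fields[1]))
    return found
```

Feed it the contents of /etc/postfix/master.cf, for example chrooted_services(open("/etc/postfix/master.cf").read()).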

Not any wiser as to the 30s delay but still learning. LOL

Dibs

p.s. During the install (via the script in the docs) I wasn’t asked if I wanted it chroot’d, or at least I don’t recall being asked.

Joe · June 9, 2020, 9:24pm

We’re doing test installs on 20.04 this week, so I’ll try to remember to have a look at whether it defaults to chroot. If it is, I probably want to unchroot it. It’s a source of confusion for a lot of users and for no real benefit. It is definitely not chrooted by default on CentOS (the default I’m talking about would be a package decision, nothing to do with Virtualmin, as we don’t provide the postfix package).

Unless gethostbyname is netplan-aware I’m pretty sure resolv.conf is still how software figures out how to resolve domain names. Netplan is a network configuration tool, it is not replacing the whole Linux network stack.

Nonetheless, your problem remains that DNS isn’t configured right.

Dibs · June 9, 2020, 9:43pm

@joe - thanks for the reply.

From Googling it appears Ubuntu 18.04 has its resolv.conf sym-linked to a stub file that points to the localhost for name resolution.

ls -la /etc/resolv.conf returns

lrwxrwxrwx 1 root root 39 May 27 21:37 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

and

cat /run/systemd/resolve/stub-resolv.conf returns

… removed comments…
nameserver 127.0.0.53
options edns0
search members.linode.com

so if postfix copies the /etc/resolv.conf it should get

nameserver 151.236.220.5
nameserver 178.79.182.5
nameserver 176.58.116.5
search members.linode.com

but looking at /var/spool/postfix/etc/resolv.conf

that has

nameserver 127.0.0.53
options edns0
search members.linode.com

which I’m not sure is correct, but could easily be wrong.
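To see at a glance which resolvers a given copy of resolv.conf points at, and whether they are all loopback addresses such as 127.0.0.53, a small sketch like the following may help. The function names are hypothetical, and it assumes plain IPv4/IPv6 addresses on the nameserver lines. A loopback-only list is not wrong in itself; it just means everything depends on the local stub resolver answering.

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def only_loopback(resolv_conf_text):
    """True if at least one nameserver is listed and all are loopback."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and all(
        ipaddress.ip_address(s).is_loopback for s in servers
    )
```

Running both functions on /etc/resolv.conf and on /var/spool/postfix/etc/resolv.conf shows whether the chroot copy and the system file agree on the resolvers.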

Any thoughts on what to check - to work out if\how DNS isn’t configured correctly?

Thanks

Dibs

Dibs · June 9, 2020, 9:47pm

Not sure if I’m making sense in that last post. LOL

On the learning curve with Ubuntu 18.04 - and it’s a little steep\confusing at the moment.

Dibs · June 9, 2020, 10:29pm

Solved (I think).

Thankfully I have an existing (working) server to compare with.

Going thru /etc/postfix/main.cf and /etc/postfix/master.cf there wasn’t much different between them.

main.cf (new server) had the additional entries for milter (DKIM is enabled on that server), whilst the existing one didn’t.

The one key difference between them in main.cf is

New Server - inet_protocols = all
Old server - inet_protocols = ipv4

So I changed the new server to inet_protocols = ipv4.
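For comparing a single setting across two servers, a small reader for main.cf-style text is enough. This is a sketch under the assumption that the file uses the usual "name = value" lines with indented continuation lines; read_main_cf_param is a made-up helper, and `postconf -h inet_protocols` on the server itself is the more reliable check.

```python
def read_main_cf_param(main_cf_text, name):
    """Return the last value assigned to name in main.cf-style text, or None."""
    value = None
    current = None  # parameter whose value may be continued on the next line
    for raw in main_cf_text.splitlines():
        # Blank lines and comments are ignored.
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue
        if raw[0].isspace():
            # Continuation line: append to the parameter being read.
            if current == name and value is not None:
                value = f"{value} {raw.strip()}"
            continue
        key, sep, rest = raw.partition("=")
        current = key.strip()
        if sep and current == name:
            # A later assignment overrides an earlier one.
            value = rest.strip()
    return value
```

For example, read_main_cf_param(open("/etc/postfix/main.cf").read(), "inet_protocols") on each host.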

Reloaded the config and stopped and started postfix for good measure - nothing amiss in the mail.log, so sent an email from new server to old server - it spat an error out about authentication.

Jun 9 23:11:50 host2 postfix/smtpd[12610]: warning: SASL authentication failure: cannot connect to saslauthd server: No such file or directory
Jun 9 23:11:50 host2 postfix/smtpd[12610]: warning: localhost[127.0.0.1]: SASL LOGIN authentication failed: generic failure

So set the chroot option back from n to y in master.cf (new server) - I had changed it in an attempt to get it out of the chroot.

Reloaded config and stopped & started postfix for good measure - nothing amiss during startup in mail.log.

So opened up a telnet session on port 25 (new server) and the 220 banner came back instantly. Held back the urge to celebrate and opened up Roundcube (new server) and sent an email to an account on the old server - and YES it went instantly.

I suspect Postfix was trying a DNS or other network related query on IPv6 but IPv6 is disabled.
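A rough check of whether IPv6 is usable on the box at all could look like this. It is only a sketch with a made-up function name, and it does not show what Postfix itself was attempting; it just tests whether an IPv6 socket can be created and bound to the loopback address.

```python
import socket


def ipv6_loopback_usable():
    """True if an IPv6 TCP socket can be created and bound to ::1."""
    if not socket.has_ipv6:
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            sock.bind(("::1", 0))
        return True
    except OSError:
        return False
```

A False result on a host where inet_protocols is set to all would be consistent with the suspicion above, though it would not prove it.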

So on with setting DMARC tomorrow and then moving domains\virtual servers from old server. Probably more problems waiting there. LOL