Dovecot Failed state

That damned initscript keeps bugging me. And, I think I might know what’s going on.

Move /etc/init.d/dovecot out of /etc/init.d and also make sure Webmin is defaulting to using systemd as the init system, rather than initscripts (in Bootup and Shutdown module, look at the top of the page under the title, it’ll say “Boot system: systemd” or something similar).

I suspect there is some edge case where the existence of two services for dovecot is causing a fight. They can’t both be running…they’d be using all the same resources (ports, sockets), and if something were trying to use the wrong one to start/stop/restart dovecot, it would cause exactly the errors you saw, I’m pretty sure.
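A quick way to see whether both definitions are present is to look for each file. This is only a sketch: `find_service_definitions` is a made-up helper name, and the default directories are assumptions based on the paths mentioned in this thread (other distros may keep units elsewhere).

```shell
#!/bin/sh
# List every place a service named "$1" is defined, so that an initscript
# and a systemd unit for the same daemon are easy to spot.
# $2 = initscript directory, $3 = systemd unit directory.
# Both defaults are assumptions taken from the paths quoted in this thread.
find_service_definitions() {
    name=$1
    init_dir=${2:-/etc/init.d}
    unit_dir=${3:-/lib/systemd/system}

    if [ -e "$init_dir/$name" ]; then
        echo "initscript: $init_dir/$name"
    fi
    if [ -e "$unit_dir/$name.service" ]; then
        echo "systemd unit: $unit_dir/$name.service"
    fi
    return 0
}
```

Running `find_service_definitions dovecot` on the server would print one line per definition found; two lines means both an initscript and a unit exist for the same service.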

Hi Joe,
I have renamed /etc/init.d/dovecot to /backup-dovecot

A quick check of Webmin > System > Bootup and Shutdown shows “Boot system: Systemd”.

I also made a change in my desktop PC email client. This may have nothing to do with anything, but it is something I changed around the time we started having problems with Dovecot…so I changed it back.

In my desktop email client, for the domain on the virtual server where this problem seems to originate, I recently changed my incoming mail server to mail.mydomain.com (this was about a week ago).

Prior to this, I had always set my incoming mail server to server1.mydomain.com.

I have changed it back to the way it was before and asked one of my clients to do the same with theirs.

I don’t know if that DNS entry has anything to do with the issue, but I don’t use any DNS records that resolve mail.domain.com to server1.mydomain.com for any of my clients’ virtual servers. So I figure Dovecot might also be grumbling about DNS resolution for mail.domain.com when that record doesn’t exist? (I figure I should simplify anything that might not be quite right in DNS.)

I will post back either when the system fails again, or in a few days if it doesn’t.

Two things I am dreading, however:

  1. what happens when a new Webmin update comes out…that almost always seems to cause my system to go haywire
  2. the next time an SSL certificate for a domain on my system updates

If I am going to have further problems, then the above will be the test. Let’s wait and see.

Thanks for your help so far…I wish we had more concrete evidence of the solution.

Dovecot is not grumbling about name service. Only Postfix would perform lookups like that. That’d be a different issue. Failing DNS lookups can’t possibly cause Dovecot to shutdown (or anything else, really…Dovecot doesn’t really interact with the name server). A name service issue could only impact deliverability, it could not cause stability issues under any circumstances. Forget about name service errors in this conversation; they are not relevant to this problem.

Webmin updates would not have any effect. Virtualmin updates could, and recently did. Webmin and Virtualmin are provided in separate packages (Virtualmin is a Webmin module, but it is maintained in a bunch of separate packages).

We’ve already discussed the cert issues that were present in the Pro version 6.09-1 and 6.09-2 packages on several threads (though I think those issues only impacted Postfix…this is probably not directly related to that, but I suspect the panic that ensued when things broke led to Dovecot getting some unneeded restarts and service changes; perhaps you enabled the Dovecot initscript in the midst of all this). Hopefully, that kind of thing won’t happen again, though maybe bugs will happen again.

That is, again, the bug that was present in 6.09-1 and 6.09-2 and fixed in 6.09-3. It should not recur, though it’s possible there will be other bugs that show up (the issues were related to a major overhaul of cert handling for Postfix). Unless you plan to install 6.09-1 or 6.09-2 again, you won’t see that specific problem again.

Just don’t poke the bear. I don’t think problems will recur. Everything looked fine when I looked at your server (but that initscript could cause trouble if it was trying to start in addition to the systemd unit).

Will see how things go. Thanks for your help, Joe. :+1:

Agh…it’s done it again.
It has been fine for days and suddenly, bang, it’s down again…

Failed to start Dovecot :

Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login

● dovecot.service - Dovecot IMAP/POP3 email server
Loaded: loaded (/lib/systemd/system/dovecot.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2020-05-20 11:44:29 AEST; 2h 23min ago
Docs: man:dovecot(1)
http://wiki2.dovecot.org/
Main PID: 762 (code=exited, status=0/SUCCESS)

Do the following mean anything specific? (I have substituted tesla for my domain.)

May 20 11:44:29 server1 dovecot: imap-login: Error: read(anvil) failed: EOF

May 20 11:44:29 server1.tesla.com.au dovecot[769]: imap: Warning: Killed with signal 15 (by pid=1 uid=0 code=kill)

EDIT…
I am finding more of these anvil errors…

dovecot: anvil: Error: connect limit: disconnection for unknown pid 7074
dovecot: anvil: Error: connect limit: disconnection for unknown pid 9749
dovecot: anvil: Error: connect limit: disconnection for unknown pid 4395
dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login
dovecot: master: Error: unlink(/var/run/dovecot/master.pid) failed: No such file or directory (in main.c:558)

What does the “connect limit” in the anvil errors above mean?

Joe, I have an update this morning…this may be of help to you guys in fixing this issue.

When I woke up, the Dovecot status monitor and server monitor in Virtualmin were again saying Dovecot is not running.
I went into the command shell and entered systemctl status dovecot (and also dovecot.service).

Dovecot IMAP/POP3 email server
Loaded: loaded (/lib/systemd/system/dovecot.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2020-05-21 01:19:34 AEST; 5h 35min ago
Docs: man:dovecot(1)
http://wiki2.dovecot.org/
Process: 26345 ExecStop=/usr/bin/doveadm stop (code=exited, status=75)
Process: 579 ExecStart=/usr/sbin/dovecot (code=exited, status=0/SUCCESS)
Main PID: 719 (code=exited, status=0/SUCCESS)

All 3 of the usual methods of system monitoring say that dovecot and dovecot service have entered a failed state. However, email delivery is still being completed. I logged into 3 client accounts in Usermin and the email to these accounts via Dovecot is still being delivered.

I have checked whether email client apps such as Outlook are able to connect to the server. At least the one for my own email account on one of the virtual servers is receiving and sending emails successfully…I can’t be sure about my clients’ own ones on their computers and mobile phones.

So, my questions:

  1. Is Dovecot running or not?

  2. If it is, how can the command shell and Virtualmin both be saying Dovecot is not running when it clearly is still delivering mail?

Are you sure that this problem isn’t something to do with resources or limits that are set in Dovecot when Virtualmin is installed using the installer? Is there a way I can increase the limits that pertain to the login/login error I am seeing all the time?

I have a further update, Joe.

Since the latest “dovecot has entered a failed state” error began, all email accounts for at least one virtual server are now receiving duplicate emails every time one is sent.

I have checked in Usermin as well…it’s happening for both Usermin and desktop PC email clients, even my own account for that virtual server.

I note that one copy has the date, the other just the time in Usermin.

This is happening across virtual servers, and it’s from both internal email accounts I host on the server and external Hotmail senders.

Please note, last night I set up a secondary mail server…which I am not even sure is actually working (because I don’t host DNS…that is done at the registrar and I have not added any secondary MX records). Is it possible that if BIND is running on either Virtualmin system these duplicate emails can still resolve and be sent from the secondary too? Could that be the problem? How do I check this?

Dovecot does not deliver mail (that’s SMTP, though Procmail could also duplicate mail, if you configured it to do so for some reason). It can’t be involved in this problem. It is not related to Dovecot.

Sending mail does not involve Dovecot. Sending mail always goes through Postfix, whether it is sent from a local client (e.g. a web app that sends mail via PHP mail API or something calling the sendmail command or other CLI mail send tool) or a remote client (e.g. Thunderbird or Outlook or your mobile mail client). Dovecot never has anything to do with sending mail. Dovecot is only for retrieving already delivered mail using POP3/IMAP protocols. It also has a delivery agent, but we don’t use it, we use Procmail for historical reasons, so in a Virtualmin system, Dovecot also has nothing to do with mail delivery. It only retrieves mail on behalf of mail clients via POP3/IMAP.

Usermin can use POP3/IMAP (which involves Dovecot) or its own local mail reading functions (which does not involve Dovecot).

I’m just trying to get some clarity here. There are so many topics being discussed here in this thread, I can’t figure out what we’re trying to solve. Not all email problems are related to Dovecot (in fact, most are not).

This might be a bug in Usermin, though I don’t think I’ve seen it.

What does DNS have to do with this? I mean, you’d need MX and A records pointing to the new server to make it functional but it doesn’t matter if DNS runs on the server(s) themselves.

Let’s stick to one problem on this thread. Double emails is not related to Dovecot. Secondary server is not related to Dovecot. DNS, still not related to Dovecot. :wink:

So, the problem we’re still tackling is why Dovecot is stopping, so let’s focus on that.

The only thing I see above that indicates why Dovecot is shutting down is that root told it to shutdown, presumably when you restarted it. So, we’re still no further along in seeing why it’s entering a failed state…but, if you’re still able to retrieve mail with mail clients, then Dovecot itself is evidently still running (Thunderbird, whatever, doesn’t matter, though Usermin could be using direct access and may not be a useful test as it may not be using IMAP to retrieve mail; I don’t remember what the current default is).

I’m gonna try logging into your system again to see if I can find some clues in the log. So far, I’m not seeing any indication that Dovecot thinks something is wrong…except when it tries to restart and finds itself already running (this seems to be maybe systemd thinking it’s not running and so it doesn’t shut it down properly before starting it again…I’m not sure).
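One way to test that theory is to compare what systemd reports with what is actually in the process table. A minimal sketch, assuming the Dovecot master process is named `dovecot`; `diagnose_dovecot` is a hypothetical helper, not part of Dovecot or Virtualmin.

```shell
#!/bin/sh
# Compare systemd's view of the unit with the process table.
# $1 = unit state, e.g. the output of: systemctl is-active dovecot
# $2 = PIDs of running master processes (may be empty),
#      e.g. the output of: pgrep -x dovecot | xargs
diagnose_dovecot() {
    unit_state=$1
    pids=$2

    if [ "$unit_state" != "active" ] && [ -n "$pids" ]; then
        echo "mismatch: systemd says '$unit_state' but dovecot is running (pid $pids)"
    elif [ "$unit_state" = "active" ] && [ -z "$pids" ]; then
        echo "mismatch: systemd says active but no dovecot process found"
    else
        echo "consistent: systemd says '$unit_state'"
    fi
}
```

On the server this could be called as `diagnose_dovecot "$(systemctl is-active dovecot)" "$(pgrep -x dovecot | xargs)"`. A "mismatch" line with a failed unit and a live PID would fit the “Dovecot is already running?” errors, since a new start attempt would then collide with the old process.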

Oddly, maybe the double emails is a dovecot problem. https://dovecot.org/list/dovecot/2016-July/104881.html

The anvil PID errors are a Dovecot bug for sure, but the Dovecot authors say they are harmless.

Joe, I think that, as you said a few days ago, perhaps the Dovecot errors are not consistent with Dovecot being unable to deliver emails.
I have left the Dovecot server in a failed state now for 2 days and my clients’ emails are being delivered anyway.

I was able to stop the double emails…the way I did it makes zero sense to me; however, I will address that in the appropriate thread, as it’s not relevant here.

BTW, i have a fresh debian 10 Virtualmin GPL install (run from the virtualmin auto installer)…have a guess what its current Dovecot state is?

May 15 12:15:13 server2 dovecot: auth: Fatal: master: service(auth): child 9760 killed with signal 9
May 15 12:15:25 server2 dovecot: anvil: Fatal: master: service(anvil): child 3645 killed with signal 9
May 15 12:15:38 server2 dovecot: stats: Fatal: master: service(stats): child 3658 killed with signal 9
May 15 12:15:50 server2 dovecot: anvil: Fatal: master: service(anvil): child 11341 killed with signal 9
May 15 12:16:05 server2 dovecot: anvil: Fatal: master: service(anvil): child 11538 killed with signal 9
May 15 12:16:23 server2 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login

Hi guys, this is almost certainly not related; however, since this dovecot thread is active, just to quickly let you know that I have a reproducible issue with dovecot failing when a Virtualmin virtual server with an SSL certificate is deleted from the system. (I’m using Let’s Encrypt certificates via normal Virtualmin admin.) Here’s the note I made to myself describing the issue and the cause of it:

–8<–

When you delete a virtual host, the SSL certificate entry for it is erroneously left in /etc/dovecot/dovecot.conf.

When dovecot next restarts, it doesn’t find the certificate and thus it won’t restart and goes into a failed state.

To fix, edit dovecot.conf and remove the rogue entry then restart manually.

Sample error (as seen in journalctl -xe):

Apr 21 11:13:13 xyz.com dovecot[47661]: doveconf: Fatal: Error in configuration file /etc/dovecot/dovecot.conf line 487: ssl_cert: Can’t open file /home/XYZ/ssl.cert: No such file or directory

– 8< –
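To find such stale entries before the next restart, something like the following could scan the config for certificate and key paths that no longer exist. It is a sketch only: `missing_ssl_files` is a hypothetical helper, and it assumes the entries use Dovecot's usual `ssl_cert = </path/to/file` form, one per line.

```shell
#!/bin/sh
# Print every ssl_cert / ssl_key path in a Dovecot config that is missing on disk.
# $1 = config file (defaults to the path named in the note above).
missing_ssl_files() {
    conf=${1:-/etc/dovecot/dovecot.conf}
    awk '/^[ \t]*ssl_(cert|key)[ \t]*=[ \t]*</ {
        path = $0
        sub(/^[^<]*<[ \t]*/, "", path)   # drop everything up to and including "<"
        sub(/[ \t]+$/, "", path)         # drop trailing whitespace
        print NR ":" path                # emit "line-number:path"
    }' "$conf" |
    while IFS=: read -r line path; do
        [ -e "$path" ] || echo "line $line: missing $path"
    done
}
```

Each line of output names the config line to remove, in the same spirit as the “line 487” in the error above. After editing, `doveconf -n` should print the configuration without a Fatal message if nothing else is wrong.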

Should be an easy one to fix?
Cheers!

Yes, had that one too, especially when you reboot some time later and have to search why dovecot isn’t running …

That is a different issue to mine; as your log file says, it’s missing the file.
In my case, the system status monitor is reporting that Dovecot is not running. However, email delivery is fine and Dovecot appears to be delivering emails. It’s a bug that no one seems to be able to fix.

Guys, I am still having this problem with Dovecot entering a failed state. It has been in this status now for a few weeks. Here is a recent mail.err log for the system:

Can someone go through this error log and explain to me exactly why these particular errors are appearing?
(The IP address is left as is in the error log…it is not a known IP address to me.)

May 20 14:16:02 server1 dovecot: anvil: Fatal: master: service(anvil): child 1941 killed with signal 9
May 20 14:16:12 server1 dovecot: imap-login: Fatal: master: service(imap-login): child 25826 killed with signal 9
May 20 14:16:12 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 25828 + ident imap/103.129.156.70/operations@domain1.com
May 20 14:16:20 server1 dovecot: auth: Error: net_connect_unix(anvil-auth-penalty) failed: Permission denied
May 20 14:16:20 server1 dovecot: auth: Error: auth worker: Aborted PASSV request for operations@domain1.com: Shutting down
May 20 14:16:26 server1 dovecot: imap-login: Fatal: master: service(imap-login): child 4394 killed with signal 9
May 20 14:16:26 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 4395 + ident imap/52.125.136.93/contact@domain3.com
May 20 14:16:37 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 9749 + ident imap/52.125.130.11/secretary@domain1.com
May 20 14:16:37 server1 dovecot: imap-login: Fatal: master: service(imap-login): child 9748 killed with signal 9
May 20 14:16:45 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 28250 + ident imap/52.125.136.93/contact@domain3.com
May 20 14:17:18 server1 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login
May 20 14:18:06 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 7074 + ident imap/52.125.130.11/secretary@domain1.com
May 20 14:19:01 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 15213 + ident imap/120.146.145.157/adamjedgar@domain2.com
May 20 14:19:16 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 25509 + ident imap/1.129.106.62/contact@domain3.com
May 20 16:34:31 server1 dovecot: master: Error: unlink(/var/run/dovecot/master.pid) failed: No such file or directory (in main.c:558)
May 21 01:19:36 server1 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login
May 21 01:19:38 server1 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login
May 21 01:19:40 server1 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login
May 21 01:19:42 server1 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login
May 21 06:33:31 server1 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login

These are the same as earlier. And, unfortunately, I still don’t know why it’s stopping.

Something told Dovecot to stop. Signal 9 is an aggressive kill command (kill -9), I don’t think it happens with a systemctl restart…I think that’d be a signal 15 so it could clean up. But the following messages:

Indicate that it did not shutdown, even with kill 9. Which is hard to understand. The kernel should literally yank everything out from under it with a kill 9, unless something is going wrong at the kernel level. Are you running the latest kernel available for your distro?
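To see at a glance which services were killed and with which signal, the “killed with signal” lines can be tallied. A sketch, assuming the log format matches the “service(...): child N killed with signal N” lines quoted in this thread; `signal_summary` is a made-up name.

```shell
#!/bin/sh
# Read a mail log on stdin and count "killed with signal N" lines
# per Dovecot service and signal number.
signal_summary() {
    sed -n -E 's/.*service\(([^)]*)\): child [0-9]+ killed with signal ([0-9]+).*/\1 signal \2/p' |
        sort | uniq -c |
        awk '{ print $1, $2, $3, $4 }'   # normalise the spacing uniq -c adds
}
```

It could be run as `signal_summary < /var/log/mail.err` (that path is an assumption; use wherever your mail.err lives). Lines in the other format, such as “Killed with signal 15 (by pid=1 ...)”, are not counted by this pattern.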

I think you’ve already said so earlier, but can you confirm that a reboot brings dovecot back up and it behaves normally for some time? If so, that makes me think there may be a hardware or kernel problem. This is a weird set of symptoms for that, though; normally I would expect many services to act weird.

But, honestly, we’re just rehashing the same errors here. These are the same basic log messages we’ve been seeing all along. Dovecot is told to shutdown and restart; it tries but fails to shutdown and thus can’t restart because the old process is still hanging on to sockets, etc.

I’ve never seen this behavior from Dovecot, so I don’t have any hidden wisdom here. Maybe check dmesg for kernel errors indicating memory or disk faults. I’m guessing wildly because I’d like to help, but I just don’t see any reason for it to be failing seemingly at random.
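For the dmesg check, a small filter can cut the kernel log down to the lines worth reading. This is a rough sketch: `kernel_trouble` is a hypothetical name, and the patterns are just commonly seen kernel phrases (out-of-memory kills, I/O errors, hardware errors), not an exhaustive list. The out-of-memory patterns are included on the assumption that memory pressure is worth ruling out, since the kernel's OOM killer also ends processes with signal 9; nothing in the logs above shows that it is the cause here.

```shell
#!/bin/sh
# Read kernel log text on stdin and keep only lines that commonly
# indicate memory pressure, disk faults, or hardware errors.
# The pattern list is an assumption, not a complete catalogue.
kernel_trouble() {
    grep -iE 'out of memory|oom-killer|killed process|i/o error|hardware error'
}
```

Typical use would be `dmesg | kernel_trouble`. No output only means none of these phrases appeared in the current kernel ring buffer, which is emptied at reboot.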

hi Joe,

:~#uname -r

4.9.0-12-amd64

I haven’t a clue how I am supposed to tell whether or not this is the latest release for Debian 9. When I go to the website, it says the latest is 9.12. It doesn’t even line up with what I have???

I have always just run any updates that come through the Virtualmin interface. Have I missed something?

:~# dmesg | grep Linux
[ 0.000000] Linux version 4.9.0-12-amd64 (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.210-1 (2020-01-20)
[ 0.700593] Linux agpgart interface v0.103
[ 0.790666] usb usb1: Manufacturer: Linux 4.9.0-12-amd64 uhci_hcd
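On the version question: “9.12” is the Debian point release, while `uname -r` prints the kernel name (4.9.0-12-amd64), so the two are not expected to match. The number to compare against the package manager is the kernel package version at the end of the banner. A small sketch for pulling it out; `kernel_pkg_version` is a made-up helper and assumes the banner has the “SMP Debian <version>” form shown above.

```shell
#!/bin/sh
# Read a "Linux version ..." banner on stdin (e.g. /proc/version) and
# print the Debian kernel package version that follows "SMP Debian".
kernel_pkg_version() {
    sed -n -E 's/.*SMP Debian ([^ ]+).*/\1/p'
}
```

For example, `kernel_pkg_version < /proc/version` gives the running version, and `apt-cache policy "linux-image-$(uname -r)"` shows the installed and candidate versions of that package. If the candidate is newer, or `apt list --upgradable` lists a linux-image package, a newer kernel is available and needs a reboot after installing.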