Dovecot Failed state

Will see how things go. Thanks for your help, Joe. :+1:

Agh… it’s done it again.
It has been fine for days and suddenly, bang, it’s down again…

Failed to start Dovecot :

Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login

● dovecot.service - Dovecot IMAP/POP3 email server
Loaded: loaded (/lib/systemd/system/dovecot.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2020-05-20 11:44:29 AEST; 2h 23min ago
Docs: man:dovecot(1)
http://wiki2.dovecot.org/
Main PID: 762 (code=exited, status=0/SUCCESS)

Do the following mean anything specific? (I have substituted tesla for my domain.)

May 20 11:44:29 server1 dovecot: imap-login: Error: read(anvil) failed: EOF

May 20 11:44:29 server1.tesla.com.au dovecot[769]: imap: Warning: Killed with signal 15 (by pid=1 uid=0 code=kill)

EDIT…
I am finding more of these anvil errors…

dovecot: anvil: Error: connect limit: disconnection for unknown pid 7074
dovecot: anvil: Error: connect limit: disconnection for unknown pid 9749
dovecot: anvil: Error: connect limit: disconnection for unknown pid 4395
dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login
dovecot: master: Error: unlink(/var/run/dovecot/master.pid) failed: No such file or directory (in main.c:558)

What does the “connect limit” in the anvil error above mean?

Joe, I have an update this morning… this may be of help to you guys in fixing this issue.

When I woke up, the Dovecot status monitor and server monitor in Virtualmin were again saying Dovecot is not running.
I went into the command shell and entered systemctl status dovecot (and also dovecot.service).

Dovecot IMAP/POP3 email server
Loaded: loaded (/lib/systemd/system/dovecot.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2020-05-21 01:19:34 AEST; 5h 35min ago
Docs: man:dovecot(1)
http://wiki2.dovecot.org/
Process: 26345 ExecStop=/usr/bin/doveadm stop (code=exited, status=75)
Process: 579 ExecStart=/usr/sbin/dovecot (code=exited, status=0/SUCCESS)
Main PID: 719 (code=exited, status=0/SUCCESS)

All 3 of the usual methods of system monitoring say that Dovecot and dovecot.service have entered a failed state. However, email delivery is still being completed. I logged into 3 client accounts in Usermin, and the email to these accounts via Dovecot is still being delivered.

I have checked to see if email client apps such as Outlook are able to connect with the server; at least the one for my own email account on one of the virtual servers is receiving and sending emails successfully… I can’t be sure about my clients’ own ones on their computers and mobile phones.

So, my questions:

  1. Is Dovecot running or not?

  2. If it is, how can the command shell and Virtualmin both be saying Dovecot is not running when it clearly is still delivering mail?

Are you sure that this problem isn’t something to do with resources or limits that are set in Dovecot when Virtualmin is installed using the installer? Is there a way I can increase the limits that pertain to the login/login error I am seeing all the time?

I have a further update, Joe.

Since the latest “Dovecot has entered a failed state” error began, all email accounts for at least one virtual server are now receiving duplicate emails every time one is sent.

I have checked in Usermin as well… it’s happening for both Usermin and desktop PC email clients, even my own account for that virtual server.

I note that in Usermin one copy shows the date, the other just the time.

This is happening across virtual servers, and it’s from both internal email accounts I host on the server and external Hotmail senders.

Please note, last night I set up a secondary mail server… which I am not even sure is actually working (because I don’t host DNS… that is done at the registrar, and I have not added any secondary MX records). Is it possible that if BIND is running on either Virtualmin system, these duplicate emails can still resolve and be sent from the secondary too? Could that be the problem? How do I check this?

Dovecot does not deliver mail (that’s SMTP, though Procmail could also duplicate mail, if you configured it to do so for some reason). It can’t be involved in this problem. It is not related to Dovecot.

Sending mail does not involve Dovecot. Sending mail always goes through Postfix, whether it is sent from a local client (e.g. a web app that sends mail via the PHP mail API, or something calling the sendmail command or another CLI mail tool) or a remote client (e.g. Thunderbird or Outlook or your mobile mail client). Dovecot never has anything to do with sending mail. Dovecot is only for retrieving already delivered mail using the POP3/IMAP protocols. It also has a delivery agent, but we don’t use it; we use Procmail for historical reasons, so in a Virtualmin system, Dovecot also has nothing to do with mail delivery. It only retrieves mail on behalf of mail clients via POP3/IMAP.

Usermin can use POP3/IMAP (which involves Dovecot) or its own local mail reading functions (which do not involve Dovecot).

I’m just trying to get some clarity here. There are so many topics being discussed here in this thread, I can’t figure out what we’re trying to solve. Not all email problems are related to Dovecot (in fact, most are not).

This might be a bug in Usermin, though I don’t think I’ve seen it.

What does DNS have to do with this? I mean, you’d need MX and A records pointing to the new server to make it functional but it doesn’t matter if DNS runs on the server(s) themselves.

Let’s stick to one problem on this thread. Double emails is not related to Dovecot. Secondary server is not related to Dovecot. DNS, still not related to Dovecot. :wink:

So, the problem we’re still tackling is why Dovecot is stopping, so let’s focus on that.

The only thing I see above that indicates why Dovecot is shutting down is that root told it to shut down, presumably when you restarted it. So, we’re still no further along in seeing why it’s entering a failed state…but if you’re still able to retrieve mail with mail clients, then Dovecot itself must still be running. (Thunderbird, whatever, doesn’t matter, though Usermin could be using direct access and may not be a useful test, as it may not be using IMAP to retrieve mail; I don’t remember what the current default is.)

I’m gonna try logging into your system again to see if I can find some clues in the log. So far, I’m not seeing any indication that Dovecot thinks something is wrong…except when it tries to restart and finds itself already running (this seems to be maybe systemd thinking it’s not running and so it doesn’t shut it down properly before starting it again…I’m not sure).
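
One way to check the mismatch described above, where systemd reports the service as failed while an older Dovecot master may still be alive, is to compare the PID file with the process table. This is only a minimal sketch: it assumes the PID file is at /var/run/dovecot/master.pid (the path shown in the logs in this thread), and `pidfile_status` is a hypothetical helper name, not a Dovecot tool.

```shell
#!/bin/sh
# pidfile_status FILE
# Prints one of: "missing", "invalid", "stale <pid>", "running <pid>".
# Hypothetical helper: it only tests whether the PID recorded in FILE
# still exists; it does not confirm that the process is Dovecot.
pidfile_status() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "missing"
        return 0
    fi
    # Take the first line and keep digits only.
    pid=$(head -n 1 "$file" | tr -cd '0-9')
    if [ -z "$pid" ]; then
        echo "invalid"
        return 0
    fi
    # kill -0 sends no signal; it only checks that the PID exists.
    # Run as root, otherwise a process owned by another user can look dead.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running $pid"
    else
        echo "stale $pid"
    fi
}

# Example (as root):
#   pidfile_status /var/run/dovecot/master.pid
#   systemctl is-active dovecot
#   pgrep -a dovecot
```

If this prints `running <pid>` while `systemctl is-active dovecot` prints `failed`, an old master is probably still alive, which would fit the "already running" errors.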

Oddly, maybe the double emails is a dovecot problem. https://dovecot.org/list/dovecot/2016-July/104881.html

The anvil PID errors are a Dovecot bug for sure, but the Dovecot authors say they are harmless.

Joe, I think that, as you said a few days ago, perhaps the Dovecot errors are not consistent with Dovecot not being able to deliver emails.
I have left the Dovecot server in a failed state now for 2 days and my clients’ emails are being delivered anyway.

I was able to stop the double emails… the way I did it makes zero sense to me; however, I will address that in the appropriate thread, as it’s not relevant here.

BTW, I have a fresh Debian 10 Virtualmin GPL install (run from the Virtualmin auto installer)… have a guess what its current Dovecot state is?

May 15 12:15:13 server2 dovecot: auth: Fatal: master: service(auth): child 9760 killed with signal 9
May 15 12:15:25 server2 dovecot: anvil: Fatal: master: service(anvil): child 3645 killed with signal 9
May 15 12:15:38 server2 dovecot: stats: Fatal: master: service(stats): child 3658 killed with signal 9
May 15 12:15:50 server2 dovecot: anvil: Fatal: master: service(anvil): child 11341 killed with signal 9
May 15 12:16:05 server2 dovecot: anvil: Fatal: master: service(anvil): child 11538 killed with signal 9
May 15 12:16:23 server2 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login

Hi guys, this is almost certainly not related; however, since this Dovecot thread is active, I just want to quickly let you know that I have a reproducible issue with Dovecot failing when a Virtualmin virtual server with an SSL certificate is deleted from the system. (I’m using Let’s Encrypt certificates via normal Virtualmin admin.) Here’s the note I made to myself describing the issue and its cause:

–8<–

When you delete a virtual host, the SSL certificate entry for it is erroneously left in /etc/dovecot/dovecot.conf.

When dovecot next restarts, it doesn’t find the certificate and thus it won’t restart and goes into a failed state.

To fix, edit dovecot.conf and remove the rogue entry then restart manually.
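
A rough sketch of a check that could be run before restarting, to spot the rogue entry described in this note: it lists `ssl_cert`/`ssl_key`/`ssl_ca` paths in a Dovecot config file that no longer exist on disk. It assumes each entry uses Dovecot's `setting = </path/to/file` form on a single line and does not follow `!include` files; `missing_ssl_files` is a hypothetical helper name.

```shell
#!/bin/sh
# missing_ssl_files CONF
# Prints every ssl_cert / ssl_key / ssl_ca path referenced in CONF
# (written as "ssl_cert = </path") whose file does not exist.
# Commented-out lines are skipped because the match starts at "ssl_".
missing_ssl_files() {
    conf=$1
    sed -nE 's/^[[:space:]]*ssl_(cert|key|ca)[[:space:]]*=[[:space:]]*<[[:space:]]*([^[:space:]]+).*$/\2/p' "$conf" |
    while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}

# Example:
#   missing_ssl_files /etc/dovecot/dovecot.conf
```

Any path it prints belongs to an entry that would likely stop Dovecot from starting. Running `doveconf -n` should surface the same configuration error without restarting anything.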

Sample error (as seen in journalctl -xe):

Apr 21 11:13:13 xyz.com dovecot[47661]: doveconf: Fatal: Error in configuration file /etc/dovecot/dovecot.conf line 487: ssl_cert: Can’t open file /home/XYZ/ssl.cert: No such file or directory

– 8< –

Should be an easy one to fix?
Cheers!

Yes, had that one too, especially when you reboot some time later and have to search why dovecot isn’t running …

That is a different issue to mine; as your log file says, it’s missing the file.
In my case, the system status monitor is reporting that Dovecot is not running. However, email delivery is fine and Dovecot appears to be delivering emails. It’s a bug that no one seems to be able to fix.

Guys, I am still having this problem with Dovecot entering a failed state. It has been in this state now for a few weeks. Here is a recent mail.err log for the system.

Can someone go through this error log and explain to me exactly why these particular errors are appearing?
(The IP address is left as is in the error log… it is not a known IP address to me.)

May 20 14:16:02 server1 dovecot: anvil: Fatal: master: service(anvil): child 1941 killed with signal 9
May 20 14:16:12 server1 dovecot: imap-login: Fatal: master: service(imap-login): child 25826 killed with signal 9
May 20 14:16:12 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 25828 + ident imap/103.129.156.70/operations@domain1.com
May 20 14:16:20 server1 dovecot: auth: Error: net_connect_unix(anvil-auth-penalty) failed: Permission denied
May 20 14:16:20 server1 dovecot: auth: Error: auth worker: Aborted PASSV request for operations@domain1.com: Shutting down
May 20 14:16:26 server1 dovecot: imap-login: Fatal: master: service(imap-login): child 4394 killed with signal 9
May 20 14:16:26 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 4395 + ident imap/52.125.136.93/contact@domain3.com
May 20 14:16:37 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 9749 + ident imap/52.125.130.11/secretary@domain1.com
May 20 14:16:37 server1 dovecot: imap-login: Fatal: master: service(imap-login): child 9748 killed with signal 9
May 20 14:16:45 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 28250 + ident imap/52.125.136.93/contact@domain3.com
May 20 14:17:18 server1 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login
May 20 14:18:06 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 7074 + ident imap/52.125.130.11/secretary@domain1.com
May 20 14:19:01 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 15213 + ident imap/120.146.145.157/adamjedgar@domain2.com
May 20 14:19:16 server1 dovecot: anvil: Error: connect limit: disconnection for unknown pid 25509 + ident imap/1.129.106.62/contact@domain3.com
May 20 16:34:31 server1 dovecot: master: Error: unlink(/var/run/dovecot/master.pid) failed: No such file or directory (in main.c:558)
May 21 01:19:36 server1 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login
May 21 01:19:38 server1 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login
May 21 01:19:40 server1 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login
May 21 01:19:42 server1 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login
May 21 06:33:31 server1 dovecot: master: Fatal: Dovecot is already running? Socket already exists: /var/run/dovecot/login/login

These are the same as earlier. And, unfortunately, I still don’t know why it’s stopping.

Something told Dovecot to stop. Signal 9 is an aggressive kill command (kill -9), I don’t think it happens with a systemctl restart…I think that’d be a signal 15 so it could clean up. But the following messages:

Indicate that it did not shutdown, even with kill 9. Which is hard to understand. The kernel should literally yank everything out from under it with a kill 9, unless something is going wrong at the kernel level. Are you running the latest kernel available for your distro?
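
To see at a glance which signals actually show up in the log, the "killed with signal N" lines can be tallied. A minimal sketch: `count_kill_signals` is a hypothetical helper, and the log path in the example is an assumption based on the mail.err file mentioned earlier in this thread.

```shell
#!/bin/sh
# count_kill_signals [FILE...]
# Counts "killed with signal N" occurrences per signal number.
# Reads standard input when no FILE is given.
count_kill_signals() {
    grep -ohiE 'killed with signal [0-9]+' "$@" |
    awk '{ n[$NF]++ } END { for (s in n) print "signal " s ": " n[s] }' |
    sort
}

# Example:
#   count_kill_signals /var/log/mail.err
```

Signal 15 (SIGTERM) can be caught, so a process gets a chance to clean up; signal 9 (SIGKILL) cannot be caught or ignored.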

I think you’ve already said so earlier, but can you confirm that a reboot brings dovecot back up and it behaves normally for some time? If so, that makes me think there may be a hardware or kernel problem. This is a weird set of symptoms for that, though; normally I would expect many services to act weird.

But, honestly, we’re just rehashing the same errors here. These are the same basic log messages we’ve been seeing all along. Dovecot is told to shutdown and restart; it tries but fails to shutdown and thus can’t restart because the old process is still hanging on to sockets, etc.

I’ve never seen this behavior from Dovecot, so I don’t have any hidden wisdom here. Maybe check dmesg for kernel errors indicating memory or disk faults. I’m guessing wildly because I’d like to help, but I just don’t see any reason for it to be failing seemingly at random.
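
A small sketch of the dmesg check suggested above. An out-of-memory kill is one common source of signal 9, though nothing in the logs in this thread confirms that is what happened here. The pattern list is an illustrative assumption, not a complete catalogue of kernel error messages, and `kernel_fault_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# kernel_fault_lines
# Filters kernel log text on standard input, keeping lines that commonly
# accompany out-of-memory kills or disk I/O errors.
# Exits non-zero (like grep) when nothing matches.
kernel_fault_lines() {
    grep -iE 'out of memory|oom-killer|killed process|i/o error'
}

# Example (as root):
#   dmesg | kernel_fault_lines
```

No output means none of these particular patterns appeared in the kernel log; it does not rule out other kinds of kernel or hardware trouble.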

hi Joe,

:~# uname -r

4.9.0-12-amd64

I haven’t a clue how I am supposed to tell whether or not this is the latest release for Debian 9. When I go to the website, it says the latest is 9.12. It doesn’t even line up with what I have???

I have always just run any updates that come through the Virtualmin interface. Have I missed something?

:~# dmesg | grep Linux
[ 0.000000] Linux version 4.9.0-12-amd64 (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.210-1 (2020-01-20)
[ 0.700593] Linux agpgart interface v0.103
[ 0.790666] usb usb1: Manufacturer: Linux 4.9.0-12-amd64 uhci_hcd
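
On the version numbers: 9.12 is the Debian point release, while 4.9.0-12 is the kernel package's ABI name, so the two are not expected to match. The kernel package version itself is the 4.9.210-1 shown in the dmesg line. A sketch for comparing the running kernel with the kernel image packages dpkg knows about (which can include removed packages whose config files remain); `newest_kernel` is a hypothetical helper and relies on `sort -V` from GNU coreutils.

```shell
#!/bin/sh
# newest_kernel
# Reads package names on standard input (one per line) and prints the
# highest versioned linux-image-<version> suffix, e.g. "4.9.0-12-amd64".
# Metapackages such as linux-image-amd64 are ignored.
newest_kernel() {
    sed -n 's/^linux-image-\([0-9].*\)$/\1/p' | sort -V | tail -n 1
}

# Example on Debian:
#   dpkg-query -W -f='${Package}\n' 'linux-image-[0-9]*' | newest_kernel
#   uname -r
# Pending upgrades, if any:
#   apt update && apt list --upgradable
```

If the two outputs match and no kernel packages are listed as upgradable, the running kernel should be the newest one the configured repositories offer. If `newest_kernel` prints something newer than `uname -r`, a reboot is probably pending.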

This is the strangest thing:
in PuTTY, I am unable to run systemctl force-reload dovecot with any success. It produces the failed state error.

~# systemctl status dovecot
● dovecot.service - Dovecot IMAP/POP3 email server
Loaded: loaded (/lib/systemd/system/dovecot.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2020-06-02 05:29:39 AEST; 19min ago
Docs: man:dovecot(1)
http://wiki2.dovecot.org/
Process: 26345 ExecStop=/usr/bin/doveadm stop (code=exited, status=75)
Process: 12700 ExecStart=/usr/sbin/dovecot (code=exited, status=89)
Main PID: 719 (code=exited, status=0/SUCCESS)

And yet, after doing this and going into Webmin > Servers > Dovecot server, I was able to start Dovecot from there without any problems.

So now the Virtualmin dashboard is saying that Dovecot is running, even though the command shell is saying the opposite!

This does not make any sense!

force-reload is not a systemctl command as far as I know. It existed in the olden days for some initscripts, but I don’t think any systemd services would do anything with it. You can --force stop, and maybe --force restart (and maybe --force reload, but not force-reload).

Regardless, reload means to reload config files; it is not usually a service restart unless the way to reload config files is to restart (which is true for some services). reload sends a SIGHUP, in the general case. I don’t know what force-reload did, as I’ve never used it, but I assume a restart is what you want, no matter what. (Though I don’t recommend --force anything! A normal restart should always be sufficient; if it isn’t, something is wrong somewhere, though I have no guesses why. You don’t want to leave Dovecot in an improperly shut down state, needing manual cleanup, which --force would likely do.)

No matter what, though, force-reload is not doing what you want it to do. It’s probably doing nothing or spitting an error because it’s a command that doesn’t exist?

Sorry Joe, yes, a typo on my part!

Hey, I just saw this in the latest mail error logs this morning…

Jun 2 05:32:57 host1 dovecot: master: Error: unlink(/var/run/dovecot/master.pid) failed: No such file or directory (in main.c:558)
Jun 2 05:34:23 host1 dovecot: master: Fatal: Dovecot is already running with PID 14483 (read from /var/run/dovecot/master.pid)

What exactly do the above two lines mean?