Dovecot Failed state

Again, why reinvent the wheel?
Monit ftw.

Because this temporarily serves its purpose and is not meant to last. It was also quicker to write.

Today I found this in the logwatch report of a server with Dovecot apparently down according to the Webmin dashboard. The file /var/run/dovecot/config seems to exist but is zero length, symlinked from /run/dovecot/config, owned by root, chmod 0600.

--------------------- Dovecot Begin ------------------------

Dovecot was killed, and not restarted afterwards.

Dovecot disconnects: 6 Total

Unmatched Entries
dovecot: anvil: Fatal: Error reading configuration: read(/var/run/dovecot/config) failed: read(size=8192) failed: Connection reset by peer: 1 Time(s)
dovecot: master: Dovecot v2.2.33.2 (d6601f4ec) starting up for imap, pop3, pop3 (core dumps disabled): 1 Time(s)
dovecot: master: Error: unlink(/var/run/dovecot/master.pid) failed: No such file or directory (in main.c:518): 1 Time(s)
dovecot: ssl-params: Fatal: Error reading configuration: read(/var/run/dovecot/config) failed: read(size=8192) failed: Connection reset by peer: 1 Time(s)

---------------------- Dovecot End -------------------------
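
A note on the zero-length file: "Connection reset by peer" is an error you normally get from a socket, so the path may be a UNIX socket rather than a regular file, and a socket always shows up as zero length. That is an assumption on my part, not something confirmed in this report. Here is a small sketch to check what the path really is; `describe_path` is a made-up helper name, not part of Dovecot or Webmin.

```shell
# Hypothetical helper: report what kind of filesystem object a path is.
describe_path() {
    p=$1
    if [ -S "$p" ]; then
        echo "unix socket"
    elif [ -f "$p" ] && [ -s "$p" ]; then
        echo "non-empty regular file"
    elif [ -f "$p" ]; then
        echo "empty regular file"
    elif [ -e "$p" ]; then
        echo "other"
    else
        echo "missing"
    fi
}
```

Run it as `describe_path /var/run/dovecot/config`. If it prints "unix socket", the zero length by itself is probably not the problem.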

Hi,
I just wanted to comment that I am having the same problem with dovecot that Orao has. I have the same log errors, and dovecot always dies when a Let’s Encrypt certificate is renewed, although it sometimes happens without a renewal too.
I tried without ssl_ca lines and it doesn’t make any difference.
If I’m lucky enough, the /usr/bin/dovecot process survives and the email continues to work.

I’m not sure, but I think it started when I upgraded CentOS from version 7.7 to 7.8.

You know, about the only time I ever see a master pid error is when one of my clients’ email accounts is full.
So what I am saying is not about the email account, it’s about the cause: something being full.
I wonder if that file you mention being “zero length” is related to the problem?

Howdy,
my server has been having this issue for over a year and a half. Never thought to report it…my bad.
Every LetsEncrypt Cert update kills Dovecot, but not all processes running under Dovecot. This is the reason that it says running but the Vmin console properly reports that Dovecot is down, because it is.
Running a script that kills all, ALL Dovecot processes and then restarting Dovecot allows things to go back to normal.
My opinion (only that) is that the script Vmin uses after updating certs is broken or incorrect.
FYI: CentOS 7, fully updated, installed from bare OS with the Vmin script about 3 years ago.

The bug could probably be reproduced if we could manually run the same script that the “automatic letsencrypt renewal” runs in the background. When certificates are renewed manually from the control panel, it doesn’t affect dovecot.

If we could run this script manually, it would be easier to do more debugging. I was already looking into /usr/share/webmin/dovecot/dovecot-lib.pl, and some extra logging could help to find the point at which things go wrong.

Ubuntu 18.04.4
webmin: 1.942
virtualmin: 6.09
dovecot: 2.2.33.2 (d6601f4ec)

That script is whatever your OS provides (a systemd unit file, generally), and it is likely different across distributions. I’m reasonably confident it is not that.

But, I want to be clear that we don’t have a custom dovecot service/unit file here. We aren’t taking over the Dovecot installation and replacing pieces. We’re just modifying the config files and using the OS-provided systemctl reload|restart dovecot or whatever. I suspect the problem is our misuse of ssl_ca directives, but we won’t know until someone actually tests that theory (I don’t have this problem on my servers for unknown reasons, so I can’t test) or until the new version goes out and it’s changed for everyone. (But, y’all could test this theory by making the change and trying both restart and reload to see what happens.)

I have a Debian 9 server with Dovecot in a failed state, but email is still working:

root@green ~ # service dovecot status
● dovecot.service - Dovecot IMAP/POP3 email server
   Loaded: loaded (/lib/systemd/system/dovecot.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2020-06-18 22:13:24 CEST; 1 weeks 1 days ago
     Docs: man:dovecot(1)
           http://wiki2.dovecot.org/
  Process: 17179 ExecStop=/usr/bin/doveadm stop (code=exited, status=75)
  Process: 1099 ExecStart=/usr/sbin/dovecot (code=exited, status=0/SUCCESS)
 Main PID: 1146 (code=exited, status=0/SUCCESS)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
root@green ~ # systemctl reload dovecot
dovecot.service is not active, cannot reload.
root@green ~ # systemctl restart dovecot
Job for dovecot.service failed because the control process exited with error code.
See "systemctl status dovecot.service" and "journalctl -xe" for details.

Both reload and restart fail. Is there anything I can do to help debug the problem? I am currently just ignoring it because the email keeps working.
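
For anyone who wants to tell "really down" apart from "unit failed but old processes still serving mail", here is a rough sketch. `classify_dovecot` is a hypothetical helper I made up for illustration; it only compares the unit state with the number of leftover processes and does not touch anything.

```shell
# Hypothetical helper: classify the symptom from the systemd unit state
# and the number of dovecot processes that are still alive.
classify_dovecot() {
    state=$1   # e.g. output of: systemctl is-active dovecot
    procs=$2   # e.g. output of: pgrep -fc dovecot
    if [ "$state" = "active" ]; then
        echo "ok"
    elif [ "$procs" -gt 0 ]; then
        echo "stale-processes"
    else
        echo "down"
    fi
}
```

Usage would be something like `classify_dovecot "$(systemctl is-active dovecot)" "$(pgrep -fc dovecot)"`. A result of "stale-processes" would match what is described in this thread: systemd thinks the service has failed, yet something is still holding the ports, which could also explain why a restart fails. Matching on the string "dovecot" with `pgrep -f` is an assumption and may catch unrelated processes whose command line contains that word.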

Howdy,
@Joe from Virtualmin.
I understand, Joe. It is problematic that this issue is not reproducible for the devs.
So here are the commands I’ve been running when my certs are updated:

for i in $(ps aux | grep dovecot | awk '{print $2}') ; do kill -9 $i ; done

systemctl restart dovecot.service

This will get Dovecot running again every time. I don’t understand why this works, but it does.

Hope this helps

OK, after reading some more above, I commented out all .ca lines in the Dovecot conf file and restarted dovecot. Hopefully this helps.

Hi,
I confirmed that dovecot enters failed state without “ssl_ca” lines 10 days ago - these lines are not causing problems.

How can the script/task that causes problems be executed manually? I could not find a cron job for it.

I’ve been dealing with this for the last 7 months, on 3 different servers. One is Ubuntu 18, the other two are CentOS 7. The system sends me an email when my CPU is above 40% for 3 minutes, and that’s how I know Dovecot is f*cked (it’s all the way at 99% when Dovecot is in failed state). Dovecot is definitely not playing nice with Webmin renewing LetsEncrypt.

Temp solution is to reboot. Thank god all my servers are on SSDs; rebooting is quick, so I reboot every now and then.

The Webmin team definitely needs to look into the integration of Dovecot and LetsEncrypt :slight_smile:

JamieCameron found something last year. Maybe it is connected with our problem?

Submitted by JamieCameron on Mon, 12/02/2019 - 11:25

I found a bug that can cause Dovecot to not get restarted on cert renewal - I’ll fix it in the next Virtualmin release.

https://www.virtualmin.com/comment/820295#comment-820295

Maybe he can help?

Jamie is, obviously, working on it (Ilia, too). The next version fixes several issues in Dovecot cert handling. We still do not know if any of those issues are the cause of this specific problem (the issue being dovecot in failed state, but still maybe working).

Things that have been fixed:

  1. Misuse of ssl_ca
  2. Leftover extraneous ssl_ config directives when deleting domains
  3. Some other cert related issues, where config might not match reality

The lack of restart is obviously not the cause of this problem, as there has already been a release since that change was made (so the current version of Virtualmin is restarting Dovecot on cert renewals). And, in fact, it’s likely that update is what made this problem, whatever its cause, much more apparent since it would cause it to be triggered on every cert update.

Jamie and Ilia have been doing a huge amount of work on cert handling in the mail stack over the past several weeks. I’m hopeful the next release, and one or more of the Dovecot-related fixes, will resolve this issue. There have been a bunch of confounding factors that have made it hard to push out an update to fix this one issue (especially since we still don’t know the actual cause, we’re just assuming/hoping that one of the things that has been fixed will resolve it).

What I am wondering is whether a system whose clients use IMAP is more or less likely to experience this problem than a system whose clients use POP3.

Something is niggling in my head suggesting it’s more likely with IMAP, but maybe I’m just imagining things!

No. Absolutely not.

Hello people, I see the latest update is available and SSL problems should be fixed now. Please test it and report if anything goes wrong. I will test it too.

Thank you virtualmin team!

I can confirm that some of my letsencrypt certs were automatically renewed in the last few days and dovecot is still running with Active status :star_struck: :star_struck: :star_struck:

Update fixed issues.

Cheers :smiley:

I still have that unclosed parentheses bug in the Dovecot config after any domain’s Let’s Encrypt renewal in the latest version. :frowning:
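
If "unclosed parentheses" here means the curly braces that open and close blocks in the Dovecot config (that is my guess, the post does not say), a quick way to spot the damage after a renewal might be to count them. This is only a sketch: `brace_balance` is a hypothetical helper, and it ignores braces that appear inside quoted values.

```shell
# Hypothetical helper: print (number of "{") minus (number of "}") in a
# config file, skipping full-line comments. 0 means the braces balance.
brace_balance() {
    awk '
        /^[[:space:]]*#/ { next }   # skip comment lines
        {
            opens  += gsub(/[{]/, "&")
            closes += gsub(/[}]/, "&")
        }
        END { print opens - closes }
    ' "$1"
}
```

For example, `brace_balance /etc/dovecot/dovecot.conf` printing anything other than 0 would point at a block that was left open or closed twice. `doveconf -n` should also complain about a config it cannot parse, so running that right after a renewal is probably the more reliable check.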