Dovecot left in a mixed/failed state after LetsEncrypt gets a new cert

dumorian · June 12, 2020, 3:10pm

I’m sorry, but I’m really frustrated at this point. These LetsEncrypt problems are killing my business.

These are CentOS 7 systems running the latest version of Virtualmin Pro. On one system, I could not log in to get email, and System Status showed Dovecot not running. Last night I had manually renewed the LetsEncrypt certs for a few domains and set them to auto-renew. I didn’t catch that Dovecot was down until this morning.

On the other system, again after renewing LetsEncrypt certs, System Status showed Dovecot was not running, but in fact it sort of was. Each time I clicked Start Dovecot, it gave errors that certain processes were already running. I kept rm’ing those processes, at which point it listed more. I just gave up and rebooted the system.

Why is LetsEncrypt writing cert paths into dovecot.conf for every SSL domain? I run Dovecot under a purchased EV cert that is recognized by all email client software, so I can’t think of any reason for these to go into Dovecot. Can I make it stop doing that?

Either way, something is wrong with LetsEncrypt on CentOS 7 systems running Virtualmin Pro. I have not had any issues with CentOS 6 systems running Cloudmin or Virtualmin Pro.

Please help. These time bombs going off over and over are not acceptable.

Ilia · June 12, 2020, 4:35pm

Hi,

Sorry about that!

There are currently a few bugs related to Dovecot and how SSL certificates are handled. We are aware of them and working on fixes for the upcoming Virtualmin release.

Thank you for the heads up!

Joe · June 12, 2020, 4:36pm

Please see this thread: Dovecot Failed state

I haven’t been able to reproduce this problem in my testing (and my servers that have the config discussed there do not exhibit the problem), so I don’t know if that suggestion will help, but it’s all I’ve got. We’re definitely generating the wrong config; on my servers it seems to be harmless, but maybe in some cases it is not. So, try it and see what happens. Since the problem is cert-related, this seems the most likely candidate.

adamjedgar · June 14, 2020, 7:30am

As this is possibly the same problem I am having with Dovecot on Debian 9 with Webmin/Virtualmin, I would be interested to know whether the OP here is still able to receive email.

On my system, even though it says Dovecot has entered a failed state, it still delivers emails.

Go into a Hotmail, Gmail, or Yahoo Mail account, send yourself an email, and see if it comes through.

Also, on my system, Virtualmin can report that Dovecot is running fine, and yet if I go to a command shell and type

systemctl status dovecot

I get:

dovecot loaded

dovecot active is a failed state error.

Regardless of the above, my system is still delivering email via IMAP and POP3 (SSL and STARTTLS).
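One way to pin down the kind of mismatch described above is to compare systemd’s view of the unit with whether a Dovecot master process actually exists. This is a minimal sketch with hypothetical function names (they are not part of Virtualmin or Dovecot); it only relies on `systemctl is-active` and `pgrep -x dovecot`, and assumes the master process is named `dovecot`, as it normally is with the distribution packages.

```shell
# Hypothetical helpers for illustration; not part of Virtualmin or Dovecot.

# Turn systemd's unit state plus a process check into a one-line verdict.
# $1: output of "systemctl is-active dovecot" (active, failed, inactive, ...)
# $2: "running" if a dovecot master process exists, otherwise "absent"
dovecot_state_verdict() {
    case "$1:$2" in
        active:running) echo "consistent: running" ;;
        active:absent)  echo "MISMATCH: unit is active but no dovecot process found" ;;
        *:running)      echo "MISMATCH: dovecot process found but unit is $1" ;;
        *)              echo "consistent: not running (unit is $1)" ;;
    esac
}

# Gather both inputs from the live system and print the verdict.
check_dovecot_state() {
    # "|| true" keeps a non-zero exit (e.g. for "failed") from aborting under set -e.
    unit_state=$(systemctl is-active dovecot 2>/dev/null || true)
    if pgrep -x dovecot >/dev/null 2>&1; then
        procs=running
    else
        procs=absent
    fi
    dovecot_state_verdict "${unit_state:-unknown}" "$procs"
}
```

On a server in the state described in this thread (unit failed, mail still flowing), `check_dovecot_state` would be expected to print the `MISMATCH: dovecot process found but unit is failed` line.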

orao · June 16, 2020, 5:33am

Hi, I also noticed the same problem.

Last week I saw a red warning in the Virtualmin “server status” tab: Dovecot is not running.
I immediately tested it with an email client and everything was working fine. I manually restarted Dovecot from the console and the status in Virtualmin changed to OK.

Two days ago I updated the system, and after a reboot everything was working fine; Dovecot was up and running. Today, two days later, Dovecot was no longer running and the service was not delivering email. I started the Dovecot service manually from Webmin and now mail delivery is working, but:

systemctl status dovecot says it is dead

Do you think this is SSL renewal related problem?

edit:
I checked /var/log/syslog for that time and found that the system also restarted the NGINX server. I’m still trying to find out whether some LetsEncrypt certificates were updated at that time. I guess services using a certificate need to be restarted after it is renewed?

Joe · June 16, 2020, 3:51pm

Please do what I suggested in the thread I linked above, and tell me if Dovecot behaves more appropriately after a restart.

orao · June 16, 2020, 4:36pm

Thank you for reply!

I’ve just commented out all “ssl_ca = …” lines in dovecot.conf and restarted Dovecot. The process got a new PID.
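For anyone wanting to try the same workaround, here is a minimal sketch that comments out the `ssl_ca = …` lines with GNU `sed` after taking a timestamped backup. The function name is hypothetical, and the default path `/etc/dovecot/dovecot.conf` is an assumption; check where your dovecot.conf actually lives before running it.

```shell
# Hypothetical helper: comment out every active "ssl_ca = ..." line.
# $1: config file (default /etc/dovecot/dovecot.conf is an assumption)
comment_out_ssl_ca() {
    conf="${1:-/etc/dovecot/dovecot.conf}"
    # Keep a timestamped backup so the change can be reverted.
    cp -p "$conf" "$conf.bak.$(date +%Y%m%d%H%M%S)" || return 1
    # Insert "#" before "ssl_ca =" when it is the first non-blank text on a line.
    # Lines that are already commented out, and other ssl_* settings, are untouched.
    sed -i 's/^\([[:space:]]*\)\(ssl_ca[[:space:]]*=\)/\1#\2/' "$conf"
}
```

Afterwards, `doveconf -n` should show whether the config still parses, and `systemctl restart dovecot` picks up the change.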

I will also reboot the server tonight and try to reproduce what I did last time that may have caused the error.

rskuipers · June 18, 2020, 10:34am

We’re experiencing the same issues. The certificate modified times correspond with the dovecot crashes to the second.

The way I determined this is by doing the following:

$ find /home -maxdepth 2 -type f -name "ssl.cert" -printf "%T@ %Tc %p\n" | sort -n
$ grep "master: Warning: Killed with signal 15" /var/log/syslog
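The two commands above can be combined into a single pass. This is a hedged sketch, not necessarily how the check was done here: for each `ssl.cert` under the given root it converts the file’s modification time into a traditional syslog timestamp (for example `Jun 18 10:34:05`) and reports the cert only if the log has a Dovecot `Killed with signal 15` line in that same second. It assumes classic rsyslog-style timestamps written in the server’s local time zone and GNU `find`/`date`; the default log is `/var/log/syslog`, but CentOS 7 logs to `/var/log/messages`, so pass that path there. The function name is hypothetical.

```shell
# Hypothetical helper: list certs whose mtime matches a dovecot SIGTERM in the log.
# $1: directory to search (default /home, as in the commands above)
# $2: log file (default /var/log/syslog; CentOS 7 uses /var/log/messages)
correlate_cert_kills() {
    root="${1:-/home}"
    log="${2:-/var/log/syslog}"
    find "$root" -maxdepth 2 -type f -name ssl.cert -printf '%T@ %p\n' |
    while read -r epoch path; do
        # Drop fractional seconds, then format like a syslog prefix
        # ("%e" pads the day with a space, as traditional syslog does).
        ts=$(LC_ALL=C date -d "@${epoch%.*}" '+%b %e %H:%M:%S')
        # Report the cert only when a dovecot kill was logged in that same second.
        if grep -F "$ts" "$log" | grep -q 'master: Warning: Killed with signal 15'; then
            echo "$ts $path"
        fi
    done
}
```

An empty result does not rule out the correlation, since a restart can land a second after the file is written.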

I’ll be watching this topic for updates.

system · July 18, 2020, 10:34am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.