LetsEncrypt Overwriting Root Server Keys

I ran updates yesterday, so I am on Webmin 1.942 and Virtualmin 6.09 Pro. Last night, one of my virtual server domains requested a new cert from LetsEncrypt. Suddenly TLS and SSL on email started throwing cert errors. After chasing for far too long, I found that both my cert file and key file under /etc/pki/tls/(to proper directories and filenames) had been overwritten with this new cert. So, instead of my mail server answering with its own domain cert, it was trying to use that client’s domain cert for all connections.

This is a CentOS 7 system running Postfix and Dovecot.

Any ideas on this?

I’d guess there was a “Copy to…” that caused some domain’s cert/key to be associated with whatever service uses that cert/key. Probably “Copy to Postfix”?

Copy to… tries to figure out what the primary certificate/key is for a given service, and replaces it with the cert for the domain you’re clicking it for. That association is permanent until “Copy to…” is clicked from somewhere else, and updated certs will be copied in the future.

I am seeing this issue too. It’s causing BIG problems!

There seems to be a new file generated called “ssl.everything”. I bet that’s the problem.

I have one virtual server that is meant to host the certificate for all services - Webmin, Usermin, ProFTPD, Postfix & Dovecot. Let’s call it “primary.mydomain.com”. Up until yesterday this has been working fine (but I have to say I have always found VM >> Server Config >> SSL Cert >> Lets Encrypt >> Copy to… to be unreliable and misleading from day one).

Currently, although this setup has been working fine, only one service shows “Copy to” enabled, and that is Dovecot. And it is that service, and only that service, that is getting screwed every time Lets Encrypt renews some virtual server. It’s as if “Copy to” is now switched on globally, and each virtual server updates “primary.mydomain.com” with the wrong domain. That leads to countless email users seeing warnings of a foreign domain from their email clients, and to all kinds of panic that they’ve been hacked.

As a short term fix then - how can I switch OFF “Copy to” for “primary.mydomain.com”? I don’t see any option for that.

Update


I think I can see the source of the problem. In the case of a large number of virtual servers (but oddly, not all) if I navigate to “VM >> Server Config >> SSL Cert >> Lets Encrypt” I see:

“This SSL certificate is already being used by : Dovecot (host this-domain.com)”

Now I would expect that to only be the case for ONE of my virtual servers. How can Dovecot be using multiple certificates? But if this flag IS set for multiple servers, then each time one of these gets their certificate updated, the system will notice that that is Dovecot’s certificate, look up the path to that certificate, and replace that with this new certificate. In other words, in my example my primary host “primary.mydomain.com” has its certificate replaced with the new one for “this-domain.com”. Cue disaster.

So the immediate question becomes: How can I reset this flag for all these domains to prevent this happening? i.e. Instead of seeing
“This SSL certificate is already being used by : Dovecot (host this-domain.com)”, I want to see the button “Copy to Dovecot” again as it used to be.
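
To make the suspected mechanism concrete, here is a toy Python model of it. It is only a sketch of the behaviour described above, not Virtualmin's actual code: `renew`, `service_owners` and `service_files` are hypothetical names, and the assumption is that a renewal gets copied to every service the renewing domain is flagged as owning.

```python
def renew(domain, new_cert, service_owners, service_files):
    """Simulate a renewal for `domain`.

    Assumption: every service that lists `domain` as an owner has its
    certificate file replaced with the renewed certificate.
    """
    for service, owners in service_owners.items():
        if domain in owners:
            service_files[service] = new_cert
    return service_files


# Two virtual servers are both flagged as "used by Dovecot".
owners = {"dovecot": {"primary.mydomain.com", "this-domain.com"}}
files = {"dovecot": "cert for primary.mydomain.com"}

# this-domain.com renews overnight and overwrites the shared file.
renew("this-domain.com", "cert for this-domain.com", owners, files)
print(files["dovecot"])  # cert for this-domain.com
```

If only “primary.mydomain.com” were in the owner set, the same renewal would leave the Dovecot file untouched, which is the state I am trying to get back to.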

I have a virtual server created for the server host name. It is one line above the one that overwrote my cert. It is possible that I clicked the wrong one when working on this a while back.

I purchase a wildcard cert to use on all my systems, particularly for Postfix and Dovecot, as I don’t know how reliable LetsEncrypt certs are with respect to email client recognition.

So, apparently at least one of those buttons follows a path I set to my purchased wildcard cert, which is used by multiple services. How do I find which one or ones is doing this, and how should I fix it? It seems I have a ticking time bomb here.

FYI, I shell in via console to update my main server cert files. I also find the interface to be a bit confusing when one first uses the command line to obtain and install a cert.

This just happened again when another domain went through the automatic LetsEncrypt renewal. It overwrote my cert files in /etc/pki/tls/

I guess I’ll run through all the domains on this system and turn off auto updates for now. This is wreaking havoc with my clients’ email programs.

Thanks,
John
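
Until the cause is found, one way to catch an overwrite quickly is to fingerprint the cert and key files and compare them after each renewal. The following is a minimal Python sketch, not a Virtualmin feature: the function names and the baseline file are hypothetical, and the paths to watch are whichever files under /etc/pki/tls/ the mail services actually read.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def save_baseline(paths, baseline_file):
    """Record the current fingerprint of each watched file."""
    baseline = {str(p): fingerprint(p) for p in paths}
    Path(baseline_file).write_text(json.dumps(baseline, indent=2))
    return baseline


def changed_files(baseline_file):
    """Return watched files that are missing or differ from the baseline."""
    baseline = json.loads(Path(baseline_file).read_text())
    changed = []
    for path, old_digest in baseline.items():
        if not Path(path).is_file() or fingerprint(path) != old_digest:
            changed.append(path)
    return changed
```

Run `save_baseline` once after installing the purchased cert, then call `changed_files` from a scheduled job; a non-empty result means something replaced the files and the mail services are probably presenting the wrong cert.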

I have gone through all my servers and set LetsEncrypt to “Renew Manually” on all of my domains until something is figured out on this issue.

  1. I don’t know if this is an across the board issue.
  2. I don’t know where Virtualmin gets the path to my main server certs.

I guess I need to go back in and set the main “Contact Email” on each domain to myself so my customers don’t get the notices.

Yes, happening to me too.

The simplest solution is to request the certificate from Lets Encrypt again for your main mail server.

Yes that works - but we have hundreds of email users and by the time we’ve caught the problem & fixed it, that’s an awful lot of unhappy peeps!

Yes, same here. Hopefully there is a fix for it coming?

I now have had this happen on another CentOS 7 system. I had a manual creation to do. It did not replace my main server keys but it did leave Dovecot’s config broken and Dovecot down. I deleted all the entries at the bottom of dovecot.conf which were added by LetsEncrypt. This allowed Dovecot to start.

This has absolutely smashed my system. It took out more than just email; I have also had a client Joomla website go down. For me it was two problems that just happened to coincide (one of which, number 1 below, I have encountered before):

  1. The Webmin update overwrote the “group” name under Virtualmin > Services > PHP-FPM Configuration for one of my client domains. It removed the “.com” at the end of the domain, causing php5.6-fpm to enter a failed state (it read group = domain when it should have read group = domain.com).

  2. This SSL cert issue has really created a nightmare for me. I’ve got complaints all over the place, with clients saying their apps (such as Outlook and Apple Mail) aren’t working.

When I requested a new SSL cert for my main domain and server, it completely knocked Webmin/Virtualmin offline and I could not access my control panel. I had to restore the server from a backup. Totally screwed me this did.

Anyway, at least I know what is going on now.

I have had to roll back to only using Webmin update 6.09. I cannot install 6.09-2 or it screws up my system.
I need to pick a quiet time on my network before I run the gauntlet again.

PHP-FPM is wholly unrelated to this conversation, please open a new topic (or maybe there’s already a topic for it).

I’ve been unable to reproduce the certificate overwrite issue, thus far, but I’ll talk to Jamie to see if he has any insight. It’d be really helpful for the discussion to be specific about which services are having problems. There are multiple independent key/certificate config locations (and it differs based on distro/version), so as long as we’re not specific about problem reports, it leaves a lot of variables undefined.

I see Dovecot mentioned by @dumorian, which is helpful; is the problem always with Dovecot (IMAP/POP3 client connections) or does it impact Postfix as well, or anything else? Dovecot is particularly complex because in some versions it supports multiple certificates on a single IP (Postfix does, too, in very new versions, but I don’t think we’re trying to handle that yet). A wildcard cert complicates the situation, as well, and I think I want to narrow it down to one specific workflow that triggers the problem with a regular (non-wildcard) Let’s Encrypt certificate, assuming that is a case that happens. Jamie or I need to be able to see it happening to figure out why.

Joe,
It happened twice in two days on my server (you can have access if you wish).
As far as I can see, the domains auto-renewed their Let’s Encrypt certificates and then overwrote the certs for the main domain on the server.
To get over it, I twice ran the certificate request for the main domain.
Brian

Hi Joe,

I run an EV wildcard cert on all of my systems. I have Postfix, Dovecot, FTP and such set to use that cert, which is stored in the default /etc/pki/tls directory structure for CentOS. All of my systems are fully updated except for ImageMagick, which had a dependency issue last time I checked.

On one system, the first to have the problem, an update to clientdomain.com overwrote my EV cert and key in /etc/pki/tls. I thought maybe I was the problem, as this domain was just one line down from myservername.myserverdomain.com. I thought maybe I had clicked on the wrong line.

Yesterday another cert renewed, and it did the same thing. I’m certain I did not do anything aside from the initial creation of the cert.

Issue one: LetsEncrypt is overwriting my main cert files.

Last night, after renaming a wordpress.domain.com virtual server, I ran LetsEncrypt manually to get the cert for the new www.domain.com. I found out this morning that Dovecot was not running. When I tried to start it, I got an error to the effect of “ssl some key =”. I didn’t note the particular line, so I just deleted all the entries put into dovecot.conf by LetsEncrypt. I should have copied the error, but panic happens when email is not functioning. In this case my EV cert was not overwritten; the only problem was with Dovecot.

Those are the three LetsEncrypt updates which have occurred over the last couple of days.
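
For the Dovecot failure, it may help to list every certificate and key path that dovecot.conf references before deleting anything. This is a rough Python sketch under two assumptions: the added entries use Dovecot's usual `ssl_cert = </path` form (possibly inside `local_name` blocks), and the function names here are made up for illustration.

```python
import re
from pathlib import Path

# Matches uncommented lines such as:  ssl_cert = </path/to/file
SSL_FILE_SETTING = re.compile(
    r"^\s*(ssl_cert|ssl_key|ssl_ca)\s*=\s*<\s*(\S+)", re.MULTILINE
)


def ssl_file_settings(config_text):
    """Return (setting, path) pairs for each ssl_cert/ssl_key/ssl_ca line."""
    return [(m.group(1), m.group(2)) for m in SSL_FILE_SETTING.finditer(config_text)]


def missing_files(config_text):
    """Return the (setting, path) pairs whose referenced file does not exist."""
    return [
        (name, path)
        for name, path in ssl_file_settings(config_text)
        if not Path(path).is_file()
    ]
```

Feeding it the contents of dovecot.conf shows which entries point at files that are not there, which is one plausible reason for Dovecot refusing to start. It does not follow include directives, so files pulled in from elsewhere would need to be checked separately.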

Please be specific. For what services did you see this behavior? There are several: Dovecot (IMAP/POP, i.e. receiving mail), Postfix (SMTP, i.e. sending mail), ProFTPd (FTP), Webmin and Usermin.

They all use different key/cert locations and those locations can be different depending on distro/version. I’m trying to reproduce the problem, and just hearing that “it happened to me, too” doesn’t really add any information to help do that.
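
For anyone wanting to report per-service results, here is a small Python sketch that does a verified TLS handshake against each service, the way a mail client would, and says whether the certificate validates for the hostname. The port numbers are assumed defaults and the function names are hypothetical. It only covers services that speak TLS from the first byte, so STARTTLS on SMTP submission and explicit FTPS on ProFTPd are not checked.

```python
import socket
import ssl

# Assumed default ports; adjust to match the server being tested.
SERVICE_PORTS = {
    "Dovecot IMAPS": 993,
    "Dovecot POP3S": 995,
    "Postfix SMTPS": 465,
    "Webmin": 10000,
    "Usermin": 20000,
}


def check_service(host, port, timeout=5.0):
    """Attempt a verified TLS handshake with host:port.

    Returns "ok" if the certificate validates for `host`, otherwise a
    short description of what went wrong.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError as err:
        # Raised when the cert is for another name, expired, or untrusted.
        return f"certificate problem: {err.verify_message}"
    except OSError as err:
        return f"connection failed: {err}"


def report(host, ports=SERVICE_PORTS):
    """Check every listed service on `host` and return {service: result}."""
    return {name: check_service(host, port) for name, port in ports.items()}
```

Calling `report` with the mail server's hostname right after a renewal would show which services, if any, started presenting another domain's certificate.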

Hi Joe,

Did my last post help with details? There seem to be at least two issues shown there.

In my opinion the problem is this Joe:

Surely only one virtual server should “own” a particular service. For example, using the “Copy To” button for Dovecot on a specific VS, that should then display:

“This SSL certificate is already being used by : Dovecot (host this-domain.com)”

But since the most recent upgrade a significant number of virtual servers on my system display this message (and I know for a fact that the “Copy to” button was never used for them, so I have no idea why this is so).

So essentially they are all fighting for control of the one service. Therefore, when one gets a certificate update it replaces the Dovecot certificate and throws everything into chaos.

The Virtualmin control panel has no means for us users to correct this. There is no button to STOP a VS from having control of the Dovecot certificate. You might say - just go to some VS that does display the “Copy to” button. Hit that, and that should reset everything. But it doesn’t. In fact in my experience the “Copy to” button for Dovecot simply does not work. Does it for you Joe?

OK, well try this then to reset it: Go to the server config “enabled features” and disable and then enable “SSL web site”. But then you enter a new fresh hell with this:

“Adding new SSL virtual website … … certificate file is not valid : Line 32 does not look like PEM format”.

I have found the only option then is to completely delete then restore the VS. & guess what? When you do, there it is again:

“This SSL certificate is already being used by : Dovecot (host this-domain.com)” :sob:

Joe,
Sorry about the lack of info.
I should have taken more notice, but it was at least the Dovecot and, I think, Postfix certs that were overwritten by a domain that wasn’t the main domain for my server.