Dovecot config lines are being randomly deleted

I have had dovecot crashing frequently due to two causes.

One: I find that dovecot has crashed, and the error is a random missing curly bracket on one of the virtual server configs. I had not touched the config anytime this has happened. I thought it was happening when a letsencrypt cert was renewed, but it has also happened when there was no renewal. I edit the config where the ending curly bracket has randomly disappeared and put it back in, and the problem is fixed, until the next time it happens.

Two: I have not diagnosed. I find that dovecot has crashed, and that CPU is running at 100%. At that time there are several dovecot processes running. A reboot fixes this, and it hasn’t happened for a while, but I will need to diagnose this next time.

I had an interesting problem with Dove tonight. It started as an error when I opened Thunderbird informing me that the connection to a seldom-used mail account had been interrupted. I’d actually noticed the error a while ago and had made a mental note to fix it, but never got around to it. I almost never use the address in question.

While trying to diagnose the problem in TBird tonight, I noticed that it was attempting to use a certificate belonging to another domain on the server. So I assigned the problem mail domain its own cert, but that didn’t solve the problem.

I decided to restart Dovecot, and it spit out an error about a missing “}” at a certain line of /etc/dovecot/dovecot.conf. But what actually happened was that several lines of configuration from the other domain whose cert the problem domain was trying to use had been copied into the problem domain’s configuration, but without the “}”.

I don’t think it had anything to do with SNI because the mismatch occurred before I made that change. But it’s possible because I’d already enabled SNI for the domain whose certificate it was trying to use. So maybe.

In any case, I fixed that problem and checked the rest of /etc/dovecot/dovecot.conf, and found that there were several other instances of copying; but they didn’t trigger errors because the lines had been copied correctly. Apparently Dove is tolerant of superfluous lines of configuration as long as they’re syntactically correct.

Any problems with Dovecot always raise my eyebrow because in my experience, Dove is one of the better-behaved members of the team. I almost never have problems with Dovecot. So I tidied things up, restarted Dove, tested it, verified that it was working correctly, and backed up the working /etc/dovecot/dovecot.conf for a quick fix should it happen again.

I really don’t know why it happened other than I probably caused it somehow. I’ll monitor it for a while to see if it recurs.

Richard

The specific error was:

doveconf: Fatal: Error in configuration file /etc/dovecot/dovecot.conf: Missing '}' (section started at /etc/dovecot/dovecot.conf:510)

Just in case anyone else is looking at this.

Richard

My errors have been similar. The result of the lack of a trailing bracket, makes it look like two records were smushed together.

Just had the same problem on 2 vps; one is running ubuntu 16.04 and the other one 18.04.

I’m going to take another look at it this morning to try to find some pattern to the errors. I know there were (and still are) a few cases where the entire section for a domain was copied verbatim, but correctly, and the domain’s mail therefore worked properly.

There were other sections where a cert or key path was pointing to the cert or key of the wrong domain, but still worked. That was puzzling, but I think I may know why now, strictly in my own case.

It occurred to me this morning that the certs and keys that were being used in the instances in which the “wrong” domain’s cert was being used belonged to a domain that also used to be the domain and TLD of a server’s hostname, which said server hosted the sites that were using the wrong cert.

Or more simply stated, I used to have a server “server1.example.tld” that I have since decommissioned, but I still use the domain “example.tld” as a regular domain on a virtual server. That server used to host the sites that had entries to use its certs in /etc/dovecot/dovecot.conf.

I decommissioned that server more than a decade ago and migrated the domain and its mail into another cPanel server. I then decommissioned that server and migrated all of its accounts into the current Virtualmin server a little over a year ago.

Long story short, those particular errors may have been obsolete entries that date back more than 10 years, to another server, and that relate strictly to my own case. If that’s true, then they should be ignored in terms of troubleshooting the current problem, leaving the missing “}” as the only error in this instance.

Richard

Just to confirm, the only domains with superfluous entries were those that were migrated from cPanel, so I’m pretty sure that behavior is specific to my case and should not be considered part of the current issue.

However, /etc/dovecot/dovecot.conf did change all by itself overnight, the error again being the disappearance of the “}” between the end of one domain’s SSL entries and the beginning of the next. This time, both domains in question were created on this server (that is, they weren’t migrated).

I confirmed the disappearing “}” by comparing the backup I made last night to the current file.

Dovecot didn’t stop, however. Presumably it would have failed to start had I attempted restarting it before I corrected that error.

I also applied SNI to another domain’s mail. That worked properly and did not cause any errors. That and the timing of the error overnight suggest that SNI has nothing to do with the problem.

So basically, in my own case, the disappearing “}” happened overnight for reasons unknown. I have little experience troubleshooting Dove because in all the years I’ve been using it, this is only the second time I’ve had reason to.

Richard

I can confirm this issue on my Debian 9 after latest updates

CentOS 7.8 here.

Richard

Does Dovecot have any scheduled housekeeping that it does on a regular basis? I’ve been monitoring yesterday and today and there have been no more changes.

Richard

I assume it’s when Let’s Encrypt certs renew. This is a known bug, and I’m pretty sure Jamie’s checked in a fix already. A new release is schedule for the next day or so, as far as I know.

Ah, okay. That would make sense. Thank you, Joe.

Richard

1 Like

Cool - thanks Joe!

For anyone who is seeing this, can you post your full Dovecot config file? I’m pretty sure this is fixed in checked-in code that will be released in 6.11, but I want to be sure.

Long story short, those particular errors may have been obsolete entries that date back more than 10 years, to another server, and that relate strictly to my own case. If that’s true, then they should be ignored in terms of troubleshooting the current problem, leaving the missing “}” as the only error in this instance.

Had this very problem over the weekend. Dovecot just shut down. Easy to fix error adding a bracket.

It did seem to happen at the same time Lets Encrypt tried renewing the cert for the domain.

I got a notice for two more lets encrypt renewals. I went to the dovecot.conf, and sure enough, two end brackets were missing. I only have 2 of several domains using email, so if one of the virtual domains doesn’t use it, it doesn’t crash the server (trying to figure out why the server only crashes sometimes).

I’m sure the upgrade will fix this.

Any patch for now? I have this issue every day.

Try this https://github.com/webmin/webmin/commit/2ee43178df0cefe4e72398c521e2a908eb38011c
I had this problem also with a Debian 9 Server. But until now not with Debian 10.

Yeah, a LE SSL update to one domain this morning hosed the hell out of my dovecot.conf.

It was worse than the missing bracket this time. The entries were split into separate sets of lines from the local_name line, so local_name had no certs associated with it, and the cert entries were orphans with no local_name. Plus missing brackets. Fixable, but what a mess.

Richard