Impossible to send mail, cannot obtain certificate when requested... where to start to debug that O_o

Hello!

I’m sorry to come asking for help, but I find myself at my wit’s end. Could I humbly ask for suggestions, perhaps?

the server information

I have a Debian 10 dedi running Virtualmin free (latest stable, as provided by the automatic updates), behind Cloudflare and relying on Let’s Encrypt for its SSL part. It serves quite a few websites for friends, for myself, and more. So far so good: no issues in the last months, with websites, email and shell all working properly.

the bug’s details

I have a brand new bug on my dedi, it’s been a few hours now, and yet things were working just yesterday: while receiving emails still works (both in pop3 and imap), I cannot SEND them anymore.

My error messages are in French, my apologies if it’s not the exact terms you’d find in English, I hope you’ll cross a part of the bridge and guess what they ought to have been when I get them wrong.

Thunderbird returns the error message that it cannot operate a secure connection with its peer, stating the requested domain name doesn’t match the certificate on the server, hence the configuration for mail.domain.tld must be corrected.

Then follows another dialogue, in which TB offers to add an exception to the rules, in which there’s also the button to manually request, again, the certificate.
And, something certainly odd in my eyes, when I click that button… it seems to fail to request the certificate!
The part that said “bad site” and explained the certificate doesn’t match the site, is replaced, now, with a new text appearing, saying “no available information”, “impossible to obtain that site’s identification information”.

I’ve confirmed that with two different domains on that server, I think all domains are concerned at this point.

what I attempted to resolve the issue

I’ve tried those operations:
(1) service postfix restart,
(2) service dovecot restart (I know it’s a postfix thing, not dovecot, but why not at this point, right?),
(3) I successfully requested a brand new letsencrypt certificate for the domains in which I confirmed the issue,
(4) I made sure that in virtualmin’s dashboard there was no status icon in the red… nothing.
(5) I found no relevant .conf file for postfix but at least I made sure dovecot’s conf file was unchanged.
(6) I also checked /var/log/err.log (nothing) and /var/log/mail.log.
But for mail.log unfortunately, let’s blame bots trying stuff I suppose, that log file isn’t understandable to me as it is: many lines per second are added, too many for me to keep track of. Even if I use tail -f and quickly go back to Thunderbird to hit Send and Obtain certificate, it’s not fast enough; either no trace is recorded in mail.log, or it’s lost in the rest. Searching for my own IP address in there only returns the successful dovecot logins, not postfix failures (which shouldn’t be much of a surprise, postfix comes from the server itself now that I think of it).
(7) Restart fail2ban on the system (which purges its list of banned IPs), in case a crucial IP address had been added to the block list by accident, and run again the previous tests
(8) as the websites are behind Cloudflare, I triple-checked: no change was made to Cloudflare’s configuration regarding the dedi in the last weeks; their support person, when contacted, also confirmed no change was made on their side, and they don’t see network problems on their own network’s monitoring.
(9) the last thing I could test myself, without inspiration (I googled quite a bit, but found nothing that I think is relevant here) would be to reboot the server, but I don’t think it’s quite wise, what if there is an actual issue that will become worse after a reboot, if some software bricks can’t work anymore with others, etc…
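
For the busy mail.log described in step (6), one option is to filter the log instead of watching it scroll by. The following is a minimal sketch, not a definitive tool: the function name `postfix_lines` is hypothetical, and the pattern assumes Postfix tags its syslog lines in the usual `postfix/<daemon>[pid]:` form, which may differ if the syslog name was changed in master.cf.

```python
import re

# Assumption: Postfix lines carry a tag such as "postfix/smtpd[1234]:".
# Adjust the pattern if master.cf sets a different syslog_name.
POSTFIX_TAG = re.compile(r"\bpostfix/[\w/-]+\[\d+\]:")


def postfix_lines(lines, needle=None):
    """Yield only Postfix lines, optionally only those containing needle."""
    for line in lines:
        if POSTFIX_TAG.search(line) and (needle is None or needle in line):
            yield line.rstrip("\n")
```

Passing an open file such as /var/log/mail.log as `lines`, with the client's public IP address or a keyword like "TLS" as `needle`, should cut the output down to the Postfix entries worth reading.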

Would you have an idea of what may cause the problem, or in what direction I could investigate?
Maybe another error log with recognizable text patterns to search for?

My apologies if that’s a newbie question… that’s what I am :smiley:

Thank you very much if you can help, and merry holidays guys!

First thing I’d try is sending mail from Usermin or any other webmail that’s installed. If a webmail app can send, don’t change anything in Postfix, not yet.

Second, assuming you’re the only Thunderbird user with problems, clear Thunderbird of all accounts and start its setup from scratch using just one account until it’s working. With any luck Thunderbird is just playing dumb and was caching previous bad certificates. While troubleshooting I suggest reducing the number of cached connections to 1 or 2 in Thunderbird’s advanced settings, especially if you’ve seen error messages to that effect.

If the problem is Postfix it’s more than likely because the LE certificates weren’t copied and installed into the Postfix configuration. This is a common issue with both Postfix and Dovecot that’s been covered a lot here in the forums.
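
Whether Postfix is actually serving a certificate that covers the mail hostname can be checked from any client machine. The sketch below is only an illustration under a few assumptions: it uses port 587 with STARTTLS, the function names are hypothetical, and the wildcard matching is a simplified version of what a mail client really does. The chain is still verified, but the hostname check is skipped so that a mismatching certificate can be inspected instead of rejected.

```python
import smtplib
import ssl


def dns_names(cert):
    """Return the DNS names listed in a certificate dict from getpeercert()."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def name_matches(hostname, pattern):
    """Simplified match: exact name, or one leading '*.' wildcard label."""
    hostname, pattern = hostname.lower(), pattern.lower()
    if pattern.startswith("*."):
        # The wildcard stands for exactly one label.
        head, _, tail = hostname.partition(".")
        return bool(head) and tail == pattern[2:]
    return hostname == pattern


def smtp_cert_names(host, port=587, timeout=10):
    """Connect, upgrade with STARTTLS, and return the served DNS names."""
    context = ssl.create_default_context()
    # Keep chain verification, but do not reject on a hostname mismatch.
    context.check_hostname = False
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.starttls(context=context)
        return dns_names(smtp.sock.getpeercert())
```

If none of the names returned by `smtp_cert_names("mail.domain.tld")` satisfies `name_matches("mail.domain.tld", name)`, that would be consistent with the mismatch Thunderbird reports.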

Aside from that, it may be too early to count out Cloudflare but check the components closest to home first.

Thanks Ramin.

I didn’t think of those 2 things, that I should have tried before asking in the forum: testing in usermin, and with a new thunderbird profile.

The results came as expected: it does work, in usermin, and it does not work, with a new TB profile. Exactly the same issues if I insist long enough.

Starting from scratch, the connection wizard in TB came with an interesting result: incoming IMAP all okay, “mail.domain.tld”, ssl found… but outbound mail, SMTP, this time with “domain.tld”: SSL not found.
Full of hope, I corrected it to “mail.domain.tld”, and bam, again, SSL not found.

Eventually, I found a solution that didn’t actually address the original problem, but circumvented it: instead of mail.domain.tld, for the SMTP server, I used the dedicated server’s reverse IP.
Thunderbird autodiscovered the encryption settings to use, with 993 + SSL/TLS inbound, 587 with STARTTLS outbound.
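
The two settings Thunderbird picked differ in how encryption starts: on 993 the TLS handshake happens immediately (implicit TLS), while on 587 the client connects in plain text and then upgrades with STARTTLS. As a rough sketch of the implicit case, assuming hypothetical helper names and a host reachable from the client:

```python
import socket
import ssl

# Conventional mail ports and how TLS begins on each of them.
TLS_MODES = {
    993: "implicit",  # IMAP over TLS
    465: "implicit",  # SMTP submission over TLS
    587: "starttls",  # SMTP submission, upgraded after connecting
    143: "starttls",  # IMAP, upgraded after connecting
}


def tls_mode(port):
    """Return how TLS is conventionally started on the given port."""
    return TLS_MODES.get(port, "unknown")


def implicit_tls_version(host, port=993, timeout=10):
    """Open an implicit-TLS connection and return the negotiated version.

    Raises ssl.SSLCertVerificationError if the certificate does not
    verify for this host name.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Running `implicit_tls_version` against both the reverse name and mail.domain.tld is one way to see which of the two host names the server's certificate accepts on the IMAP side.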
And this time, with that, it worked. YISS.

It’s relatively infuriating, as last year I had to replace the reverse with mail.domain.tld to make it work, really makes me question what the hell happens in my server’s innards. But, hey, as long as it works, I’ve got no issues, it’s great I can finally call it a day :slight_smile:

So: case closed, fortunately.

I’ll end it with a MERRY NEW YEAR 2021 FELLOW VIRTUALMIN USERS, MAY THAT YEAR BE GENTLE WITH YOU GUYS! :slight_smile:
