Dovecot 2.1.7 service down

Hi guys,
I have been dealing with this issue for a couple of days. I have a VPS running Debian 7 and Virtualmin that worked like a Swiss clock until a couple of days ago, when the Dovecot service started going down.

Oct 30 19:07:37 vps58163 postfix/smtpd[9538]: connect from host195-246-dynamic.16-87-r.retail.telecomitalia.it[87.16.246.195]
Oct 30 19:07:37 vps58163 postfix/smtpd[9542]: connect from host195-246-dynamic.16-87-r.retail.telecomitalia.it[87.16.246.195]
Oct 30 19:07:37 vps58163 postfix/smtpd[9339]: connect from host195-246-dynamic.16-87-r.retail.telecomitalia.it[87.16.246.195]
Oct 30 19:07:37 vps58163 postfix/smtpd[9544]: connect from host195-246-dynamic.16-87-r.retail.telecomitalia.it[87.16.246.195]
Oct 30 19:07:37 vps58163 postfix/smtpd[9546]: connect from host195-246-dynamic.16-87-r.retail.telecomitalia.it[87.16.246.195]
Oct 30 19:07:37 vps58163 postfix/smtpd[9327]: connect from host195-246-dynamic.16-87-r.retail.telecomitalia.it[87.16.246.195]
Oct 30 19:07:37 vps58163 dovecot: imap-login: Error: auth: connect(login) failed: Cannot allocate memory
Oct 30 19:07:37 vps58163 dovecot: imap-login: Error: auth: connect(login) failed: Cannot allocate memory
Oct 30 19:07:37 vps58163 dovecot: imap-login: Error: connect(ssl-params) failed: Cannot allocate memory
Oct 30 19:07:37 vps58163 dovecot: imap-login: Error: socketpair() failed: Cannot allocate memory
Oct 30 19:07:37 vps58163 postfix/smtpd[9538]: warning: connect to private/tlsmgr: Cannot allocate memory
Oct 30 19:07:37 vps58163 postfix/smtpd[9538]: warning: problem talking to server private/tlsmgr: Cannot allocate memory
Oct 30 19:07:37 vps58163 postfix/smtpd[9542]: warning: connect to private/tlsmgr: Cannot allocate memory
Oct 30 19:07:37 vps58163 postfix/smtpd[9542]: warning: problem talking to server private/tlsmgr: Cannot allocate memory
Oct 30 19:07:37 vps58163 postfix/smtpd[9339]: warning: connect to private/tlsmgr: Cannot allocate memory
Oct 30 19:07:37 vps58163 postfix/smtpd[9339]: warning: problem talking to server private/tlsmgr: Cannot allocate memory
Oct 30 19:07:37 vps58163 postfix/smtpd[9544]: warning: SASL authentication failure: cannot connect to saslauthd server: Cannot allocate memory
Oct 30 19:07:37 vps58163 postfix/smtpd[9544]: warning: SASL authentication failure: Password verification failed
Oct 30 19:07:37 vps58163 postfix/smtpd[9544]: warning: host195-246-dynamic.16-87-r.retail.telecomitalia.it[87.16.246.195]: SASL PLAIN authentication failed: generic failure
Oct 30 19:07:37 vps58163 postfix/smtpd[9327]: warning: connect to private/tlsmgr: Cannot allocate memory
Oct 30 19:07:37 vps58163 postfix/smtpd[9327]: warning: problem talking to server private/tlsmgr: Cannot allocate memory
Oct 30 19:07:37 vps58163 dovecot: imap-login: Fatal: Error reading configuration: net_connect_unix(/var/run/dovecot/config) failed: Cannot allocate memory
Oct 30 19:07:37 vps58163 dovecot: master: Error: service(imap-login): command startup failed, throttling for 2 secs
Oct 30 19:07:37 vps58163 postfix/smtpd[9544]: disconnect from host195-246-dynamic.16-87-r.retail.telecomitalia.it[87.16.246.195]
Oct 30 19:07:37 vps58163 postfix/smtpd[9544]: connect from host195-246-dynamic.16-87-r.retail.telecomitalia.it[87.16.246.195]
Oct 30 19:07:37 vps58163 dovecot: imap-login: Error: net_connect_unix(imap) failed: Cannot allocate memory
Oct 30 19:07:37 vps58163 dovecot: imap-login: Internal login failure (pid=9557 id=1) (internal failure, 1 succesful auths): user=<info.isanmarinoapp>, method=PLAIN, rip=87.16.246.195, lip=92.222.2.87, TLS, session=<0StnwacGSgBXEPbD>

Those are some lines from /var/log/mail.log.
As soon as those warnings appear, my users cannot log into their email accounts until I restart the Dovecot service.
This happens three or four times a day and it’s a real pain. I have searched everywhere without any luck, and I really cannot understand what causes this issue.
The “Cannot allocate memory” message is very odd, because I still have plenty of free resources on this machine.

Here is what free -m prints:

             total       used       free     shared    buffers     cached
Mem:          4096       1382       2713          0          0        484
-/+ buffers/cache:        897       3198
Swap:          128          0        128

Please help me solve this issue, because I have two demanding clients who are pretty upset about this situation.

Regards
Matteo

Howdy,

Hmm, are you by chance using OpenVZ? If so, could you paste in the contents of your /proc/user_beancounters file?

-Eric

I think you need more memory on your system:

Oct 30 19:07:37 vps58163 postfix/smtpd[9327]: warning: connect to private/tlsmgr: Cannot allocate memory
Oct 30 19:07:37 vps58163 postfix/smtpd[9327]: warning: problem talking to server private/tlsmgr: Cannot allocate memory
Oct 30 19:07:37 vps58163 dovecot: imap-login: Fatal: Error reading configuration: net_connect_unix(/var/run/dovecot/config) failed: Cannot allocate memory

Also post the content of the file Eric asked about:

cat /proc/user_beancounters

-Dustin

Eric, I actually don’t know whether I use OpenVZ; I just installed Virtualmin.
Dustin, I don’t know what kind of memory the warning is talking about, because I still have a lot of free space on the hard disk and more than two thirds of the RAM free (4 GB total).

Here is the content of user_beancounters:

Version: 2.5
       uid  resource                     held              maxheld              barrier                limit              failcnt
    58163:  kmemsize                 64192166            114987008  9223372036854775807  9223372036854775807                    0
            lockedpages                     0                 3157              1048576              1048576                    0
            privvmpages                474622               825625  9223372036854775807  9223372036854775807                    0
            shmpages                     1107                 1123  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0  9223372036854775807  9223372036854775807                    0
            numproc                       131                  261  9223372036854775807  9223372036854775807                    0
            physpages                  411428               789984                    0              1048576                    0
            vmguarpages                     0                    0              1081344  9223372036854775807                    0
            oomguarpages               245790               545716              1048576  9223372036854775807                    0
            numtcpsock                     44                  151  9223372036854775807  9223372036854775807                    0
            numflock                      191                  202  9223372036854775807  9223372036854775807                    0
            numpty                          1                   12  9223372036854775807  9223372036854775807                    0
            numsiginfo                      0                   27  9223372036854775807  9223372036854775807                    0
            tcpsndbuf                  804816              4199800  9223372036854775807  9223372036854775807                    0
            tcprcvbuf                  720896              2473984  9223372036854775807  9223372036854775807                    0
            othersockbuf               361432              1125344  9223372036854775807  9223372036854775807                    0
            dgramrcvbuf                     0                91560  9223372036854775807  9223372036854775807                    0
            numothersock                  221                  500                  500                  500                  106
            dcachesize               28585608             63972777  9223372036854775807  9223372036854775807                    0
            numfile                      3103                 5675  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0  9223372036854775807  9223372036854775807                    0
            numiptent                     127                  127  9223372036854775807  9223372036854775807                    0

-Mat

That file Eric asked for, at least as I understand it, indicates you are on OpenVZ.

I missed your free memory output, sorry about that (I’m visually impaired). So I do agree it is strange. How big is your user base?

-Dustin

That particular file does indicate that you’re using OpenVZ.

Everything listed in there is related to your OpenVZ resources.

We’ve seen quite a few unusual resource related issues when using OpenVZ… even if it appears that there is plenty of RAM, if it’s burst RAM that’s being used at the time, that RAM can be taken away to give to other users on the server (that is, burst RAM isn’t guaranteed).

It doesn’t appear RAM related in your case though.
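
As a quick sanity check on the RAM side, the page counts in the beancounters output can be converted to MiB. This is only a small sketch, and it assumes 4 KiB pages (the usual size on x86; the thread does not state the page size):

```python
PAGE_SIZE = 4096  # bytes per page; an assumption (usual on x86), not stated in the thread


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count from user_beancounters to MiB."""
    return pages * page_size / (1024 * 1024)


# physpages values copied from the output above
print(pages_to_mib(1048576))  # limit   -> 4096.0
print(pages_to_mib(411428))   # held    -> 1607.140625
print(pages_to_mib(789984))   # maxheld -> 3085.875
```

Under that assumption the physpages limit works out to 4096 MiB, which matches the total reported by free -m, and even the peak usage (maxheld) stays well below it with a failcnt of 0.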

Note the “failcnt” field on the far right of the output you’ve shared above… any failure listed there indicates something that asked for resources and was denied.

In your case, there are over 100 “numothersock” failures.

That parameter refers to the maximum number of non-TCP sockets (local sockets, UDP, and other types of sockets).

Looking at the output you initially shared, it looks like the “numothersock” failures correspond with the errors in your output.
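
If you want to check for this without reading the table by eye, something like the following could pick out the rows with a non-zero failcnt. It is a rough sketch, not an official tool: the function names are made up for illustration, and the parser only assumes the column layout shown in the output above.

```python
FIELDS = ("held", "maxheld", "barrier", "limit", "failcnt")

# Two rows copied from the output in this thread, plus the header lines.
SAMPLE = """\
Version: 2.5
       uid  resource      held  maxheld              barrier                limit  failcnt
    58163:  numproc        131      261  9223372036854775807  9223372036854775807        0
            numothersock   221      500                  500                  500      106
"""


def parse_beancounters(text):
    """Parse user_beancounters text into a list of per-resource dicts."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # The first resource row of a container is prefixed with "<uid>:".
        if tokens and tokens[0].endswith(":") and tokens[0][:-1].isdigit():
            tokens = tokens[1:]
        # A data row is a resource name followed by exactly five integers;
        # this also skips the "Version:" line and the column header.
        if len(tokens) != 6 or not all(t.isdigit() for t in tokens[1:]):
            continue
        row = {"resource": tokens[0]}
        row.update(zip(FIELDS, map(int, tokens[1:])))
        rows.append(row)
    return rows


def failed_resources(rows):
    """Return the rows where a request was denied at least once."""
    return [r for r in rows if r["failcnt"] > 0]


print([r["resource"] for r in failed_resources(parse_beancounters(SAMPLE))])
# -> ['numothersock']
```

On the VPS itself you would pass in the contents of /proc/user_beancounters (reading it normally requires root) instead of SAMPLE. Running it periodically and comparing failcnt values would show whether the counter is still climbing.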

What you may need to do is ask your provider for more resources. More resources altogether is good, but in particular you may want to ask them to increase the “numothersock” value.

Note though that this sort of resource issue is unique to OpenVZ. Xen, KVM, and VMware-based providers don’t use the same method for resource restriction. With those other VPS types, it’s just about ensuring that you have enough RAM.

-Eric

Thank you so much for your answer, Eric. So I guess the best way to avoid this kind of issue would be to buy a dedicated server instead of using a VPS.

I noticed another thing: I logged into the Virtualmin mail interface with the root account and there were about 68000 (not kidding) emails from “Cron Daemon” (an issue similar to this thread: http://www.virtualmin.com/node/30990 ), so as suggested in that thread I deleted all the emails and disabled this cron job:

[ -x /usr/share/awstats/tools/update.sh ] && /usr/share/awstats/tools/update.sh

I don’t know if my issue was related to this, but since then I haven’t had any more user email connection problems.

Is there any explanation to this?

Howdy,

Oh, we love VPS’s. There’s no reason at all to buy a dedicated server if you don’t need one.

I might just recommend a VPS that’s not OpenVZ-based :)

Regarding the email in your queue – it’s possible that those emails were contributing to the resource limit that was being hit.

If it’s working now, that’s fantastic!

-Eric

I work with OpenVZ almost every day …

The setting your provider is using for numothersock is way too low! Dovecot uses lots of sockets to do its thing, and 500 is not nearly enough; I have 1982 in use on a VPS with about 5 users on it. Sockets are an integral part of a Linux system, and a limit like that would cause all kinds of chaos!

It should be set to “9223372036854775807” the same as the other counters.
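
To get a rough idea of how many non-TCP sockets a container has open, one option is to count the entries in the kernel's socket tables under /proc/net. This is only a sketch: the function names are made up, the choice of tables (unix, udp, udp6, raw, raw6) is my assumption, and the total will not necessarily match OpenVZ's own accounting for numothersock.

```python
import os

# Tables assumed to cover most non-TCP sockets; this list is an assumption.
OTHER_SOCKET_TABLES = ("unix", "udp", "udp6", "raw", "raw6")


def count_socket_entries(table_text):
    """Count sockets in one /proc/net table: every non-blank line after the header."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    return max(len(lines) - 1, 0)


def count_other_sockets(proc_net="/proc/net", tables=OTHER_SOCKET_TABLES):
    """Return {table name: entry count} for each table that exists and is readable."""
    counts = {}
    for name in tables:
        path = os.path.join(proc_net, name)
        try:
            with open(path) as fh:
                counts[name] = count_socket_entries(fh.read())
        except OSError:
            # Table missing or unreadable on this system; skip it.
            continue
    return counts
```

Calling sum(count_other_sockets().values()) inside the container gives an approximate figure to compare against the held and limit columns for numothersock.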

I am not aware of AWstats using any sockets, but I don’t really deal with it much.

So I should contact my provider and ask them to increase the numothersock value, right?

Yes. Once that is increased it should resolve your issue. (From my understanding of what everyone else in this thread is saying at least)

Good luck, let us know if they do increase it for you. I’ve seen a few providers that wouldn’t.

-Dustin