Dovecot keeps stopping

kyle787 · July 1, 2011, 4:12am

Background:
I just got a fresh install of Ubuntu 10.04 on a new VPS and installed Virtualmin. I have Virtualmin running flawlessly on another VPS from a different provider.

Problem:
On the system information page and the service tab I keep noticing that Dovecot has stopped, and when I start it via Virtualmin it quickly stops again. When I try to start it via SSH, I get this message: “Last died with error (see error log for more information): Time just moved backwards by 299 seconds. This might cause a lot of problems, so I’ll just kill myself now. http://wiki.dovecot.org/TimeMovedBackwards
If you have trouble with authentication failures,
enable auth_debug setting. See http://wiki.dovecot.org/WhyDoesItNotWork
This message goes away after the first successful login.”
I saw a little information in the forums about people with similar issues, but I can’t figure out how to fix it. As I understand it, ntpdate keeps moving the time back or forward because the server has the wrong time. Is that right?

Eric · July 1, 2011, 4:24am

Hmm, so are you running ntpdate? That can happen if you’re running that, but not running it frequently enough.

Using the NTP daemon rather than ntpdate can help, as it makes more gradual changes. You could also run ntpdate more frequently, perhaps once an hour or more, depending on how much your clock drifts.
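If you want a rough idea of how far off the clock is before changing anything, here is a minimal sketch. It assumes ntpdate is installed and that its -q option (query only, it does not set the clock) prints lines where the word "offset" is followed by a value in seconds. The name extract_offsets is just a hypothetical helper, not part of any package.

```shell
# Hypothetical helper: read ntpdate-style output on stdin and print every
# value that directly follows the word "offset" (in seconds).
extract_offsets() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "offset") {
                value = $(i + 1)
                sub(/,$/, "", value)   # drop a trailing comma, if any
                print value
            }
        }
    }'
}

# Usage (queries the pool without stepping the clock):
#   ntpdate -q pool.ntp.org | extract_offsets
```

An offset of several minutes would be consistent with the 299 second jump that Dovecot complained about.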

-Eric

kyle787 · July 1, 2011, 6:03am

Hey, servers aren’t really my thing; I am a front end developer. I am running everything as the Virtualmin GPL package installed it. I am fairly sure it is ntpdate though. How can I check to know for sure? I think Dovecot recommended using NTP to fix it. How do I install it and set it up correctly to work with Virtualmin?

Thanks so much.

Eric · July 1, 2011, 2:14pm

Hmm, do you know who set up your server?

The ntpdate program isn’t something that would be set up by default on a typical Linux distribution… that would normally be set up manually, and called by a cron job.

You could try going into Webmin -> System -> Scheduled Cron Jobs, and try searching for “ntpdate”, and see if it finds anything there.
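If you would rather check from SSH, something along these lines should do the same search. It is only a sketch: the default paths are the usual Debian/Ubuntu cron locations, which is an assumption about your system, and find_ntpdate_jobs is a made-up helper name.

```shell
# Hypothetical helper: list files that mention ntpdate under the given paths.
# With no arguments it falls back to common Debian/Ubuntu cron locations
# (an assumption; adjust for your system).
find_ntpdate_jobs() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/crontab /etc/cron.d /etc/cron.hourly /etc/cron.daily \
               /etc/cron.weekly /var/spool/cron/crontabs
    fi
    # -r: recurse into directories, -l: print only the names of matching files.
    # Errors for missing paths are discarded, and "|| true" keeps the exit
    # status at zero when nothing matches.
    grep -rl ntpdate -- "$@" 2>/dev/null || true
}

# Usage (run as root so the per-user crontabs are readable):
#   find_ntpdate_jobs
```

No output means none of those files mention ntpdate.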

-Eric

kyle787 · July 1, 2011, 2:43pm

I just got the VPS and then reloaded Ubuntu onto the server and set up Virtualmin. How can I set up ntpdate? Why didn’t I have to do this with my other server?

tpjthomson · July 1, 2011, 2:51pm

Hi

This page might be of help.

https://help.ubuntu.com/10.04/serverguide/C/NTP.html

Eric · July 1, 2011, 3:01pm

It’s possible that your VPS provider had automatically set up ntpdate for you, and that’s what is causing the problem. You may want to look in the existing Cron jobs mentioned above to see if there’s an existing entry for that.

Either that, or your system clock is so inaccurate that it’s causing Dovecot to get confused.

If you can’t find any existing ntpdate setup, you could try the link provided by tpjthomson above to get that set up.

-Eric

kyle787 · July 1, 2011, 5:53pm

I checked for cron jobs and didn’t see anything.

I don’t know if the following will be helpful, but it leads me to believe something is wrong…

root@server2:~# ntpdate
1 Jul 17:50:12 ntpdate[]: no servers can be used, exiting
root@server2:~# ntpdate pool.ntp.org
1 Jul 17:50:30 ntpdate[]: step-systime: Operation not permitted

I have set an iptables rule for “protocol is UDP and destination port is 123”, but it still didn’t help. I tried messing around with NTP and the guide provided by tpjthomson, but that didn’t fix it either.

With ntp, whenever I tried to list the peers it would say “No association ID's returned”, even after I added 4 new servers in the ntp configuration file.

tpjthomson · July 1, 2011, 7:18pm

Maybe you need to change this setting to OFF or Detect Automatically?

Webmin>Hardware>System Time>Module Config>System Configuration>System supports hardware time

I had trouble getting the System Time/Sync working AT ALL until I had done that on my VPS, which affected other stuff on the box. It was set to yes after I chose my OS and the provider deployed the base install.

Toby

kyle787 · July 1, 2011, 11:28pm

Toby, the even weirder thing is that under Hardware the only thing I have is Printer Administration.

tpjthomson · July 1, 2011, 11:40pm

If it is a recent install and you haven’t done much to it, I think if I were in your shoes I’d consider re-installing from scratch. It sounds like it may be a dodgy install, since the Hardware category ought to have at least the basics:

Hardware
GRUB Boot Loader
Logical Volume Management
Partitions on Local Disks
Printer Administration
System Time

I don’t use LVM, but apparently it’s needed by the weird VPS disk layout.

kyle787 · July 1, 2011, 11:54pm

Yeah, I have reloaded it before; I will try again later tonight. However, my other working VPS doesn’t have time settings under Hardware either.

kyle787 · July 3, 2011, 4:41am

Reloading it worked! Thanks all! I have another problem though… if I should open another topic let me know.

Here is a quick brief. I have one domain, let’s call it a.com. I started off with server.a.com, ns1.a.com, and ns2.a.com, all pointing to one IP. I recently got server2.a.com, ns3.a.com, and ns4.a.com. Right now server.a.com is set up correctly, or so I think: in Virtualmin I have created A records for all of the name servers and servers.

However, I recently added b.com to server2.a.com. The site b.com points to ns3.a.com and ns4.a.com and is hosted on server2.a.com. server2.a.com and ns3.a.com have the same IP address, and ns4.a.com has a different one, differing only in the last number. All of these are set as A records for a.com on server.a.com.

When I use http://www.squish.net/dnscheck/ it says “100.0% of queries will end in failure at IP (ns3.a.com) - returned REFUSED code”. I don’t understand why this is happening. If I run the check on a.com, which is hosted on server.a.com, everything checks out fine. Any ideas?

Eric · July 3, 2011, 4:50am

Is BIND listening on your new IP address?

Try running “netstat -an | grep :53”, and look at the UDP listings – is your new IP address included there?

If not, you may need to add it. You can do that in Webmin -> Servers -> BIND DNS Server -> Addresses and Topology, and make sure that new IP address is listed under the Addresses section of “Ports and addresses”.
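To make the netstat check less error prone, here is a small sketch that filters the output for one address. It assumes the local address is the 4th column, as in typical Linux "netstat -an" output, and it treats the IPv4 wildcard 0.0.0.0:53 as a match too. The name listens_on_53 and the 192.0.2.x addresses are placeholders.

```shell
# Hypothetical helper: read "netstat -an" output on stdin and print the lines
# whose local address (4th column) is IP:53 or the IPv4 wildcard 0.0.0.0:53.
# Exits non-zero when no such line exists.
listens_on_53() {
    awk -v want="$1:53" '
        $4 == want || $4 == "0.0.0.0:53" { print; found = 1 }
        END { exit !found }
    '
}

# Usage (192.0.2.148 is a placeholder address):
#   netstat -an | listens_on_53 192.0.2.148 || echo "nothing is listening on that IP, port 53"
```

This only tells you that something is bound to port 53 on that address; it does not confirm that the process is BIND.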

-Eric

kyle787 · July 3, 2011, 4:52am

Which server should I check on?

Eric · July 3, 2011, 4:56am

“Which server should I check on?”

Well, the error you got said that it was “ns3.a.com” that was failing – so you’d need to make sure BIND is running on whatever server hosts that IP address, and that it’s correctly listening for incoming connections.

-Eric

kyle787 · July 3, 2011, 5:08am

Okay when I do “netstat -an | grep :53” I have localhost show up and the IP of the ns1, ns2, and the server, but nothing for ns3, ns4, or server2. This shows up:

root@server:~# netstat -an | grep :53
tcp 0 0 xxx.xx.xxx.34:53 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN
tcp6 0 0 :::53 :::* LISTEN
udp 0 0 xxx.xx.xxx.34:53 0.0.0.0:*
udp 0 0 127.0.0.1:53 0.0.0.0:*
udp6 0 0 :::53 :::*

So I added the IP of ns3 to it, and restarted BIND, however now when I do “netstat -an | grep :53” this shows up:

root@server:~# netstat -an | grep :53

tcp6 0 0 :::53 :::* LISTEN

udp6 0 0 :::53 :::*

kyle787 · July 3, 2011, 5:23am

Alright, I am not sure what I did, or if I did anything, but it now reads, “100.0% of queries will end in failure at xxx.xx.xxx.34 (ns3.a.com) - query timed out”.

kyle787 · July 3, 2011, 5:28am

Also, the IP for ns1, ns2, and server is xxx.xx.xxx.34, the IP for ns3 and server2 is xxx.xx.xxx.148, and ns4 is xxx.xx.xxx.149. So why is it saying that the IP for ns3 is xxx.xx.xxx.34?

Eric · July 3, 2011, 2:46pm

Well, I’m starting to have a hard time following with the masked domain names and IP addresses.

However, it sounds like this is important:

“Also, the IP for ns1, ns2, and server is xxx.xx.xxx.34, the IP for ns3 and server2 is xxx.xx.xxx.148, and ns4 is xxx.xx.xxx.149. So why is it saying that the IP for ns3 is xxx.xx.xxx.34?”

I’m not sure why that would be, but that’s something you’re going to need to figure out.

Either your domain name registrar or the DNS on your server has that incorrect. You’d probably want to check both and make sure both have the IP address correct for that particular domain name.

But this also sounds important:

“Okay when I do “netstat -an | grep :53” I have localhost show up and the IP of the ns1, ns2, and the server, but nothing for ns3, ns4, or server2.”

If you ran that command on “server1”, and if server1 hosts ns1 and ns2 and not ns3 and ns4, then that’s expected… the above command only shows what is listening on the server where you run it, not on other servers. You’d need to run that on server2.

Again though, I’m getting a bit confused, so I’m not sure I fully understand your setup.

That said – I think the key lies in you verifying that the IP addresses for all your NS records are correct on both your server and at your registrar, as well as verifying that BIND is listening on each of those IP addresses on each server.
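For the first half of that, a rough sketch of how you could compare what a name server answers with the IP you expect. It assumes dig is installed (on Ubuntu it comes from the dnsutils package). check_a_record is a hypothetical helper, and the example.com names and 192.0.2.x addresses are placeholders for your real ones.

```shell
# Hypothetical helper: ask a name server for the A record of NAME and compare
# the answer with the IP you expect.
#   check_a_record NAME EXPECTED_IP [SERVER]
check_a_record() {
    name=$1
    expected=$2
    server=${3:-}
    if [ -n "$server" ]; then
        actual=$(dig "@$server" "$name" A +short)
    else
        actual=$(dig "$name" A +short)
    fi
    # -x: whole-line match, -F: fixed string, so the dots are not wildcards.
    if printf '%s\n' "$actual" | grep -qxF -- "$expected"; then
        echo "OK $name -> $expected"
    else
        echo "MISMATCH $name: expected $expected, got ${actual:-no answer}"
        return 1
    fi
}

# Usage (placeholders): ask the server at 192.0.2.34 what it thinks ns3 is.
#   check_a_record ns3.example.com 192.0.2.148 192.0.2.34
```

Running it once against each of your own name servers and once without the SERVER argument should show whether the wrong address comes from your own DNS records or from somewhere upstream.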

-Eric