DNS problem

loyalwhite · December 10, 2010, 2:31pm

Hi all,

Until yesterday my server was running perfectly. Today I got a call from a colleague saying the site was down. When I try to browse to it, the browser returns the error "Could not connect: Unknown MySQL server host ‘my-domain.co.uk’".

Looks like a DNS problem, I thought, if the server can’t resolve a domain that points to itself in order to connect to MySQL.

I checked my domain on Pingability, and got a succession of DNS errors, which imply that I have no DNS server running, even though VirtualMin assures me that BIND is running.

“Warning my-domain.co.uk does not have an IP Address (A) record.”
“Error None of this zone’s name servers responded on the request for ‘my-domain.co.uk’ records. Giving up.”
SOA record shows as Unknown
“Warning Did not find any IP Address (A) records for the name server ‘ns1.my-domain.co.uk’. Normally the parent name server will list them. These name server A records are also called ‘host records’ and are usually set by the domain name registrar.”
“Information No glue records found at parent name servers for my-domain.co.uk”

Until yesterday this site was running perfectly. Is this an issue at the domain registrar, not providing the glue records to point to my server? I am no expert on DNS - any help would be sincerely appreciated.

Eric · December 10, 2010, 3:15pm

Howdy,

You may want to review the nameservers being used for that particular domain, and make sure they’re correct (and haven’t changed for some reason).

Also, you could try running a DNS report using intodns.com, which may give you some additional insight into what’s going on.

-Eric

loyalwhite · December 10, 2010, 3:44pm

Hi Eric,

Here is a link to the intodns output. It finds NS records at the parent server, but it says there is no DNS server running at the IP addresses to which those records point. But according to VirtualMin, BIND is up and running. Any ideas?

http://www.intodns.com/waos-online.co.uk

Many thanks,

Adam

Eric · December 10, 2010, 3:49pm

Howdy,

First off – thanks for the DNS report, that does help in troubleshooting.

Doing a lookup at your nameservers – it does indeed appear that BIND is unavailable to the outside world.

Since it sounds like it’s running locally – you may want to verify that you don’t have a firewall blocking UDP port 53. It seems to hang, rather than reject immediately, which is often a sign of a firewall.
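One rough way to tell a silent drop from a real answer is to look at what dig prints. The sketch below is only that, a sketch: `classify_dig` is a hypothetical helper name, and `ns1.example.com` and `example.com` in the usage comments are placeholders. It assumes dig's usual output, where a timeout ends with "connection timed out; no servers could be reached" and an answered query carries a `status:` field in its header line.

```shell
# Hypothetical helper: read dig output on stdin and print a one-word verdict.
classify_dig() {
    local out
    out="$(cat)"
    if printf '%s\n' "$out" | grep -q 'connection timed out; no servers could be reached'; then
        echo "timeout"      # nothing came back: BIND not answering, or packets dropped on the way
    elif printf '%s\n' "$out" | grep -Eq 'status: [A-Z]+'; then
        echo "answered"     # some server replied, whatever the status code was
    else
        echo "unknown"
    fi
}

# Usage (placeholder names), once over UDP and once over TCP:
#   dig +time=3 +tries=1 @ns1.example.com example.com SOA | classify_dig
#   dig +tcp +time=3 +tries=1 @ns1.example.com example.com SOA | classify_dig
```

If the same query times out from outside but is answered when run on the server against its own address, a filter somewhere in between becomes the more likely suspect; if it times out in both places, BIND itself probably is not listening.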

-Eric

loyalwhite · December 10, 2010, 4:06pm

Hi Eric,

The Linux Firewall in Webmin is enabled, but third on the list is:

Accept If protocol is UDP and destination port is domain

Can I assume that “domain” means port 53?

Eric · December 10, 2010, 4:09pm

Howdy,

Yup! That firewall output all looks normal.
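The name "domain" comes from the services database, where it is the registered name for port 53. A minimal sketch for checking the mapping yourself, assuming the usual services(5) layout of name, then port/protocol; `service_ports` is a hypothetical helper name:

```shell
# Hypothetical helper: print the port/protocol pairs a service name maps to,
# reading services(5)-formatted text on stdin.
service_ports() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Usage on the server itself:
#   service_ports domain < /etc/services
```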

Could there be a router or firewall in front of your server causing the trouble?

Also, you may want to restart BIND, just in case it’s running, but hung for some reason.

-Eric

loyalwhite · December 10, 2010, 4:24pm

No, it’s a dedicated server with a hosting provider. Nothing else in front of it.

I have restarted BIND several times from both VirtualMin and the command line, and indeed rebooted the whole machine, to no effect.

I am tearing my hair out!

Eric · December 10, 2010, 4:59pm

If you log into your server, and run “dig @localhost”, do you receive a list of a bunch of root nameservers? Or do you see an error of some sort?

Also, when you restart BIND, take a peek in your logs in /var/log – do any errors show up in there relating to BIND?

-Eric

loyalwhite · December 10, 2010, 5:14pm

Hi Eric,

Sincere thanks for your time on this. Here is the output from dig @localhost:

; <<>> DiG 9.3.6-P1-RedHat-9.3.6-4.P1.el5_4.2 <<>> @localhost
; (1 server found)
;; global options:  printcmd
;; connection timed out; no servers could be reached

I’m not quite sure where exactly within /var/log I should be looking - I don’t see a file or folder for BIND. Can you advise?

Eric · December 10, 2010, 5:18pm

Well, where exactly depends on your distro… but from your dig output above, it looks like you may be on CentOS.

So, I’d take a peek in /var/log/messages.

First, restart BIND – then afterwards, look in /var/log/messages for any errors that show up.

It does look like BIND isn’t answering queries on your server, so those logs may very well explain why when you restart it.
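/var/log/messages carries output from many services, so it can help to filter it down to named's complaints. A rough sketch, assuming the default syslog format where BIND's lines are tagged `named[PID]:`; the helper name `named_problems` and its keyword list are my own guesses, not anything BIND defines:

```shell
# Hypothetical helper: keep only named's lines that look like complaints.
# Reads syslog text on stdin; the keyword list is a guess, extend it as needed.
named_problems() {
    grep -E 'named\[[0-9]+\]:' | grep -E -i 'error|undefined|not listening|failed|refused'
}

# Usage on the server, right after restarting BIND:
#   tail -n 200 /var/log/messages | named_problems
```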

-Eric

loyalwhite · December 10, 2010, 5:32pm

Hi Eric,

Here is the content of /var/log/messages which is added when I restart BIND:

I am guessing that the “not listening on any interfaces” line might be the crux of the problem?

Dec 10 17:29:30 server55711 named[14570]: shutting down: flushing changes
Dec 10 17:29:30 server55711 named[14570]: stopping command channel on 127.0.0.1#953
Dec 10 17:29:30 server55711 named[14570]: stopping command channel on ::1#953
Dec 10 17:29:30 server55711 named[14570]: exiting
Dec 10 17:29:31 server55711 named[15611]: starting BIND 9.3.6-P1-RedHat-9.3.6-4.P1.el5_4.2 -u named
Dec 10 17:29:31 server55711 named[15611]: adjusted limit on open files from 1024 to 1048576
Dec 10 17:29:31 server55711 named[15611]: found 4 CPUs, using 4 worker threads
Dec 10 17:29:31 server55711 named[15611]: using up to 4096 sockets
Dec 10 17:29:31 server55711 named[15611]: loading configuration from '/etc/named.conf'
Dec 10 17:29:31 server55711 named[15611]: using default UDP/IPv4 port range: [1024, 65535]
Dec 10 17:29:31 server55711 named[15611]: using default UDP/IPv6 port range: [1024, 65535]
Dec 10 17:29:31 server55711 named[15611]: /etc/named.conf:9: undefined ACL '83.170.79.9,83.170.78.155,83.170.78.156'
Dec 10 17:29:31 server55711 named[15611]: not listening on any interfaces
Dec 10 17:29:31 server55711 named[15611]: command channel listening on 127.0.0.1#953
Dec 10 17:29:31 server55711 named[15611]: command channel listening on ::1#953
Dec 10 17:29:31 server55711 named[15611]: the working directory is not writable
Dec 10 17:29:31 server55711 named[15611]: zone waos-online.co.uk/IN: loaded serial 1289610790
Dec 10 17:29:31 server55711 named[15611]: zone registration.waos-online.co.uk/IN: loaded serial 1290852794
Dec 10 17:29:31 server55711 named[15611]: running
Dec 10 17:29:31 server55711 named[15611]: zone registration.waos-online.co.uk/IN: sending notifies (serial 1290852794)
Dec 10 17:29:31 server55711 named[15611]: zone waos-online.co.uk/IN: sending notifies (serial 1289610790)

Eric · December 10, 2010, 5:35pm

Hmm, it looks like there may be a syntax error in your /etc/named.conf file.

Is there any chance you could post the contents of it?

In particular, line 9 appears to be the problem, but having the full context would help.

-Eric

loyalwhite · December 10, 2010, 5:51pm

Hi Eric

Sure, it’s below, but the plot thickens.

I went into the BIND settings in Webmin and into Addresses and Topology. Under ports and addresses to listen on, I cleared my three IP addresses out, saved, restarted, added them again, saved, restarted… and straight away BIND came up. I checked on intoDNS: all green, no problems.

HOWEVER… as soon as I surf to my site, I get an internal server error on a particular page… and that crashes BIND. Which is obviously what caused the problem in the first place. Where would I find the log to tell me more about internal server errors? (I am using CentOS, as you surmised.)

The named.conf file is as follows:


options {
    directory "/etc";
    pid-file "/var/run/named/named.pid";
	allow-recursion {
		localnets;
		127.0.0.1;
		};
		
	
	listen-on {
		83.170.79.9,83.170.78.155,83.170.78.156;
		};
    };

zone "." {
    type hint;
    file "/etc/db.cache";
    };

zone "waos-online.co.uk" {
	type master;
	file "/var/named/waos-online.co.uk.hosts";
	allow-transfer {
		127.0.0.1;
		localnets;
		};
	};
zone "registration.waos-online.co.uk" {
	type master;
	file "/var/named/registration.waos-online.co.uk.hosts";
	allow-transfer {
		127.0.0.1;
		localnets;
		};
	};
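A note on the listen-on block in this file: the three addresses are joined with commas, and that comma-joined string is exactly what the "undefined ACL" log message quotes. BIND address lists expect each element to end with its own semicolon, so the whole string is read as the name of a single ACL that does not exist. A minimal sketch of the repair, assuming the line contains nothing but IPv4 addresses; `fix_listen_on` is a hypothetical helper name and the /tmp path is a placeholder:

```shell
# Hypothetical helper: rewrite a comma-joined IPv4 list so that every
# address is terminated by its own semicolon. Reads config text on stdin
# and leaves every other line untouched.
fix_listen_on() {
    sed -E '/^[[:space:]]*([0-9]{1,3}(\.[0-9]{1,3}){3},)+[0-9]{1,3}(\.[0-9]{1,3}){3};[[:space:]]*$/ s/,/; /g'
}

# Usage: write a corrected copy, then let BIND's own checker vet it
# before replacing the live file.
#   fix_listen_on < /etc/named.conf > /tmp/named.conf.new
#   named-checkconf /tmp/named.conf.new
```

Working on a copy keeps the live configuration intact until the checker is happy with the result.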

loyalwhite · December 10, 2010, 5:52pm

The text of the internal server error echoed to the browser is:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, root@localhost and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

Eric · December 10, 2010, 6:10pm

Hrm… in theory, a website shouldn’t be causing BIND to crash, unless it’s somehow changing the BIND config and restarting it.

If you restart BIND again, do you see those same errors in /var/log/messages?

To see what error your website is producing, you can look in $HOME/logs/error_log.

-Eric

loyalwhite · December 10, 2010, 9:53pm

OK, it seems that the DNS problem is now solved thanks to rectifying the problem in line 9.

However, the underlying server error turns out to be:

“(110)Connection timed out: mod_fcgid: ap_pass_brigade failed in handle_request function”

Having searched around extensively for information on this, it seems that I need to increase the FcgidMaxProcessesPerClass setting to something well above what is a fairly low VirtualMin default.

But I cannot for the life of me figure out how! Eric, in a previous forum posting, you say “Another option as well would be to go into Administration Options -> Edit Resource Limits, and to set “Max Number of Processes” for any Virtual Servers you’d like to have limits for.”

I do not see “Edit resource limits” under Admin Options - is that because I am running GPL not Pro? If so, how do I change this setting?

Eric · December 11, 2010, 12:00am

Howdy,

It likely is a Pro vs GPL thing. It was only recently that Virtualmin GPL began enabling you to configure FCGID, and I suspect the resource limits didn’t make it in there yet.

You would need to manually edit the Apache config, and set those values in the VirtualHost block for your domain.
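Before adding mod_fcgid directives by hand, it may be worth confirming that the module is actually loaded, since Apache rejects directives belonging to modules it has not loaded. A small sketch, assuming `httpd -M` lists one module per line in the form `name_module (static|shared)`; `module_loaded` is a hypothetical helper name:

```shell
# Hypothetical helper: report whether a module name appears in `httpd -M`
# output supplied on stdin.
module_loaded() {
    if grep -q -E "^[[:space:]]*$1([[:space:]]|\$)"; then
        echo "loaded"
    else
        echo "missing"
    fi
}

# Usage on the server (stderr is merged in case the list is printed there):
#   httpd -M 2>&1 | module_loaded fcgid_module
```

As far as I know, older mod_fcgid releases used directive names without the Fcgid prefix, so the documentation for the installed version is the place to confirm the exact spelling.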

Alternatively, you could always move away from FCGID to CGI, which may alleviate some of the problems you’re seeing. To do that, you could go into Server Configuration -> Website Options, and set the PHP Execution Mode.

-Eric

loyalwhite · December 11, 2010, 12:28am

Eric,

I tried switching to CGI, and no error is generated but the PHP code still takes a ludicrous amount of time to process - up to a minute just to parse some RSS feeds. Ironically, until last week, I had this site on an older, slower server which was not running Virtualmin, and it parsed out this code in less than a second.

I will edit the Apache config, but can you tell me where? I tried going through Services/Configure Website/Edit Directives and adding FcgidMaxProcessesPerClass 100, but when I tried to restart Apache it returned the error:

"Syntax error on line 1053 of /etc/httpd/conf/httpd.conf:
Invalid command ‘FcgidMaxProcessesPerClass’, perhaps misspelled or defined by a module not included in the server configuration"

I just need to know exactly where to put that line, assuming that is the correct line.

helpmin · December 11, 2010, 1:24am

Did you check the settings in /home/yourdomain/fcgi-bin/php5.fcgi?

You could probably also check whether you have a swtune.conf file (if you are on a VPS)?

loyalwhite · December 11, 2010, 1:33am

Hey Snapmin. I did, but the problem I had there was that I could not mod that file, even when logged in as root. I’m sure I could find a way around that. I was using Transmit, which may have been part of the problem there. Is that where I should put the FcgidMaxProcessesPerClass directive?

It’s not a VPS, it’s a dedicated server.