BIND exiting and shutting down for no apparent reason - random RCODE REFUSED

calderwood · September 25, 2013, 3:12pm

I have been seeing random BIND shut down and restart on Virtualmin GPL install. The shut down is always preceded by some error but random domain or IP entry. these IP addresses and domains are not on my server. The code message is often “error (unexpected RCODE REFUSED) resolving to some domain/IP”

If anyone can point me in the correct direction for finding the cause. I am assuming there is a bad setting somewhere. I am running CentOS6 with Virtualmin GPL

A sample log of a shutdown


Sep 25 06:01:06 ip-68-178-130-21 named[20231]: error (unexpected RCODE REFUSED) resolving '49.142.218.222.in-addr.arpa/PTR/IN': 202.103.225.70#53
Sep 25 06:01:07 ip-68-178-130-21 named[20231]: error (unexpected RCODE REFUSED) resolving '49.142.218.222.in-addr.arpa/PTR/IN': 202.103.224.69#53
Sep 25 06:01:09 ip-68-178-130-21 named[20231]: error (unexpected RCODE REFUSED) resolving '49.142.218.222.in-addr.arpa/PTR/IN': 202.103.224.69#53
Sep 25 06:01:09 ip-68-178-130-21 named[20231]: error (unexpected RCODE REFUSED) resolving '49.142.218.222.in-addr.arpa/PTR/IN': 202.103.225.70#53
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: shutting down
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: stopping command channel on 127.0.0.1#953
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: stopping command channel on ::1#953
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: no longer listening on ::#53
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: no longer listening on 127.0.0.1#53
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: no longer listening on 68.178.130.21#53
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: no longer listening on 68.178.131.108#53
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: no longer listening on 50.62.34.152#53
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: exiting
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: starting BIND 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6.3 -u named
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: built with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--target=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--with-pic' '--disable-static' '--disable-openssl-version-check' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--enable-fixed-rrset' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'target_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' 'CPPFLAGS= -DDIG_SIGCHASE'
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: ----------------------------------------------------
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: BIND 9 is maintained by Internet Systems Consortium,
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: Inc. (ISC), a non-profit 501(c)(3) public-benefit 
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: corporation.  Support and training for BIND 9 are 
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: available at https://www.isc.org/support
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: ----------------------------------------------------
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: adjusted limit on open files from 1024 to 1048576
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: found 16 CPUs, using 16 worker threads

Another BIND Shutdown was this one - these IP addresses and domains are not on my server:


Sep 25 05:46:25 ip-68-178-130-21 named[22235]: client 122.136.196.116#63897: query (cache) 'aa3247.com/A/IN' denied
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: client 122.136.196.116#3209: query (cache) 'aa3247.com/A/IN' denied
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: shutting down
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: stopping command channel on 127.0.0.1#953
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: stopping command channel on ::1#953
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: no longer listening on ::#53
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: no longer listening on 127.0.0.1#53
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: no longer listening on 68.178.130.21#53
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: no longer listening on 68.178.131.108#53
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: no longer listening on 50.62.34.152#53
Sep 25 05:46:27 ip-68-178-130-21 named[22235]: exiting

Eric · September 25, 2013, 5:07pm

Howdy,

Are you by chance using an OpenVZ-based VPS?

If so, can you paste in the contents of your /proc/user_beancounters file?

-Eric

calderwood · September 25, 2013, 5:17pm

This time BIND just went down twice in a row but with yet another error code:


Sep 25 09:32:38 ip-68-178-130-21 named[29722]: error (unexpected RCODE SERVFAIL) resolving '228.195.58.213.in-addr.arpa/PTR/IN': 2001:67c:e0::6#53
Sep 25 09:35:16 ip-68-178-130-21 php-cgi: PHP Warning:  Module 'mcrypt' already loaded in Unknown on line 0
Sep 25 09:35:19 ip-68-178-130-21 named[29722]: client 122.136.196.116#29758: query (cache) 'aa3247.com/A/IN' denied
Sep 25 09:35:33 ip-68-178-130-21 named[29722]: error (unexpected RCODE REFUSED) resolving 'ns.bta.net.cn/A/IN': 202.106.196.233#53
Sep 25 09:35:33 ip-68-178-130-21 named[29722]: error (unexpected RCODE REFUSED) resolving 'ns.bta.net.cn/AAAA/IN': 202.106.196.233#53
Sep 25 09:35:33 ip-68-178-130-21 named[29722]: error (unexpected RCODE REFUSED) resolving 'ns2.bta.net.cn/A/IN': 202.106.196.233#53
Sep 25 09:35:33 ip-68-178-130-21 named[29722]: error (unexpected RCODE REFUSED) resolving 'ns2.bta.net.cn/AAAA/IN': 202.106.196.233#53
Sep 25 09:35:56 ip-68-178-130-21 named[29722]: error (connection refused) resolving '130.36.99.91.psbl.surriel.com/A/IN': 82.94.250.75#53
Sep 25 09:37:41 ip-68-178-130-21 named[29722]: error (unexpected RCODE SERVFAIL) resolving '228.195.58.213.in-addr.arpa/PTR/IN': 193.0.9.6#53
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: shutting down
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: stopping command channel on 127.0.0.1#953
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: stopping command channel on ::1#953
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: no longer listening on ::#53
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: no longer listening on 127.0.0.1#53
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: no longer listening on 68.178.130.21#53
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: no longer listening on 68.178.131.108#53
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: no longer listening on 50.62.34.152#53
Sep 25 09:38:22 ip-68-178-130-21 named[29722]: exiting

calderwood · September 25, 2013, 5:19pm

Yes it is.


lockedpages                     0                    0  9223372036854775807  9223372036854775807                    0
            privvmpages                799349               840633              1572864              1572864                    0
            shmpages                     1206                 1206  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0  9223372036854775807  9223372036854775807                    0
            numproc                       167                  167                32567                32567                    0
            physpages                  638774               678198  9223372036854775807  9223372036854775807                    0
            vmguarpages                     0                    0               786432  9223372036854775807                    0
            oomguarpages               638775               678199  9223372036854775807  9223372036854775807                    0
            numtcpsock                     48                   51  9223372036854775807  9223372036854775807                    0
            numflock                      171                  172  9223372036854775807  9223372036854775807                    0
            numpty                          1                    1                  255                  255                    0
            numsiginfo                      6                    7                 1024                 1024                    0
            tcpsndbuf                 1465696              1520640  9223372036854775807  9223372036854775807                    0
            tcprcvbuf                  786432               835584  9223372036854775807  9223372036854775807                    0
            othersockbuf               282656               565536  9223372036854775807  9223372036854775807                    0
            dgramrcvbuf                     0                 9344  9223372036854775807  9223372036854775807                    0
            numothersock                  201                  207  9223372036854775807  9223372036854775807                    0
            dcachesize                2355922              2400994  9223372036854775807  9223372036854775807                    0
            numfile                     18951                18960  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0                    0                    0                    0
            dummy                           0                    0                    0                    0                    0
            dummy                           0                    0                    0                    0                    0
            numiptent                      14                   14  9223372036854775807  9223372036854775807                    0

Locutus · September 25, 2013, 6:49pm

I don’t think the error message and the exiting is directly related. In your first example, there’s over a minute between these events. Can you check how much time passed between each of these incidences?

I’d also think that BIND is hitting some kind of resource limit. You might check the syslog for further errors, maybe OOM messages. Short of a programming bug, it should not simply exit like that. You might check with “tcpdump udp port 53” what the load of your BIND is.

calderwood · September 25, 2013, 8:17pm

Looks like about 20 items per second - not sure what is normal.

This happens about 4 times a day, but at no particular time. I’ve looked thru the logs, but the only consistency I see is right before BIND closes, there is always an error message before such as:


error (unexpected RCODE REFUSED) 
error (unexpected RCODE SERVFAIL)
error (unexpected RCODE REFUSED)

Sometimes there can be 30-50 error messages within a few minutes all on the same PID with different IPs

calderwood · September 25, 2013, 8:17pm

Looks like about 20 items per second - not sure what is normal.

This happens about 4 times a day, but at no particular time. I’ve looked thru the logs, but the only consistency I see is right before BIND closes, there is always an error message before such as:


error (unexpected RCODE REFUSED) 
error (unexpected RCODE SERVFAIL)
error (unexpected RCODE REFUSED)

Sometimes there can be 30-50 error messages within a few minutes all on the same PID with different IPs

calderwood · September 25, 2013, 8:18pm

Looks like about 20 items per second - not sure what is normal.

This happens about 4 times a day, but at no particular time. I’ve looked thru the logs, but the only consistency I see is right before BIND closes, there is always an error message before such as:


error (unexpected RCODE REFUSED) 
error (unexpected RCODE SERVFAIL)
error (unexpected RCODE REFUSED)

Sometimes there can be 30-50 error messages within a few minutes all on the same PID with different IPs

Locutus · September 25, 2013, 9:50pm

The resolver errors you see are reverse lookups, probably triggered by some service on your machine that wants to know the hostname for an IP address that is trying to connect (e.g. SSHD and Postfix can do that). You might want to check your syslog, auth-log or mail-log for an unusually high amount of connection attempts / login failures, which indicate dictionary attacks.

Still, BIND should not simply exit from something like this. It’s meant to process a considerable amount of requests per second.

Did you check the syslog or kernel log for additional errors like OOMs? Otherwise, I’m fairly certain that it’s a problem with the virtualization software in use for your server. What’s your used and free memory? You might want to use the tool “atop” to record and playback performance data, that should give you an idea if something has a memory spike or similar.

calderwood · September 25, 2013, 11:35pm

No OOMs and nothing in the kernel logs.

However I did find spamassasin set to lookup DNS test mode (not sure that would matter) and did fine an email account that is trying to login every 3 seconds with an authentication failure in saslauthd

The only thing I have seen pulling a lot of CPU is clamav.

It’s not a busy server - only 100 virtual servers.

Eric · September 26, 2013, 3:11am

It’s not uncommon that we see unusual issues and service that die unexpectedly on OpenVZ-based systems… but typically when that occurs, the “failcnt” field of the user_beancounters file shows some non-0 values in it.

On your system, they all appear to be “0”.

How much RAM is in your server, and what is the current usage?

You can determine that by running this command:

free -m

Locutus · September 26, 2013, 8:43am

I can only repeat what I wrote last time, and what Eric said too now: We need to know what your RAM situation is. free and atop are recommended to provide that information, and to check historical memory data when BIND crashes.

calderwood · September 26, 2013, 1:35pm

The system has 3GB ram with 6GB burstable


]# free -m
             total       used       free     shared    buffers     cached
Mem:          3072       1708       1363          0          0          0
-/+ buffers/cache:       1708       1363
Swap:            0          0          0

Locutus · September 26, 2013, 2:21pm

This is really odd. Memory should be plenty, and no beancounter violation.

Two things I can think of right now: Intermittent memory problems. To find out about those, install “atop”. Or the BIND version you’re using (being an RC) might be buggy. Ubuntu officially uses 9.8.1, yours is 9.8.2-RC1.

You could try turning on BIND’s own logging, including debug logging. Or you might want to check if there’s an earlier, non-RC version of BIND available. Are you using third-party repositories for your CentOS? Those often contain versions of software that make trouble when used with Virtualmin.

calderwood · September 26, 2013, 2:47pm

The Centos iss a clean standard install and Virtualmin is the standard. It is a real problem as the server closes down about 6 times a day. Stopped twice again this morning. I searched the logs, and the same RCODE error then a BIND exit. I noticed that there is a bunch of messages all relating to the same PID. Don’t know if it means anything.

Close at 7.03AM
Sep 26 07:03:38 ip-68-178-130-21 named[4010]: error (unexpected RCODE REFUSED) resolving ‘16.136.119.140.in-addr.arpa/PTR/IN’: 2001:288:2001::8c77:36e#53 Sep 26 07:03:40 ip-68-178-130-21 php-cgi: PHP Warning: Module ‘mcrypt’ already loaded in Unknown on line 0 Sep 26 07:03:40 ip-68-178-130-21 php-cgi: PHP Warning: Module ‘mcrypt’ already loaded in Unknown on line 0 Sep 26 07:04:01 ip-68-178-130-21 php-cgi: PHP Warning: Module ‘mcrypt’ already loaded in Unknown on line 0 Sep 26 07:04:06 ip-68-178-130-21 php-cgi: PHP Warning: Module ‘mcrypt’ already loaded in Unknown on line 0 Sep 26 07:04:39 ip-68-178-130-21 named[4010]: error (unexpected RCODE SERVFAIL) resolving ‘65.37.114.42.in-addr.arpa/PTR/IN’: 210.245.31.10#53 Sep 26 07:04:49 ip-68-178-130-21 named[4010]: client 122.136.196.116#45453: query (cache) ‘aa3247.com/A/IN’ denied Sep 26 07:05:46 ip-68-178-130-21 php-cgi: PHP Warning: Module ‘mcrypt’ already loaded in Unknown on line 0 Sep 26 07:06:41 ip-68-178-130-21 named[4010]: client 122.136.196.116#61009: query (cache) ‘aa3247.com/A/IN’ denied Sep 26 07:07:17 ip-68-178-130-21 named[4010]: client 122.136.196.116#45732: query (cache) ‘aa3247.com/A/IN’ denied Sep 26 07:07:41 ip-68-178-130-21 named[4010]: shutting down Sep 26 07:07:41 ip-68-178-130-21 named[4010]: stopping command channel on 127.0.0.1#953 Sep 26 07:07:41 ip-68-178-130-21 named[4010]: stopping command channel on ::1#953 Sep 26 07:07:41 ip-68-178-130-21 named[4010]: no longer listening on ::#53 Sep 26 07:07:41 ip-68-178-130-21 named[4010]: no longer listening on 127.0.0.1#53 Sep 26 07:07:41 ip-68-178-130-21 named[4010]: no longer listening on 68.178.130.21#53 Sep 26 07:07:41 ip-68-178-130-21 named[4010]: no longer listening on 68.178.131.108#53 Sep 26 07:07:41 ip-68-178-130-21 named[4010]: no longer listening on 50.62.34.152#53 Sep 26 07:07:41 ip-68-178-130-21 named[4010]: exiting Sep 26 07:12:50 ip-68-178-130-21 named[7851]: starting BIND 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6.3 -u named Sep 26 07:12:50 ip-68-178-130-21 named[7851]: built with ‘–build=x86_64-redhat-linux-gnu’ ‘–host=x86_64-redhat-linux-gnu’ ‘–target=x86_64-redhat-linux-gnu’ ‘–program-prefix=’ ‘–prefix=/usr’ ‘–exec-prefix=/usr’ ‘–bindir=/usr/bin’ ‘–sbindir=/usr/sbin’ ‘–sysconfdir=/etc’ ‘–datadir=/usr/share’ ‘–includedir=/usr/include’ ‘–libdir=/usr/lib64’ ‘–libexecdir=/usr/libexec’ ‘–sharedstatedir=/var/lib’ ‘–mandir=/usr/share/man’ ‘–infodir=/usr/share/info’ ‘–with-libtool’ ‘–localstatedir=/var’ ‘–enable-threads’ ‘–enable-ipv6’ ‘–with-pic’ ‘–disable-static’ ‘–disable-openssl-version-check’ ‘–with-dlz-ldap=yes’ ‘–with-dlz-postgres=yes’ ‘–with-dlz-mysql=yes’ ‘–with-dlz-filesystem=yes’ ‘–with-gssapi=yes’ ‘–disable-isc-spnego’ ‘–with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets’ ‘–enable-fixed-rrset’ ‘build_alias=x86_64-redhat-linux-gnu’ ‘host_alias=x86_64-redhat-linux-gnu’ ‘target_alias=x86_64-redhat-linux-gnu’ ‘CFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic’ ‘CPPFLAGS= -DDIG_SIGCHASE’ Sep 26 07:12:50 ip-68-178-130-21 named[7851]: ---------------------------------------------------- Sep 26 07:12:50 ip-68-178-130-21 named[7851]: BIND 9 is maintained by Internet Systems Consortium, Sep 26 07:12:50 ip-68-178-130-21 named[7851]: Inc. (ISC), a non-profit 501©(3) public-benefit Sep 26 07:12:50 ip-68-178-130-21 named[7851]: corporation. Support and training for BIND 9 are Sep 26 07:12:50 ip-68-178-130-21 named[7851]: available at https://www.isc.org/support Sep 26 07:12:50 ip-68-178-130-21 named[7851]: ---------------------------------------------------- Sep 26 07:12:50 ip-68-178-130-21 named[7851]: adjusted limit on open files from 1024 to 1048576 Sep 26 07:12:50 ip-68-178-130-21 named[7851]: found 16 CPUs, using 16 worker threads Sep 26 07:12:50 ip-68-178-130-21 named[7851]: using up to 4096 sockets Sep 26 07:12:50 ip-68-178-130-21 named[7851]: loading configuration from ‘/etc/named.conf’ Sep 26 07:12:50 ip-68-178-130-21 named[7851]: reading built-in trusted keys from file ‘/etc/named.iscdlv.key’ Sep 26 07:12:50 ip-68-178-130-21 named[7851]: using default UDP/IPv4 port range: [1024, 65535] Sep 26 07:12:50 ip-68-178-130-21 named[7851]: using default UDP/IPv6 port range: [1024, 65535] Sep 26 07:12:50 ip-68-178-130-21 named[7851]: listening on IPv6 interfaces, port 53 Sep 26 07:12:50 ip-68-178-130-21 named[7851]: listening on IPv4 interface lo, 127.0.0.1#53 Sep 26 07:12:50 ip-68-178-130-21 named[7851]: listening on IPv4 interface eth0, 68.178.130.21#53 Sep 26 07:12:50 ip-68-178-130-21 named[7851]: listening on IPv4 interface eth0:1, 68.178.131.108#53 Sep 26 07:12:50 ip-68-178-130-21 named[7851]: listening on IPv4 interface eth0:2, 50.62.34.152#53 Sep 26 07:12:50 ip-68-178-130-21 named[7851]: generating session key for dynamic DNS Sep 26 07:12:50 ip-68-178-130-21 named[7851]: sizing zone task pool based on 95 zones Sep 26 07:12:50 ip-68-178-130-21 named[7851]: set up managed keys zone for view _default, file ‘/var/named/dynamic/managed-keys.bind’ Sep 26 07:12:50 ip-68-178-130-21 named[7851]: Warning: ‘empty-zones-enable/disable-empty-zone’ not set: disabling RFC 1918 empty zones Sep 26 07:12:50 ip-68-178-130-21 named[7851]: automatic empty zone: 127.IN-ADDR.ARPA

Close at 9:22AM
Sep 26 09:21:14 ip-68-178-130-21 named[26623]: error (unexpected RCODE SERVFAIL) resolving ‘226.128.136.183.in-addr.arpa/PTR/IN’: 202.96.103.36#53 Sep 26 09:21:28 ip-68-178-130-21 php-cgi: PHP Warning: Module ‘mcrypt’ already loaded in Unknown on line 0 Sep 26 09:21:52 ip-68-178-130-21 php-cgi: PHP Warning: Module ‘mcrypt’ already loaded in Unknown on line 0 Sep 26 09:22:30 ip-68-178-130-21 named[26623]: received control channel command ‘stop’ Sep 26 09:22:30 ip-68-178-130-21 named[26623]: shutting down: flushing changes Sep 26 09:22:30 ip-68-178-130-21 named[26623]: stopping command channel on 127.0.0.1#953 Sep 26 09:22:30 ip-68-178-130-21 named[26623]: stopping command channel on ::1#953 Sep 26 09:22:30 ip-68-178-130-21 named[26623]: no longer listening on ::#53 Sep 26 09:22:30 ip-68-178-130-21 named[26623]: no longer listening on 127.0.0.1#53 Sep 26 09:22:30 ip-68-178-130-21 named[26623]: no longer listening on 68.178.130.21#53 Sep 26 09:22:30 ip-68-178-130-21 named[26623]: no longer listening on 68.178.131.108#53 Sep 26 09:22:30 ip-68-178-130-21 named[26623]: no longer listening on 50.62.34.152#53 Sep 26 09:22:30 ip-68-178-130-21 named[26623]: exiting Sep 26 09:22:32 ip-68-178-130-21 named[3617]: starting BIND 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6.3 -u named Sep 26 09:22:32 ip-68-178-130-21 named[3617]: built with ‘–build=x86_64-redhat-linux-gnu’ ‘–host=x86_64-redhat-linux-gnu’ ‘–target=x86_64-redhat-linux-gnu’ ‘–program-prefix=’ ‘–prefix=/usr’ ‘–exec-prefix=/usr’ ‘–bindir=/usr/bin’ ‘–sbindir=/usr/sbin’ ‘–sysconfdir=/etc’ ‘–datadir=/usr/share’ ‘–includedir=/usr/include’ ‘–libdir=/usr/lib64’ ‘–libexecdir=/usr/libexec’ ‘–sharedstatedir=/var/lib’ ‘–mandir=/usr/share/man’ ‘–infodir=/usr/share/info’ ‘–with-libtool’ ‘–localstatedir=/var’ ‘–enable-threads’ ‘–enable-ipv6’ ‘–with-pic’ ‘–disable-static’ ‘–disable-openssl-version-check’ ‘–with-dlz-ldap=yes’ ‘–with-dlz-postgres=yes’ ‘–with-dlz-mysql=yes’ ‘–with-dlz-filesystem=yes’ ‘–with-gssapi=yes’ ‘–disable-isc-spnego’ ‘–with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets’ ‘–enable-fixed-rrset’ ‘build_alias=x86_64-redhat-linux-gnu’ ‘host_alias=x86_64-redhat-linux-gnu’ ‘target_alias=x86_64-redhat-linux-gnu’ ‘CFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic’ ‘CPPFLAGS= -DDIG_SIGCHASE’ Sep 26 09:22:32 ip-68-178-130-21 named[3617]: ---------------------------------------------------- Sep 26 09:22:32 ip-68-178-130-21 named[3617]: BIND 9 is maintained by Internet Systems Consortium, Sep 26 09:22:32 ip-68-178-130-21 named[3617]: Inc. (ISC), a non-profit 501©(3) public-benefit Sep 26 09:22:32 ip-68-178-130-21 named[3617]: corporation. Support and training for BIND 9 are Sep 26 09:22:32 ip-68-178-130-21 named[3617]: available at https://www.isc.org/support Sep 26 09:22:32 ip-68-178-130-21 named[3617]: ---------------------------------------------------- Sep 26 09:22:32 ip-68-178-130-21 named[3617]: adjusted limit on open files from 1024 to 1048576 Sep 26 09:22:32 ip-68-178-130-21 named[3617]: found 16 CPUs, using 16 worker threads Sep 26 09:22:32 ip-68-178-130-21 named[3617]: using up to 4096 sockets Sep 26 09:22:32 ip-68-178-130-21 named[3617]: loading configuration from ‘/etc/named.conf’ Sep 26 09:22:32 ip-68-178-130-21 named[3617]: reading built-in trusted keys from file ‘/etc/named.iscdlv.key’ Sep 26 09:22:32 ip-68-178-130-21 named[3617]: using default UDP/IPv4 port range: [1024, 65535] Sep 26 09:22:32 ip-68-178-130-21 named[3617]: using default UDP/IPv6 port range: [1024, 65535]

calderwood · September 26, 2013, 2:52pm

One setting question for BIND.

Do full recursive lookups for clients? It is set to Yes

Should it be NO?

Locutus · September 26, 2013, 3:30pm

So what about my suggestions, like atop, BIND debug log and downgrading to a stable non-RC version? I’m afraid I can’t help you if you don’t reply to my suggestions.

calderwood · September 26, 2013, 4:06pm

The BIND debug is already on, but it does not seem to be writing to the file. I will be willing to downgrade but the current stable version is 9.8.6 and I have installed 9.8.2-0.17.rc1.el6.3 which is not the latest. I do not have atop installed, but checked it out this morning and will install.

Thanks for the advice

calderwood · September 26, 2013, 4:28pm

I noticed that my ram is at capacity.


#free -m
             total       used       free     shared    buffers     cached
Mem:          3072       3072          0          0          0          0
-/+ buffers/cache:       3072          0
Swap:            0          0          0

Locutus · September 26, 2013, 5:46pm

You’d definitely want to limit recursive queries to your local server, and not allow it for any user on the internet. That’s done with the “allow-recursion” directive, or there’s also an edit box in Webmin’s BIND module for it. So you should set “Do fully recursive lookups” to “Default”, and in “Addresses and Topology”, set “Allow recursive queries from” to “Listed…” and enter in the box

localnets localhost

As for the RAM filling up, did that happen when your BIND crashed? You really need ATOP, to have historical data about processes. It could be a script going haywire and spawning many processes, or BIND itself using too much RAM. But you won’t find that out without ATOP.

When you have it installed, go to /etc/init.d/atop and modify the line “DARGS=”, the number at the end is the interval in seconds that atop uses to store resource data. You might want to lower the default of 10 minutes to e.g. one minute, for your debugging purposes.