zone - loading from master - failed: end of file

tmiland · March 8, 2021, 12:51pm

Hello,

I get this error on new server creation:

zone subdomain.domain.com/IN: loading from master file /var/lib/bind/subdomain.domain.com.hosts failed: end of file

The issue is that a new line is added to the zone file…
If I delete the new line at the end of the file and restart BIND, everything is okay.

What can the cause of this be?
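Before deleting anything, it can help to record exactly how the file ends. The sketch below only classifies the tail of a file; it does not explain why BIND rejects it. The function name `zone_file_ending` is made up for illustration, and the path in the usage comment is the one from the error message above.

```shell
#!/bin/sh
# Classify how a file ends. Prints one of:
#   empty                (file is missing or has zero length)
#   no-final-newline     (last byte is not a newline; possibly a truncated write)
#   trailing-blank-line  (last line is empty or whitespace only)
#   ok
zone_file_ending() {
    file=$1
    [ -s "$file" ] || { echo "empty"; return 0; }
    # Dump the last byte as octal; 012 is the newline character.
    last=$(tail -c 1 "$file" | od -An -t o1 | tr -d ' \n')
    if [ "$last" != "012" ]; then
        echo "no-final-newline"
        return 0
    fi
    if tail -n 1 "$file" | grep -q '^[[:space:]]*$'; then
        echo "trailing-blank-line"
    else
        echo "ok"
    fi
}

# Usage:
#   zone_file_ending /var/lib/bind/subdomain.domain.com.hosts
# BIND's own checker gives a second opinion on the same file:
#   named-checkzone subdomain.domain.com /var/lib/bind/subdomain.domain.com.hosts
```

A result of `no-final-newline` would point toward an interrupted write rather than a deliberately added line.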

Joe · March 9, 2021, 2:07am

Did you modify the Server Template for zone records?

tmiland · March 9, 2021, 9:00am

Only changed some of the settings.

BIND DNS records for new domains is set to No additional records.

Joe · March 9, 2021, 8:08pm

Is it just one domain? i.e. are other zones working fine? Was it imported or migrated? I’m pretty much out of ideas, as I’ve never seen this happen.

tmiland · March 9, 2021, 8:28pm

This was a new virtual server. No errors on other zones.

Tried to create another one with options:

[Setup DNS zone?]

No other options.

Additional feature options
- Slave zone master DNS servers [IP to master]

No errors on webmin → servers → BIND DNS Server → Check BIND Config.

The server which got the error was set up with DNS, Nginx, Postgres and MySQL.

So, seems I’m not able to reproduce the problem. I’m a bit puzzled…

Joe · March 9, 2021, 8:33pm

Oh…I bet I know what’s happening. How much memory do you have?

Check the kernel log for out of memory errors. I bet you have one (or more). I’d almost be willing to bet this is the OOM killer biting you.
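A minimal sketch of that check, assuming a Linux system where the kernel log is available through `dmesg`, `journalctl -k`, or a file such as `/var/log/kern.log` (the file location varies by distribution). The helper name `oom_lines` is made up here; it just filters whatever log text is piped into it.

```shell
#!/bin/sh
# Filter log text on stdin for traces of the kernel OOM killer.
# Typical kernel messages contain "invoked oom-killer" or "Out of memory".
# Exits 0 even when nothing matches, so it is safe under `set -e`.
oom_lines() {
    grep -iE 'oom-kill|out of memory' || true
}

# Usage (reading the kernel log may require root):
#   dmesg | oom_lines
#   journalctl -k | oom_lines
#   oom_lines < /var/log/kern.log
```

No output from any of these means the kernel log holds no record of an OOM kill for the period it covers.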

tmiland · March 9, 2021, 8:47pm

I have seen a few of those, so yes that could be what’s happening.

There’s 4GiB Real memory and 8GiB virtual.

Is it okay to use “OOMScoreAdjust” in the systemd service for webmin?

Or are there any other options besides increasing the memory/cost?

Joe · March 9, 2021, 8:49pm

If you’ve seen OOM killer events, that’s definitely what’s happening, and it’s disastrous. You can’t run a production server that randomly kills processes.

4 GiB should be plenty. You must have something leaking memory, or it’s a very heavily loaded server with tons of services.

tmiland · March 9, 2021, 8:54pm

Well, the OOM killer is overly aggressive, that’s all I know.

Memory is around 40%, virtual 17% at the moment, so I cannot say it’s heavily loaded.

Joe · March 9, 2021, 9:07pm

Is this an OpenVZ or Virtuozzo instance?

If it is not (e.g. it’s physical hardware or VM running under KVM or Xen), it isn’t possible for it to be overly aggressive. It only kills processes when it’s literally out of memory. At that point it has no choice. Something literally must be killed.

OOMScoreAdjust should never come into play. You need to fix it so processes aren’t being randomly killed. But, if you can’t do that, I guess you can make your least important processes first on the chopping block.
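As a last resort along those lines, a systemd drop-in can raise a unit's OOM score so that it is picked first. This is only a sketch: `write_oom_dropin` and the file name `oom.conf` are made up for illustration, and treating `packagekit.service` as the least important service is an assumption, not a recommendation. `OOMScoreAdjust=` itself is a standard systemd setting that accepts values from -1000 to 1000, where higher means more likely to be killed.

```shell
#!/bin/sh
# Write a systemd drop-in that sets OOMScoreAdjust for one unit.
#   $1 = systemd config root (normally /etc/systemd/system)
#   $2 = unit name, e.g. packagekit.service
#   $3 = score from -1000 (never kill) to 1000 (kill first)
write_oom_dropin() {
    root=$1
    unit=$2
    score=$3
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nOOMScoreAdjust=%s\n' "$score" > "$dir/oom.conf"
}

# Usage (as root), followed by a reload so systemd picks up the drop-in:
#   write_oom_dropin /etc/systemd/system packagekit.service 500
#   systemctl daemon-reload && systemctl restart packagekit.service
```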

But, figure out what’s taking up your memory and fix it. Running out of memory is a disaster to be avoided at any cost. You must reduce your usage or increase available memory (or choose a host that doesn’t use OpenVZ or Virtuozzo and oversell memory, which causes random memory errors no matter how much memory it claims you have).
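One simple way to start finding what is taking up memory is to rank processes by resident set size. The helper below is a sketch, with the made-up name `top_rss`; it expects `ps`-style lines of the form "RSS PID COMMAND" on stdin, with RSS in KiB as Linux `ps` reports it.

```shell
#!/bin/sh
# Print the N largest entries (default 10) from "RSS PID COMMAND" lines on stdin,
# largest first, with RSS converted from KiB to MiB.
# Command names containing spaces are cut at the first space.
top_rss() {
    n=${1:-10}
    sort -rn | head -n "$n" |
        awk '{ printf "%8.1f MiB  pid %s  %s\n", $1 / 1024, $2, $3 }'
}

# Usage:
#   ps -A -o rss= -o pid= -o comm= | top_rss 15
```

A single snapshot only shows the current state; running it periodically (for example from cron) and comparing results is more likely to reveal a leak.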

tmiland · March 9, 2021, 9:13pm

It’s a DigitalOcean droplet.

It has been a common problem for a long time now that I have to manually restart webmin, usermin, csf and lfd on several (at least 4) droplets.

They should have enough memory and swap, so there’s something going on here I have to figure out, I guess.

Joe · March 9, 2021, 9:21pm

OK, they use KVM or Xen, so it’s not oversold. So, yes, you need to reduce memory usage. You cannot run production services on a system that’s having OOM killer events. There’s nothing Webmin or I can do about that.

The only right number of OOM killer events is zero.

tmiland · March 9, 2021, 9:25pm

Here’s what’s been killed from syslog just today:

grep -i kill /var/log/syslog

Mar  9 00:00:02 vps1 systemd[1]: lfd.service: Main process exited, code=killed, status=9/KILL
Mar  9 01:36:30 vps1 systemd[1]: packagekit.service: Main process exited, code=killed, status=15/TERM
Mar  9 03:46:08 vps1 systemd[1]: packagekit.service: Main process exited, code=killed, status=15/TERM
Mar  9 05:00:45 vps1 systemd[1]: lfd.service: Main process exited, code=killed, status=9/KILL
Mar  9 06:51:50 vps1 spamd[15781]: spamd: child [16087] killed successfully: interrupted, signal 2 (0002)
Mar  9 06:51:50 vps1 spamd[15781]: spamd: child [16088] killed successfully: interrupted, signal 2 (0002)
Mar  9 07:00:10 vps1 systemd[1]: lfd.service: Main process exited, code=killed, status=9/KILL
Mar  9 07:37:42 vps1 systemd[1]: packagekit.service: Main process exited, code=killed, status=15/TERM
Mar  9 11:32:22 vps1 systemd[1]: lfd.service: Main process exited, code=killed, status=9/KILL
Mar  9 13:40:50 vps1 systemd[1]: packagekit.service: Main process exited, code=killed, status=15/TERM
Mar  9 14:52:16 vps1 systemd[1]: lfd.service: Main process exited, code=killed, status=9/KILL
Mar  9 15:27:19 vps1 systemd[1]: lfd.service: Main process exited, code=killed, status=9/KILL
Mar  9 19:35:54 vps1 usermin[32541]: /etc/usermin/stop: 4: kill: No such process
Mar  9 19:41:20 vps1 systemd[1]: packagekit.service: Main process exited, code=killed, status=15/TERM

Yesterday is even worse…

Edit: It should say “invoked oom-killer” though, which it doesn’t.

Joe · March 9, 2021, 10:00pm

That’s not the OOM killer. OOM killer is a kernel event, and will appear in the kernel log.

tmiland · March 10, 2021, 2:21pm

There’s nothing in the kernel logs about the OOM killer.

Joe · March 10, 2021, 3:29pm

OK, then it’s some other problem. I don’t have good guesses about what. It seems like it has to have been Webmin being killed while it was writing the file (or it ran out of space), and if it’s not the OOM killer, I don’t know what else it’d be.
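The out-of-space possibility is quick to rule out. A sketch, assuming the zone files live under `/var/lib/bind` as in the error message; `avail_kib` is a made-up helper name.

```shell
#!/bin/sh
# Print the space available (in KiB) on the filesystem that holds the given path.
# Uses POSIX output format (-P) with 1024-byte units (-k).
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Usage:
#   avail_kib /var/lib/bind
# Inode exhaustion can also make writes fail while space looks free
# (the -i option is supported by GNU df on Linux):
#   df -i /var/lib/bind
```

This only shows the current state, so a disk that filled up briefly and was cleaned since would not show up here.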
