Vmin PRO: problem with lookup-domain.pl invoked oom-killer

dragona · October 4, 2011, 12:26pm

Hi,

I have a server running CentOS 5.6 with the latest Virtualmin Pro and the latest updates from the standard repository. This server is used as a web and mail server.

Sometimes this server crashes because of too many “lookup-domain.pl” processes, and even after a reboot I must log in over SSH before lookup-domain starts, otherwise the server crashes again in less than one minute:

Oct 3 15:51:11 vm2 kernel: lookup-domain.p invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Oct 3 15:51:11 vm2 kernel:
Oct 3 15:51:11 vm2 kernel: Call Trace:
Oct 3 15:51:11 vm2 kernel: [] out_of_memory+0x8e/0x2f3
Oct 3 15:51:11 vm2 kernel: [] __alloc_pages+0x27f/0x308
Oct 3 15:51:11 vm2 kernel: [] __do_page_cache_readahead+0x96/0x179
Oct 3 15:51:11 vm2 kernel: [] filemap_nopage+0x14c/0x360
Oct 3 15:51:11 vm2 kernel: [] __handle_mm_fault+0x1fb/0x1039
Oct 3 15:51:11 vm2 kernel: [] do_page_fault+0x4cb/0x874
Oct 3 15:51:11 vm2 kernel: [] do_filp_open+0x2a/0x38
Oct 3 15:51:11 vm2 kernel: [] error_exit+0x0/0x84
Oct 3 15:51:11 vm2 kernel:
Oct 3 15:51:38 vm2 kernel: Mem-info:
Oct 3 15:52:14 vm2 kernel: Node 0 DMA per-cpu:
Oct 3 15:52:17 vm2 kernel: cpu 0 hot: high 0, batch 1 used:0
Oct 3 15:52:24 vm2 kernel: cpu 0 cold: high 0, batch 1 used:0
Oct 3 15:52:31 vm2 kernel: cpu 1 hot: high 0, batch 1 used:0
Oct 3 15:52:32 vm2 kernel: cpu 1 cold: high 0, batch 1 used:0
Oct 3 15:52:32 vm2 kernel: Node 0 DMA32 per-cpu:
Oct 3 15:52:32 vm2 kernel: cpu 0 hot: high 186, batch 31 used:24
Oct 3 15:52:32 vm2 kernel: cpu 0 cold: high 62, batch 15 used:14
Oct 3 15:52:32 vm2 kernel: cpu 1 hot: high 186, batch 31 used:143
Oct 3 15:52:32 vm2 kernel: cpu 1 cold: high 62, batch 15 used:10
Oct 3 15:52:32 vm2 kernel: Node 0 Normal per-cpu: empty
Oct 3 15:52:32 vm2 kernel: Node 0 HighMem per-cpu: empty
Oct 3 15:52:32 vm2 kernel: Free pages: 7904kB (0kB HighMem)
Oct 3 15:52:32 vm2 kernel: Active:166770 inactive:186345 dirty:0 writeback:0 unstable:0 free:1976 slab:6867 mapped-file:804 mapped-anon:354502 pagetables:11254
Oct 3 15:52:32 vm2 kernel: Node 0 DMA free:2988kB min:28kB low:32kB high:40kB active:0kB inactive:0kB present:9756kB pages_scanned:0 all_unreclaimable? yes
Oct 3 15:52:32 vm2 kernel: lowmem_reserve[]: 0 1499 1499 1499
Oct 3 15:52:32 vm2 kernel: Node 0 DMA32 free:4916kB min:4936kB low:6168kB high:7404kB active:667080kB inactive:745380kB present:1535136kB pages_scanned:2263669 all_unreclaimable? yes
Oct 3 15:52:32 vm2 kernel: lowmem_reserve[]: 0 0 0 0
Oct 3 15:52:32 vm2 kernel: Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Oct 3 15:52:32 vm2 kernel: lowmem_reserve[]: 0 0 0 0
Oct 3 15:52:32 vm2 kernel: Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Oct 3 15:52:32 vm2 kernel: lowmem_reserve[]: 0 0 0 0
Oct 3 15:52:32 vm2 kernel: Node 0 DMA: 5*4kB 5*8kB 5*16kB 3*32kB 5*64kB 1*128kB 1*256kB 0*512kB 2*1024kB 0*2048kB 0*4096kB = 2988kB
Oct 3 15:52:32 vm2 kernel: Node 0 DMA32: 49*4kB 2*8kB 0*16kB 3*32kB 0*64kB 0*128kB 0*256kB 3*512kB 1*1024kB 1*2048kB 0*4096kB = 4916kB
Oct 3 15:52:32 vm2 kernel: Node 0 Normal: empty
Oct 3 15:52:32 vm2 kernel: Node 0 HighMem: empty
Oct 3 15:52:32 vm2 kernel: 1226 pagecache pages
Oct 3 15:52:32 vm2 kernel: Swap cache: add 425420, delete 425103, find 15802/31229, race 0+4
Oct 3 15:52:32 vm2 kernel: Free swap = 0kB
Oct 3 15:52:32 vm2 kernel: Total swap = 1048568kB
Oct 3 15:52:32 vm2 kernel: Free swap: 0kB

Why do I see so many lookup-domain processes in “ps aux”? How can I solve this problem? I have about 3-4 servers with Vmin Pro and only one has this problem.

Thanks

Eric · October 4, 2011, 1:11pm

Howdy,

Can you list everything you see if you run this command:

ps auxw|grep lookup-domain

Also, how much RAM is in your server? You can determine that by running “free -m”.
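
As a rough sketch, the same listing can also be reduced to a count of just the “lookup-domain.pl” processes, which is easier to read when there are very many of them. The function name count_lookup_clients is made up for this example, and it assumes the runaway processes show up in ps as “lookup-domain.pl” (as in the thread title), as opposed to the daemon “lookup-domain-daemon.pl”:

```shell
# Hypothetical helper (not part of Virtualmin): reads `ps auxw` output on
# stdin and prints how many lines mention lookup-domain.pl.
# Lines containing "grep" are dropped so the search itself is not counted,
# and the pattern does not match lookup-domain-daemon.pl.
count_lookup_clients() {
    grep -v 'grep' | grep -c 'lookup-domain\.pl' || true
}

# Example use:
#   ps auxw | count_lookup_clients
```

In a normal situation this should print 0, since only the daemon is running.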

-Eric

dragona · October 7, 2011, 12:05pm

Hi,

this is the result of the command in a normal situation:

[root@vmx2 ~]# ps auxw | grep lookup-dom
root 13399 0.0 0.0 61184 772 pts/0 S+ 14:02 0:00 grep lookup-dom
root 16618 2.8 1.7 163000 27272 ? Ss Oct04 120:32 /usr/libexec/webmin/virtual-server/lookup-domain-daemon.pl

[root@vmx2 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          1505       1495          9          0         12        200
-/+ buffers/cache:       1282        222
Swap:         1023        571        452

But when the problem appears there are many, many lookup-domain processes and the server goes down in less than 2 minutes.

Thanks

Eric · October 7, 2011, 1:10pm

I wanted to make sure that the lookup domain daemon was running… which it does appear to be. So that part is good.

How often does that other problem occur?

I know this isn’t really possible to test under normal conditions, but I’m wondering whether you’re seeing a large burst of email in your mail queue when that occurs.

You can view your mail queue with the “mailq” command… or to just see a total of the messages in there, you can run this command:

mailq | tail -1

I know this doesn’t help you prevent the problem from occurring, but we’d need to see what’s going on while it is happening. I don’t see anything out of the ordinary right now.

Actually, next time that happens, I would run these 3 commands:

1: mailq | tail -1

2: ps auxw

3: uptime

Those would help nail down what exactly is going on.
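
Since the server reportedly goes down within a couple of minutes, it may be easier to capture all three outputs into a file in one go. The following is only a sketch: the function names section and snapshot and the log path in the example are made up for illustration and are not part of Virtualmin.

```shell
# Hypothetical helper (not part of Virtualmin): append one diagnostic
# snapshot (mail queue summary, process list, load) to a log file.

# Print a header naming the command, then run it. Errors are written into
# the log instead of aborting, so a missing command does not stop the rest.
section() {
    echo "== $1 =="
    sh -c "$1" 2>&1 || echo "(command failed or not available)"
}

# Usage: snapshot /path/to/logfile
snapshot() {
    log="$1"
    {
        echo "=== snapshot at $(date) ==="
        section "mailq | tail -1"
        section "ps auxw"
        section "uptime"
    } >> "$log"
}

# Example (the log path is an assumption):
#   snapshot /root/lookup-domain-snapshot.log
```

Each call appends to the same file, so it can be run several times while the problem is happening and the results compared afterwards.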

-Eric