High System CPU Load Average

Hi All,

I am totally puzzled at the moment as to what Virtualmin is doing. After recently updating everything to the latest versions, I am getting the following CPU load averages and constant alerts from CSF.

CPU load averages 9.45 (1 min) 9.32 (5 mins) 9.77 (15 mins)

Running top via SSH, I get the following:

Processes: 175 total, 2 running, 4 stuck, 169 sleeping, 944 threads 16:54:15
Load Avg: 1.16, 1.13, 1.13 CPU usage: 3.74% user, 2.72% sys, 93.53% idle
SharedLibs: 14M resident, 14M data, 0B linkedit.
MemRegions: 55177 total, 917M resident, 48M private, 345M shared.
PhysMem: 2845M used (1000M wired), 4237M unused.
VM: 447G vsize, 1073M framework vsize, 11607078(0) swapins, 14171139(0) swapouts
Networks: packets: 14989373/17G in, 10427533/1423M out.
Disks: 2651509/109G read, 2162583/222G written.

PID COMMAND %CPU TIME #TH #WQ #PORT #MREGS MEM RPRVT PURG
19094 mdworker 0.0 00:00.03 3 0 52 67 2196K 1340K 0B
19093 mdworker 0.0 00:00.03 3 0 52 69 3084K 2228K 0B
19092 syncdefaults 0.0 00:00.28 6 2 88 82 5132K 3952K 0B
19091 mdworker 0.0 00:00.06 3 0 52 69 5164K 4256K 0B
19089 top 9.3 00:14.13 1/1 0 26 41 2204K 1972K 0B
19086 bash 0.0 00:00.00 1 0 19 31 616K 448K 0B
19085 login 0.0 00:00.01 2 0 30 52 1168K 840K 0B
19078 TextEdit 0.0 00:00.27 5 2 170 184 13M 6556K 20K
19070 CVMCompiler 0.0 00:00.73 2 1 32 80 24M 24M 12K
19067 Terminal 24.0 00:03.02 13 7 179 212 20M+ 15M+ 80K
19057 com.apple.We 0.0 00:02.84 14 2 183 331 28M 25M 36K
19055 netbiosd 0.0 00:00.07 2 1 42 53 1888K 1484K 0B
19049 com.apple.iC 0.0 00:00.24 4 0 82 82 3892K 3112K 0B
19040 rpcsvchost 0.0 00:00.02 16 1 44 82 1428K 1092K 0B

Not sure where Virtualmin is pulling those averages from, and I’m not sure what is causing it. At first I thought my server had been hacked and was sending out spam, but there is nothing in the mail queue.

Anyone got any ideas? Restarting my server gets it back down to the usual average of 0.3 for a day or two, then it starts to build back up.

I got an alert for a 5-minute load average of 11.4 around an hour ago. The websites aren’t getting any more hits than usual, so it can’t be that…

Howdy,

Hmm, the output above appears to be from an Apple computer, not a Linux server that would be running Virtualmin. Is that process information from the correct system?

-Eric

Oops, you are correct, that’s what I get for posting in haste. That said, I cannot connect to the server by SSH: it asks me for a login, I enter my password, and then it just stays blank :S

At this moment in time, it’s now running at 11.4

CPU load averages: 11.30 (1 mins) , 11.25 (5 mins) , 11.22 (15 mins)
CPU type: Intel® Core™ i3 CPU 540 @ 3.07GHz , 4 cores

21916 jamessimpson 3.0 % /usr/bin/php-cgi
22225 jamessimpson 3.0 % /usr/bin/php-cgi
21915 jamessimpson 2.0 % /usr/bin/php-cgi
23138 root 1.2 % /usr/libexec/webmin/proc/index_cpu.cgi
1772 mysql 0.5 % /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-e …
19 root 0.4 % [events/0]
14555 drivingroads 0.4 % /usr/bin/php-cgi
14797 drivingroads 0.4 % /usr/bin/php-cgi
6827 bojotoolstore 0.3 % /usr/bin/php-cgi
7484 bojotoolstore 0.3 % /usr/bin/php-cgi
15398 drivingroads 0.3 % /usr/bin/php-cgi
18444 bojotoolstore 0.2 % /usr/bin/php-cgi
22486 apache 0.2 % /usr/sbin/httpd
78 root 0.1 % [kipmi0]
23139 root 0.1 % /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.conf
1 root 0.0 % /sbin/init

Howdy,

Well, there are a number of PHP-related processes there… it’s possible that means one or more of your sites is seeing an influx of traffic.

However, what is the output of these commands:

free -m
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -15

Also, can you run the command “ps auxw”, and attach that output as a text file?

-Eric

That’s the thing, I cannot get onto SSH at the moment. It lets me log in but then won’t let me type anything.

It has happened before, but I had to restart the server to get access again, which means I would be back to normal load for a day or two.

Finally managed to connect

Top:

top - 21:26:57 up 4 days, 21:58, 12 users, load average: 21.79, 20.18, 17.46
Tasks: 256 total, 1 running, 248 sleeping, 0 stopped, 7 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16321220k total, 15633400k used, 687820k free, 390020k buffers
Swap: 2097144k total, 7880k used, 2089264k free, 11586296k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19 root 20 0 0 0 0 D 0.7 0.0 34:16.18 events/0
61 root 39 19 0 0 0 S 0.3 0.0 0:20.72 khugepaged
5119 root 20 0 153m 15m 1668 S 0.3 0.1 0:34.30 lfd
1 root 20 0 19356 1476 1232 S 0.0 0.0 0:00.62 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.05 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:02.98 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:00.69 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
6 root RT 0 0 0 0 S 0.0 0.0 0:00.59 watchdog/0
7 root RT 0 0 0 0 S 0.0 0.0 0:00.64 migration/1
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
9 root 20 0 0 0 0 S 0.0 0.0 0:00.58 ksoftirqd/1
10 root RT 0 0 0 0 S 0.0 0.0 0:00.38 watchdog/1
11 root RT 0 0 0 0 S 0.0 0.0 0:00.39 migration/2
12 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
13 root 20 0 0 0 0 S 0.0 0.0 0:01.15 ksoftirqd/2
14 root RT 0 0 0 0 S 0.0 0.0 0:00.35 watchdog/2

And now SSH is frozen again, and I cannot get past successful authentication.

In your latest “top” output, there seem to be no processes using any considerable CPU power, yet your system load is excessively high. This could indicate that the system is waiting a great deal for other resources (RAM, HDD, network) to become available. Might indicate an overload there or hardware issues.

Also I noticed 12 users logged on, and 7 zombie processes. Those might be hanging sessions of your failed attempts to log on via SSH, but you might want to check those out, using the commands “w” and “last”.

I also recommend the tool “atop” over “top”, since it displays more information like disk, memory, swap and network usage, and records historical data, for later review. atop shows zombie processes with a “Z” in the state column.

You might have to hard-reboot the server if you can’t reliably get in via SSH anymore. A system load of 20 will most likely prevent you from doing any serious work on the server.

When you can get in again, you might want to review the system and kernel logs, and install atop.
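
To see which processes are behind a high load while the CPU sits idle, one option is to list everything in uninterruptible sleep (state D, typically waiting on disk or network I/O) together with any zombies (state Z). This is only a minimal sketch, assuming a procps-style ps; the function name blocked_or_zombie is made up for illustration.

```shell
# Keep only rows whose state starts with D (uninterruptible sleep)
# or Z (zombie). Expects "STAT PID USER COMMAND" rows on stdin.
# blocked_or_zombie is a hypothetical helper name, not an existing tool.
blocked_or_zombie() {
    awk '$1 ~ /^[DZ]/ { print }'
}

# Usage on a live system (header row is dropped automatically because
# "STAT" does not start with D or Z):
#   ps -eo stat,pid,user,comm | blocked_or_zombie
```

If the same few processes keep showing up in state D, whatever they are waiting on (a disk, an NFS mount, a stuck driver) is a good place to look next.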

Right, I have had to restart the server, as last night it got up to 40.1 CPU average. After restarting this morning I am able to get back in via SSH.

Output from atop
atop

ATOP - JSServer01 2014/08/30 11:05:26 --------- 10s elapsed
PRC | sys 0.14s | user 1.49s | #proc 182 | #zombie 0 | #exit 5 |
CPU | sys 2% | user 15% | irq 0% | idle 378% | wait 5% |
cpu | sys 1% | user 11% | irq 0% | idle 83% | cpu000 w 5% |
cpu | sys 0% | user 4% | irq 0% | idle 96% | cpu002 w 0% |
cpu | sys 0% | user 0% | irq 0% | idle 99% | cpu001 w 0% |
cpu | sys 0% | user 0% | irq 0% | idle 100% | cpu003 w 0% |
CPL | avg1 0.17 | avg5 0.39 | avg15 0.36 | csw 5269 | intr 2754 |
MEM | tot 15.6G | free 12.7G | cache 811.7M | buff 86.2M | slab 353.2M |
SWP | tot 2.0G | free 2.0G | | vmcom 2.7G | vmlim 9.8G |
LVM | Group00-root | busy 5% | read 10 | write 192 | avio 2.62 ms |
DSK | sda | busy 5% | read 10 | write 71 | avio 6.53 ms |
NET | transport | tcpi 38 | tcpo 37 | udpi 0 | udpo 0 |
NET | network | ipi 47 | ipo 37 | ipfrw 0 | deliv 38 |
NET | em1 0% | pcki 66 | pcko 37 | si 4 Kbps | so 24 Kbps |
NET | lo ---- | pcki 10 | pcko 10 | si 0 Kbps | so 0 Kbps |

PID SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPU CMD 1/5
2168 0.02s 0.82s 0K 0K 0K 8K -- - S 8% php-cgi
2383 0.01s 0.30s 0K 0K 0K 0K -- - S 3% php-cgi
1866 0.03s 0.27s 0K 0K 36K 100K -- - S 3% mysqld
2224 0.01s 0.04s 75780K 20K 48K 88K -- - S 1% httpd
4131 0.01s 0.04s 0K 0K - - NE 0 E 1%
78 0.03s 0.00s 0K 0K 0K 0K -- - S 0% kipmi0

It is showing normal usage now, so not sure what the hell is going on after a day or two.

While installing atop I did get a warning:
There are unfinished transactions remaining. You might consider running yum-complete-transaction first to finish them.

So I ran that too, and it looks as if I cannot install what is required
yum-complete-transaction
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.melbourne.co.uk
 * epel: mirror.bytemark.co.uk
 * extras: mirror.bytemark.co.uk
 * updates: mirrors.ukfast.co.uk
Checking for new repos for mirrors
There are 1 outstanding transactions to complete. Finishing the most recent one
The remaining transaction had 10 elements left to run
--> Running transaction check
---> Package automake.noarch 0:1.11.1-4.el6 will be installed
---> Package cloog-ppl.x86_64 0:0.15.7-1.2.el6 will be installed
---> Package cpp.x86_64 0:4.4.7-4.el6 will be installed
---> Package gcc.x86_64 0:4.4.7-4.el6 will be installed
---> Package gcc-c++.x86_64 0:4.4.7-4.el6 will be installed
---> Package libgomp.x86_64 0:4.4.7-4.el6 will be installed
---> Package libstdc++-devel.x86_64 0:4.4.7-4.el6 will be installed
---> Package mpfr.x86_64 0:2.4.1-6.el6 will be installed
---> Package php-devel.x86_64 0:5.3.3-27.el6_5 will be installed
--> Processing Dependency: php(x86-64) = 5.3.3-27.el6_5 for package: php-devel-5.3.3-27.el6_5.x86_64
---> Package ppl.x86_64 0:0.10.2-11.el6 will be installed
--> Finished Dependency Resolution
Error: Package: php-devel-5.3.3-27.el6_5.x86_64 (updates)
           Requires: php(x86-64) = 5.3.3-27.el6_5
           Installed: php-5.3.3-27.el6_5.1.x86_64 (@updates)
               php(x86-64) = 5.3.3-27.el6_5.1
           Available: php-5.3.3-26.el6.x86_64 (base)
               php(x86-64) = 5.3.3-26.el6
           Available: php-5.3.3-27.el6_5.x86_64 (updates)
               php(x86-64) = 5.3.3-27.el6_5
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

Running free -m now (kinda pointless as it is back to normal now)

free -m
             total       used       free     shared    buffers     cached
Mem:         15938       2921      13017          0         88        816
-/+ buffers/cache:       2016      13922
Swap:         2047          0       2047

And the netstat

netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -15
     19
      4 127.0.0.1
      2 81.156.223.142
      1 servers)
      1 Address
      1 90.206.201.8
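
The "servers)" and "Address" rows in that output are just the two netstat header lines being counted, and the blank row of 19 is possibly what cut -d: -f1 leaves behind for addresses that begin with a colon (IPv6-style ones). A slightly more forgiving variant of the same pipeline might look like the sketch below; remote_ip_counts is an illustrative name, and the column layout is assumed to match the usual netstat -ntu output.

```shell
# Count connections per remote address, ignoring netstat header lines.
# Reads "netstat -ntu" style output on stdin; remote_ip_counts is a
# hypothetical helper name.
remote_ip_counts() {
    awk '$1 ~ /^(tcp|udp)/ {
        addr = $5                      # foreign address column
        sub(/:[0-9]+$/, "", addr)      # strip the trailing :port
        sub(/^::ffff:/, "", addr)      # unwrap IPv4-mapped IPv6 addresses
        print addr
    }' | sort | uniq -c | sort -nr | head -15
}

# Usage:
#   netstat -ntu | remote_ip_counts
```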

Hmm, I think I may have found the issue.

I seem to have thousands of these in the messages log:

Aug 30 05:05:14 JSServer01 named[29765]: client 127.0.0.1#45585: query (cache) ‘131.205.13.211.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:15 JSServer01 named[29765]: client 127.0.0.1#43407: query (cache) ‘29.193.26.103.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:15 JSServer01 named[29765]: client 127.0.0.1#41691: query (cache) ‘241.150.174.195.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:15 JSServer01 named[29765]: client 127.0.0.1#37403: query (cache) ‘166.109.97.211.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:15 JSServer01 named[29765]: client 127.0.0.1#58532: query (cache) ‘241.150.174.195.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#44044: query (cache) ‘102.120.149.107.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#37691: query (cache) ‘91.34.135.174.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#57784: query (cache) ‘219.106.153.184.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#40505: query (cache) ‘204.5.106.41.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#35974: query (cache) ‘91.34.135.174.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#35621: query (cache) ‘53.79.234.212.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#44718: query (cache) ‘102.120.149.107.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#52370: query (cache) ‘53.79.234.212.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:17 JSServer01 named[29765]: client 127.0.0.1#42438: query (cache) ‘177.10.244.162.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:17 JSServer01 named[29765]: client 127.0.0.1#41674: query (cache) ‘202.209.241.61.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:18 JSServer01 named[29765]: client 127.0.0.1#56260: query (cache) ‘124.10.244.162.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:19 JSServer01 named[29765]: client 127.0.0.1#48054: query (cache) ‘166.109.97.211.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:22 JSServer01 named[29765]: client 127.0.0.1#49980: query (cache) ‘188.17.82.36.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:23 JSServer01 named[29765]: client 127.0.0.1#49930: query (cache) ‘204.5.106.41.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:23 JSServer01 named[29765]: client 127.0.0.1#57424: query (cache) ‘188.17.82.36.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:23 JSServer01 named[29765]: client 127.0.0.1#57964: query (cache) ‘120.107.255.193.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:23 JSServer01 named[29765]: client 127.0.0.1#35676: query (cache) ‘124.10.244.162.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:23 JSServer01 named[29765]: client 127.0.0.1#35009: query (cache) ‘101.95.101.199.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:24 JSServer01 named[29765]: client 127.0.0.1#47569: query (cache) ‘120.107.255.193.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:24 JSServer01 named[29765]: client 127.0.0.1#39782: query (cache) ‘227.58.73.203.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:24 JSServer01 named[29765]: client 127.0.0.1#50507: query (cache) ‘101.95.101.199.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:24 JSServer01 named[29765]: client 127.0.0.1#41356: query (cache) ‘156.12.244.162.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:25 JSServer01 named[29765]: client 127.0.0.1#43907: query (cache) ‘227.58.73.203.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:25 JSServer01 named[29765]: client 127.0.0.1#50367: query (cache) ‘179.107.160.163.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:25 JSServer01 named[29765]: client 127.0.0.1#58792: query (cache) ‘179.107.160.163.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:25 JSServer01 named[29765]: client 127.0.0.1#45449: query (cache) ‘182.233.15.199.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:26 JSServer01 named[29765]: client 127.0.0.1#35984: query (cache) ‘19.96.95.23.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:26 JSServer01 named[29765]: client 127.0.0.1#42738: query (cache) ‘19.96.95.23.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:26 JSServer01 named[29765]: client 127.0.0.1#57701: query (cache) ‘187.92.95.23.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:26 JSServer01 named[29765]: client 127.0.0.1#33209: query (cache) ‘77.113.182.192.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:26 JSServer01 named[29765]: client 127.0.0.1#51364: query (cache) ‘240.9.244.162.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:26 JSServer01 named[29765]: client 127.0.0.1#56060: query (cache) ‘240.9.244.162.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:27 JSServer01 named[29765]: client 127.0.0.1#54580: query (cache) ‘238.210.34.89.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:27 JSServer01 named[29765]: client 127.0.0.1#34927: query (cache) ‘187.92.95.23.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:27 JSServer01 named[29765]: client 127.0.0.1#54763: query (cache) ‘170.233.15.199.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:28 JSServer01 named[29765]: client 127.0.0.1#51508: query (cache) ‘170.233.15.199.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:28 JSServer01 named[29765]: client 127.0.0.1#34891: query (cache) ‘77.113.182.192.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:29 JSServer01 named[29765]: client 127.0.0.1#37835: query (cache) ‘181.233.15.199.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:29 JSServer01 named[29765]: client 127.0.0.1#47091: query (cache) ‘156.12.244.162.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:31 JSServer01 named[29765]: client 127.0.0.1#47907: query (cache) ‘167.13.244.162.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:31 JSServer01 named[29765]: client 127.0.0.1#42951: query (cache) ‘167.13.244.162.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:31 JSServer01 named[29765]: client 127.0.0.1#37369: query (cache) ‘223.59.200.220.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:32 JSServer01 named[29765]: client 127.0.0.1#54876: query (cache) ‘187.233.15.199.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:32 JSServer01 named[29765]: client 127.0.0.1#56875: query (cache) ‘187.233.15.199.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:32 JSServer01 named[29765]: client 127.0.0.1#56911: query (cache) ‘182.233.15.199.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:32 JSServer01 named[29765]: client 127.0.0.1#37661: query (cache) ‘171.233.15.199.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:32 JSServer01 named[29765]: client 127.0.0.1#35656: query (cache) ‘220.59.200.220.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:32 JSServer01 named[29765]: client 127.0.0.1#42569: query (cache) ‘33.114.193.123.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:33 JSServer01 named[29765]: client 127.0.0.1#40194: query (cache) ‘33.114.193.123.in-addr.arpa/PTR/IN’ denied
Aug 30 05:05:33 JSServer01 named[29765]: client 127.0.0.1#43916: query (cache) ‘181.233.15.199.in-addr.arpa/PTR/IN’ denied
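
With thousands of these lines, a summary is easier to read than the raw log. Note that every client in the excerpt is 127.0.0.1, so the refused lookups appear to come from the server itself rather than from outside. A rough sketch for counting denied queries per client and record type, assuming the syslog layout shown above (denied_summary is a made-up name):

```shell
# Summarise named "denied" lines as: count, client IP, record type.
# Reads syslog lines on stdin; denied_summary is a hypothetical helper.
denied_summary() {
    awk '/named\[/ && $NF == "denied" {
        client = $7
        sub(/#.*/, "", client)            # drop "#port:" after the IP
        n = split($(NF - 1), q, "/")      # query is name/TYPE/CLASS
        count[client " " q[n - 1]]++
    }
    END { for (k in count) print count[k], k }' | sort -nr
}

# Usage:
#   denied_summary < /var/log/messages
```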

Okay, Eric might be able to say more about the error you get when trying to finish package updates; I’m not familiar enough with CentOS (I’m assuming you’re using that, or another distro that uses “yum”).

Did this issue start just after you installed updates? Or did it happen before that?

Note that the 40 is not the CPU usage, but system load. CPU usage is usually expressed in form of a percentage that the CPU spends handling processes. In your case, that’d be a maximum of 400% or 100% for each core.

System load, on the other hand, tells you how many processes were, on average, either running, ready to run, or stuck waiting for I/O over a time window (usually 1 minute, 5 minutes, 15 minutes). So in addition to CPU, it also takes other required resources into account, e.g. when a process has to wait for HDD availability. With your 4-core CPU, a load of up to 4 is acceptable and “normal” if the system is very heavily used.

So a load of 40 means that 40 processes are ready to do something but can’t, because resources are lacking. It’s to be expected that the system is nearly unresponsive then. In your case, that’s probably not CPU power (since your top output showed that the CPU was mostly idle), but something else.
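
As a rough worked example of the above, dividing the load average by the number of cores gives a feel for how oversubscribed the machine is. A small sketch, assuming the /proc/loadavg line format; load_per_core is only an illustrative name.

```shell
# Print the 1-minute load average divided by the core count.
# $1: a line in /proc/loadavg format, $2: number of cores.
# load_per_core is a hypothetical helper name.
load_per_core() {
    printf '%s\n' "$1" | awk -v cores="$2" '{ printf "%.2f\n", $1 / cores }'
}

# Usage on a live Linux system:
#   load_per_core "$(cat /proc/loadavg)" "$(nproc)"
```

Fed the load from the top output posted earlier (21.79 on 4 cores), this comes out at about 5.45 per core, where anything much above 1 means processes are queuing.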

A good candidate is the HDD, in case there’s hardware trouble with it. What kind of HDD setup do you have in the server? Single disk? Software/hardware RAID? You might want to use the command smartctl to review the HDDs’ status values.

Since this only happens after a while, you might want to observe it for a bit and note if the system load goes up. You can review historical atop data by running atop -r /var/log/atop.log. When the load goes up, note if the disk is overloaded (“DSK % busy” is a good indicator), also check which processes use what amount of memory, disk, network etc. You can sort the output of atop accordingly and switch to different screens. Press “?” for a help screen.

Also don’t forget to check last to see what those 12 logins were during your last problem phase! It shows you all logins with username and IP address. Pay attention to any entries with unexpected users/IP addresses there!

I checked the last logins and I can confirm they are all mine.

It also looks like my server may have been under a DDoS attack?

I am seeing a lot of these in the messages log:

Aug 29 19:51:52 JSServer01 named[29765]: client 127.0.0.1#11277: query (cache) ‘gmx.net/NS/IN’ denied
Aug 29 19:51:52 JSServer01 named[29765]: client 127.0.0.1#11277: query (cache) ‘cingular.com/NS/IN’ denied
Aug 29 19:51:52 JSServer01 named[29765]: client 127.0.0.1#11277: query (cache) ‘sourceforge.net/NS/IN’ denied
Aug 29 19:50:18 JSServer01 named[29765]: client 127.0.0.1#52864: query (cache) ‘intel.com/NS/IN’ denied
Aug 29 19:50:18 JSServer01 named[29765]: client 127.0.0.1#52864: query (cache) ‘msn.com/NS/IN’ denied
Aug 29 19:50:18 JSServer01 named[29765]: client 127.0.0.1#52864: query (cache) ‘comcast.net/NS/IN’ denied

And then what looks like a DoS attack?

Aug 30 01:11:41 JSServer01 kernel: Firewall: *TCP_OUT Blocked* IN= OUT=em1 SRC=149.255.100.109 DST=69.46.36.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=25880 DF PROTO=TCP SPT=50786 DPT=9050 WINDOW=14600 RES=0x00 SYN URGP=0 UID=508 GID=503
Aug 30 01:11:41 JSServer01 named[29765]: client 127.0.0.1#44437: query (cache) '187.88.217.189.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:41 JSServer01 named[29765]: client 127.0.0.1#46883: query (cache) '187.88.217.189.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:41 JSServer01 named[29765]: client 127.0.0.1#53390: query (cache) '225.222.197.69.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:42 JSServer01 named[29765]: client 127.0.0.1#38526: query (cache) '252.55.186.210.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:42 JSServer01 kernel: Firewall: *TCP_OUT Blocked* IN= OUT=em1 SRC=149.255.100.109 DST=69.46.36.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=25881 DF PROTO=TCP SPT=50786 DPT=9050 WINDOW=14600 RES=0x00 SYN URGP=0 UID=508 GID=503
Aug 30 01:11:42 JSServer01 named[29765]: client 127.0.0.1#56360: query (cache) '94.158.55.50.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:42 JSServer01 named[29765]: client 127.0.0.1#33568: query (cache) '34.137.46.77.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:43 JSServer01 named[29765]: client 127.0.0.1#55732: query (cache) '190.243.45.70.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:43 JSServer01 named[29765]: client 127.0.0.1#57461: query (cache) '120.141.93.216.in-addr.arpa/PTR/IN' denied
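
The TCP_OUT lines are outbound connections from this server (OUT=em1, SRC is the server) that the firewall blocked, and each one carries the UID of the local account that tried to connect. Tallying them by UID, destination and port may help narrow down which account is responsible. A minimal sketch, assuming the log layout shown above; blocked_out_summary is a made-up name.

```shell
# Tally blocked outbound connections by local UID, destination and port.
# Reads kernel firewall log lines on stdin; blocked_out_summary is a
# hypothetical helper name.
blocked_out_summary() {
    awk '/TCP_OUT Blocked/ {
        dst = dpt = uid = "?"
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^DST=/) dst = substr($i, 5)
            if ($i ~ /^DPT=/) dpt = substr($i, 5)
            if ($i ~ /^UID=/) uid = substr($i, 5)
        }
        count["uid=" uid " dst=" dst " dport=" dpt]++
    }
    END { for (k in count) print count[k], k }' | sort -nr
}

# Usage:
#   blocked_out_summary < /var/log/messages
#   getent passwd 508     # map a UID from the summary to an account name
```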

Locutus, I run updates all the time to keep the server updated, but around a week ago there were quite a few updates which I ran, and then I enabled graylisting as I was starting to see a lot of spam emails coming through.

After that, I then started to get CSF alerts of high load averages, and then it seemed to get worse.

I am running a Dell PowerEdge R210, which comes with a Dell RAID card, and two 1TB hard drives set up in RAID 1.

In Virtualmin, it only shows the RAID (SCSI device A Drive size 953.31 GB - Make and model Dell VIRTUAL DISK)

I have another machine which is running quite happily without the same issues, but that one runs a software RAID across two disks, and I am able to query the RAID and the disks. With this machine, I’ve never been able to query the RAID, as I don’t think there are any proper Dell Linux drivers for the RAID card.

The RAID card is a Dell SAS 6/iR Adapter.

Hi Guys,

It started building up again. I ran atop -r and this is the output:

ATOP - JSServer01 2014/08/30 15:02:04 --------- 4h25m53s elapsed
PRC | sys 94.89s | user 19m30s | #proc 184 | #zombie 0 | #exit 0 |
CPU | sys 1% | user 19% | irq 0% | idle 371% | wait 9% |
cpu | sys 1% | user 9% | irq 0% | idle 82% | cpu000 w 8% |
cpu | sys 0% | user 5% | irq 0% | idle 94% | cpu002 w 1% |
cpu | sys 0% | user 3% | irq 0% | idle 97% | cpu001 w 0% |
cpu | sys 0% | user 2% | irq 0% | idle 98% | cpu003 w 0% |
CPL | avg1 0.27 | avg5 0.29 | avg15 0.27 | csw 5643189 | intr 6191011 |
MEM | tot 15.6G | free 11.8G | cache 1.4G | buff 232.5M | slab 406.8M |
SWP | tot 2.0G | free 2.0G | | vmcom 2.8G | vmlim 9.8G |
LVM | Group00-root | busy 10% | read 158419 | write 785040 | avio 1.76 ms |
LVM | Group00-swap | busy 0% | read 322 | write 0 | avio 2.57 ms |
DSK | sda | busy 10% | read 112136 | write 262769 | avio 4.43 ms |
NET | transport | tcpi 534967 | tcpo 484902 | udpi 13309 | udpo 13651 |
NET | network | ipi 555500 | ipo 516192 | ipfrw 0 | deliv 548501 |
NET | em1 0% | pcki 492572 | pcko 649938 | si 36 Kbps | so 409 Kbps |
NET | lo ---- | pcki 101110 | pcko 101110 | si 13 Kbps | so 13 Kbps |
Window has been resized…
PID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR CPU CMD 1/17
11352 1 6.33s 4m13s 310.1M 102.7M 324K 10720K N- - S 0 2% php-cgi
11353 1 5.98s 3m59s 310.1M 102.7M 124K 12436K N- - S 2 2% php-cgi
14890 1 7.86s 3m47s 286.6M 81180K 0K 11336K N- - S 1 1% php-cgi
1866 16 20.63s 1m45s 863.0M 63104K 81144K 1.0G N- - S 3 1% mysqld
6279 1 4.30s 79.64s 311.8M 104.1M 2692K 70844K N- - D 2 1% php-cgi
6992 1 2.12s 64.37s 278.9M 77108K 164K 4K N- - S 0 0% php-cgi
10698 1 2.92s 57.18s 301.1M 95656K 572K 52416K N- - S 1 0% php-cgi
6242 1 1.36s 39.79s 285.7M 78572K 80K 4K N- - S 0 0% php-cgi
6993 1 1.10s 33.55s 272.9M 66768K 220K 4K N- - S 2 0% php-cgi
78 1 21.30s 0.00s 0K 0K 0K 0K N- - S 3 0% kipmi0
6600 1 0.51s 17.45s 264.5M 63392K 176K 164K N- - S 0 0% php-cgi

I think I have figured it out. It’s something to do with BIND. I think I’ve been going through DDoS attacks for some strange reason.

I have just added this into named.conf:

acl "trusted" {
    My server ip address
    My server ip address 2
    My secondary DNS server IP address
    localhost;
    localnets;
};

options {
    listen-on port 53 {
        any;
    };
    listen-on-v6 port 53 {
        any;
    };
    directory "/var/named";
    dump-file "/var/named/data/cache_dump.db";
    statistics-file "/var/named/data/named_stats.txt";
    memstatistics-file "/var/named/data/named_mem_stats.txt";
    allow-query { trusted; };
    allow-transfer { trusted; };
    allow-recursion { trusted; };
    allow-query-cache { trusted; };
    recursion no;

    dnssec-enable yes;
    dnssec-validation yes;
    dnssec-lookaside auto;

    /* Path to ISC DLV key */
    bindkeys-file "/etc/named.iscdlv.key";

    managed-keys-directory "/var/named/dynamic";
    also-notify {
    };
};

I now see a lot of these types of warnings in my log file:

Aug 30 20:21:20 JSServer01 named[12935]: client 80.241.198.26#53: query ‘dansimpson.net/SPF/IN’ denied
Aug 30 20:21:20 JSServer01 named[12935]: client 80.241.198.26#53: query ‘dansimpson.net/SPF/IN’ denied
Aug 30 20:21:20 JSServer01 named[12935]: client 80.241.198.26#53: query ‘ns2.j5huh.net/A/IN’ denied
Aug 30 20:21:20 JSServer01 named[12935]: client 80.241.198.26#53: query ‘ns1.j5huh.net/A/IN’ denied
Aug 30 20:21:20 JSServer01 named[12935]: client 80.241.192.25#21267: query ‘ns1.j5huh.com/A/IN’ denied
Aug 30 20:21:20 JSServer01 named[12935]: client 80.241.192.25#20384: query ‘ns1.j5huh.com/A/IN’ denied

Which I am assuming are the remains of a DNS attack?

Well, adding those DNS settings broke my websites, as I couldn’t access them. However, I have upped the firewall to block multiple queries, which seems to have worked.

Does this give any clues? LVM and DSK are flashing red?

ATOP - JSServer01 2014/09/01 13:08:44 --------- 2m54s elapsed
PRC | sys 5.84s | user 2.64s | #proc 138 | #zombie 0 | #exit 0 |
CPU | sys 8% | user 7% | irq 0% | idle 307% | wait 78% |
cpu | sys 4% | user 2% | irq 0% | idle 25% | cpu000 w 69% |
cpu | sys 2% | user 4% | irq 0% | idle 88% | cpu001 w 5% |
cpu | sys 1% | user 1% | irq 0% | idle 96% | cpu002 w 2% |
cpu | sys 0% | user 0% | irq 0% | idle 97% | cpu003 w 2% |
CPL | avg1 1.38 | avg5 0.58 | avg15 0.21 | csw 248036 | intr 226145 |
MEM | tot 15.6G | free 14.2G | cache 501.0M | buff 14.9M | slab 334.3M |
SWP | tot 2.0G | free 2.0G | | vmcom 868.7M | vmlim 9.8G |
LVM | Group00-root | busy 78% | read 109666 | write 2872 | avio 1.21 ms |
LVM | Group00-swap | busy 0% | read 322 | write 0 | avio 1.09 ms |
DSK | sda | busy 79% | read 65008 | write 1376 | avio 2.07 ms |
NET | transport | tcpi 24 | tcpo 24 | udpi 75 | udpo 102 |
NET | network | ipi 120 | ipo 135 | ipfrw 0 | deliv 102 |
NET | em1 0% | pcki 182 | pcko 85 | si 0 Kbps | so 0 Kbps |
NET | lo ---- | pcki 33 | pcko 33 | si 0 Kbps | so 0 Kbps |
*** system and process activity since boot ***
PID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR CPU CMD 1/16
158 1 4.22s 0.98s 36096K 1368K 276K 16K N- - S 1 3% plymouthd
2158 1 0.04s 1.33s 239.1M 52280K 2804K 4K N- - S 0 1% spamd
1 1 0.56s 0.02s 19356K 1524K 409.7M 6968K N- - S 0 0% init
34 1 0.54s 0.00s 0K 0K 0K 0K N- - S 0 0% kblockd/0
78 1 0.32s 0.00s 0K 0K 0K 0K N- - S 3 0% kipmi0
437 1 0.01s 0.15s 10648K 756K 9268K 0K N- - S 2 0% udevd
2182 1 0.04s 0.01s 154.2M 13520K 11332K 7712K N- - S 3 0% postgrey
1843 2 0.01s 0.04s 37812K 4184K 1556K 4K N- - S 0 0% hald
2260 1 0.00s 0.04s 81296K 3408K 520K 8K N- - S 3 0% master
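
For what it’s worth, the red LVM and DSK lines correspond to the high "busy" percentages (78% and 79%). Bear in mind this particular sample is marked "system and process activity since boot" and covers only the first 2m54s, so heavy disk reads there may simply be startup activity. To pick out busy devices from pasted or saved atop text, something like the sketch below could work; the 70% default threshold and the name busy_devices are assumptions for illustration.

```shell
# Print DSK/LVM devices whose busy percentage is at or above a threshold.
# Reads atop-style text lines on stdin; $1 is the threshold (default 70,
# an arbitrary choice). busy_devices is a hypothetical helper name.
busy_devices() {
    awk -F'|' -v limit="${1:-70}" '
        $1 ~ /^(DSK|LVM)/ && $3 ~ /busy/ {
            kind = $1; gsub(/ /, "", kind)
            dev  = $2; gsub(/ /, "", dev)
            pct  = $3; gsub(/[^0-9]/, "", pct)   # "busy 79%" -> "79"
            if (pct + 0 >= limit + 0) print kind, dev, pct "%"
        }'
}

# Usage, with atop text saved to a file (file name is just an example):
#   busy_devices 70 < atop-sample.txt
```

Comparing this against a sample taken while the load is actually climbing would be more telling than the since-boot figures.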

And this was from yesterday, when it started to build up again:
ATOP - JSServer01 2014/08/31 00:00:01 --------- 6h17m12s elapsed
PRC | sys 3m58s | user 25m22s | #proc 201 | #zombie 0 | #exit 1 |
CPU | sys 2% | user 15% | irq 0% | idle 374% | wait 9% |
cpu | sys 0% | user 8% | irq 0% | idle 84% | cpu000 w 8% |
cpu | sys 0% | user 4% | irq 0% | idle 95% | cpu002 w 1% |
cpu | sys 0% | user 2% | irq 0% | idle 97% | cpu001 w 0% |
cpu | sys 0% | user 2% | irq 0% | idle 98% | cpu003 w 0% |
CPL | avg1 0.13 | avg5 0.16 | avg15 0.14 | csw 9523149 | intr 8306300 |
MEM | tot 15.6G | free 12.0G | cache 1.2G | buff 255.2M | slab 192.2M |
SWP | tot 2.0G | free 2.0G | | vmcom 3.1G | vmlim 9.8G |
LVM | Group00-root | busy 10% | read 158124 | write 917942 | avio 2.18 ms |
LVM | Group00-swap | busy 0% | read 322 | write 0 | avio 0.88 ms |
DSK | sda | busy 10% | read 119043 | write 345707 | avio 5.04 ms |
NET | transport | tcpi 539048 | tcpo 506361 | udpi 43734 | udpo 44075 |
NET | network | ipi 598771 | ipo 564033 | ipfrw 0 | deliv 583078 |
NET | em1 0% | pcki 514411 | pcko 678076 | si 19 Kbps | so 301 Kbps |
NET | lo ---- | pcki 131997 | pcko 131997 | si 13 Kbps | so 13 Kbps |
*** system and process activity since boot ***
PID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR CPU CMD 1/23
12952 1 9.41s 6m04s 303.8M 97396K 296K 18116K N- - S 0 2% php-cgi
12239 1 8.58s 5m44s 292.0M 86864K 684K 16916K N- - S 0 2% php-cgi
13618 1 7.58s 5m12s 310.1M 102.7M 32K 13992K N- - S 0 1% php-cgi
1772 15 26.10s 2m08s 798.8M 66204K 214.2M 1.2G N- - S 0 1% mysqld
78 1 2m13s 0.00s 0K 0K 0K 0K N- - S 3 1% kipmi0
6474 1 3.62s 95.06s 286.2M 78744K 53280K 12952K N- - S 0 0% php-cgi
3119 1 3.16s 84.38s 287.4M 80580K 105.0M 8660K N- - S 0 0% php-cgi
2571 33 4.86s 42.56s 2.6G 181.8M 155.9M 13296K N- - S 1 0% dsm_om_connsvc
20531 1 2.01s 27.72s 275.4M 69604K 476K 47256K N- - S 0 0% php-cgi