Hey all. I have been running Virtualmin 100% trouble-free for about a month now. It runs on an Ubuntu VMware guest with a 2.2 GHz CPU and 1 GB (out of 4 GB) of RAM set aside for it.
Yesterday I set up 4 jobs: a weekly full backup, a daily incremental backup, a weekly full FTP backup, and a weekly incremental FTP backup. I ran the two full jobs and set the incrementals to run at 0 and 2 am. Both jobs ran fine and emailed me the results.
Around 11:30 my whole server crashed due to an out-of-memory problem. I looked at /var/log/kern.log and I see that the problem started around 8:30 am. This is what some of the log looks like:
What other logs can I look at to try to trace this problem? The only reason I have my eye on the backup process is that it is the only thing I have changed, and I changed it yesterday.
I’m going to leave them enabled for tonight to see if things bomb out tomorrow again.
I’ll set that cron job of ps -auxw for sure, great idea. I know there was some default swap stuff that I set up when I installed Ubuntu server, but I do not remember the exact size I set up. How can I check that to give you the exact details?
Previously free showed 1 gig of actual memory and almost 1 gig of memory in use. There was about 45 megs of available memory. At first I was alarmed by this; it is what my deleted post above was about. However, I think that the memory must not be accurately reported for virtual machines.
This is free run on my dedicated server that I am trying to move away from:
[code]
             total       used       free     shared    buffers     cached
Mem:       1026528    1006692      19836          0     168212     408104
[/code]
See how it says 1 gig used and 19 megs free? Obviously that is not being reported correctly since that server has been online and running great for a very long time.
With that said, my VMware server access says that 81 MB is being used, which seems more likely.
I also increased the guest os from 1 gig of ram to 2 gigs of ram:
[code]
             total       used       free     shared    buffers     cached
Mem:       2062920     762672    1300248          0      27064     176292
[/code]
How can I see my current swap filesystem information?
In the case of the above, the first line shows 1402460 of 1474776 as being used. However, the second line clarifies this by subtracting the buffers and cache from that figure, since those aren’t actually in use by any processes running on the system.
So actually, there’s 920800 of regular RAM available.
The last line is how much swap is available.
If you don’t have a line mentioning swap – can you paste in the contents of your /etc/fstab file?
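For anyone unsure how to do the buffers/cache arithmetic described above, here is a minimal sketch. It assumes the older six-column `free` layout pasted in this thread (total, used, free, shared, buffers, cached); newer versions of `free` print a different layout with an "available" column, so the field positions would need adjusting there. The function name `mem_summary` is made up for illustration.

```shell
#!/bin/sh
# Reads old-style `free` output on stdin and prints two figures in kB:
#   apps_used_kb  = used - buffers - cached   (what processes really hold)
#   available_kb  = free + buffers + cached   (what processes could still get)
# Assumes the column order: total used free shared buffers cached.
mem_summary() {
    awk '$1 == "Mem:" {
        used = $3; unused = $4; buffers = $6; cached = $7
        printf "apps_used_kb=%d available_kb=%d\n",
               used - buffers - cached, unused + buffers + cached
    }'
}

# The 2 GB guest from this thread:
printf 'Mem: 2062920 762672 1300248 0 27064 176292\n' | mem_summary
# prints: apps_used_kb=559316 available_kb=1503604
```

On a live box with the old layout, `free | mem_summary` would do the same thing. Run against the dedicated server's numbers, it gives roughly 430 MB held by applications rather than the 1 GB that the "used" column suggests.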
Things are still up and running. Maybe we will never know or maybe it has to do with me increasing the VM memory to 2 gigs. Currently VMConsole says 122mb is in use.
I’m not exactly sure how to do the calculations on these free numbers to get the real amount.
Nothing abnormal to report in /var/log/syslog.
Nothing abnormal in my ps -auxw.
I have my script running every 15 minutes and spitting the ps -auxw into a folder dated today and pruning files more than 2 days old. I’ll leave it running indefinitely until I am sure this isn’t going to happen again.
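A rough sketch of what such a snapshot script might look like, for anyone who wants to set up the same thing. This is not the poster's actual script: the directory, file names, and the `run` argument are all assumptions, and `find -delete` / `-mindepth` rely on GNU find as shipped with Ubuntu.

```shell
#!/bin/sh
# Hypothetical location for the snapshots; any directory with spare room works.
SNAP_ROOT="${SNAP_ROOT:-/var/log/ps-snapshots}"

# Write one process listing into a folder named after today's date.
take_snapshot() {
    dir="$1/$(date +%Y-%m-%d)"
    mkdir -p "$dir"
    ps auxw > "$dir/ps-$(date +%H%M).txt"
}

# Remove snapshots whose age, rounded down to whole days, is greater than 2
# (that is what -mtime +2 matches), then remove day folders left empty.
prune_snapshots() {
    find "$1" -type f -name 'ps-*.txt' -mtime +2 -delete
    find "$1" -mindepth 1 -type d -empty -delete
}

# Only act when called with "run", so the functions can be sourced safely.
if [ "${1:-}" = "run" ]; then
    take_snapshot "$SNAP_ROOT" && prune_snapshots "$SNAP_ROOT"
fi
```

Saved as, say, `/usr/local/bin/ps-snapshot.sh` (a made-up path), a crontab line like `*/15 * * * * /usr/local/bin/ps-snapshot.sh run` would take a snapshot every 15 minutes.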
Well, it does appear as if you don’t have any swap enabled; so any sort of spike in usage that used up your available RAM could have caused the problem you had.
If you had created a swap partition, I don’t see anything in your fstab that’s enabling it.
I would recommend having swap, as spikes can happen, and it’s probably just a matter of time before your 2GB there runs out.
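The fstab check mentioned above can be done by eye, but a small helper makes it explicit. This is a sketch only; the function name `fstab_swap_devices` is invented here, and it simply looks for non-comment lines whose third field (the filesystem type) is `swap`.

```shell
#!/bin/sh
# Print the first field (device, UUID=..., or file) of every uncommented
# swap entry in an fstab-format file. Defaults to /etc/fstab.
fstab_swap_devices() {
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }' "${1:-/etc/fstab}"
}
```

Calling `fstab_swap_devices` with no argument reads `/etc/fstab`. If it prints nothing, no swap area is configured to be enabled at boot, which matches what is described in this post.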
This was set up automatically during the install of Ubuntu server. How do I get Ubuntu to start using this swap space?
Also, here is the output of fdisk -l:
[code]sudo fdisk -l
[sudo] password for pcm2a:
Disk /dev/sda: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0008af06
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1       10171    81698526   83  Linux
/dev/sda2           10172       10443     2184840    5  Extended
/dev/sda5           10172       10443     2184808+  82  Linux swap / Solaris
[/code]
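The thread does not record exactly which commands were used next, so what follows is only a sketch of one common approach: find the partition that fdisk labels as Linux swap and build an fstab line for it. The function name `swap_fstab_lines` is made up, and the `none swap sw 0 0` options are a conventional choice rather than anything taken from this server.

```shell
#!/bin/sh
# Read `fdisk -l` output on stdin and, for every partition line that fdisk
# describes as "Linux swap", print a matching fstab entry.
swap_fstab_lines() {
    awk '$1 ~ /^\/dev\// && /Linux swap/ {
        printf "%s none swap sw 0 0\n", $1
    }'
}

# Two of the partition lines from the listing above:
printf '%s\n' \
    '/dev/sda1 * 1 10171 81698526 83 Linux' \
    '/dev/sda5 10172 10443 2184808+ 82 Linux swap / Solaris' | swap_fstab_lines
# prints: /dev/sda5 none swap sw 0 0
```

With that line added to /etc/fstab, `sudo swapon -a` should activate it without a reboot, and `cat /proc/swaps` shows what is active. If the partition was never initialised as swap, `mkswap` would be needed first, and that wipes whatever is on the partition, so double-check the device name before running it.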
Ahh, nice work, that did indeed enable about 2GB of swap.
Should you bump it to 4GB? Well, I’m inclined to think you’re okay with your current setup, but I guess you just need to keep an eye out for how much RAM/swap you’re using.
Thanks for all the help. I went ahead and repartitioned the swap to 4 gigs to make it double the RAM amount. At some point will I see the swap memory being used with the “free” program?
Why does it report 1.59 gigs of memory used when there is barely anything actively running on the box? Certainly nothing is taking up 1.59 gigs of RAM. Is this just 1.59 gigs of RAM that has been used in the past, but isn’t active right now? VMware reports around 150 MB of RAM being actively used.
Maybe I’m supposed to be subtracting something from the “buffers” line?
“Why does it report 1.59 gigs of memory used”
It isn’t actively using it. It is holding it in cache, so that data programs have recently used is available immediately from RAM. Other programs can take that RAM if they need it.
As long as the RAM is sufficient, it won’t use the swap. Swap mainly acts as burstable RAM, used temporarily if 2 GB isn’t enough.
When you have so many things going on that your box needs to use swap for longer periods of time, it is wise to add more RAM to the box.
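To keep an eye on whether swap is actually being touched, the numbers can be read straight from /proc/meminfo. A minimal sketch, assuming the usual `SwapTotal:` and `SwapFree:` lines in kB; the function name `swap_used_kb` is invented for this example.

```shell
#!/bin/sh
# Print the amount of swap currently in use, in kB, from a
# /proc/meminfo-style file (SwapTotal minus SwapFree).
swap_used_kb() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { unused = $2 }
         END { printf "%d\n", total - unused }' "${1:-/proc/meminfo}"
}
```

Running `swap_used_kb` with no argument reads the live /proc/meminfo. A value that stays at or near 0 means RAM is sufficient; a value that stays high for long stretches is the sign, as described above, that the box wants more RAM.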
I had a similar issue on a similar configuration around the same time, very weird. My problem was a couple of items. First I only had 512 MB of RAM. Secondly, I found that the hard drives I was running the Virtual Machine off of were creating a serious bottleneck. I would run out of memory, and disk swapping would become so slow that the machine would come to a halt, and sometimes crash.
I had both AWStats and Webalizer on the machine for each domain. It turns out those stats were slowing things down and using up a lot of RAM when they ran. They were also causing some problems with duplicate emails, because Postfix would time out and resend the message to Amavisd. By adding more RAM and moving the VM to a better-performing disk and interface, I was able to solve the issue.
Virtual machines will do best with at least 250 to 300MB/s of read/write throughput. Avoid RAID 5 whenever possible. Use a RAID 1 or 0+1 setup instead.