Server up but Virtualmin won't stay up

itmustbe · December 26, 2013, 6:07pm

I haven’t made any modifications to my server in months. Today I tried to open Virtualmin, but my browser said it couldn’t establish a connection on that port. I ssh’d in and everything looks normal when I run top (I’m no expert, but I see no unusual memory usage). When I run “/etc/init.d/webmin start”, the Virtualmin page loads for a moment in my browser, but as soon as I try to log in the connection is reset. There are no errors on the command line, except that when I try to restart Webmin it tells me it’s not running before starting it. What should I try to get it to stay up like usual? I thought about rebooting the server, but if my Webmin install is in need of help I don’t want to hit issues on reboot that would take other services like email and websites down.

Locutus · December 26, 2013, 6:17pm

A memory or resource issue might be killing the Webmin process. Check your syslog for related messages; /etc/webmin/miniserv.log might also contain useful information.

Check free for available memory. If you’re on an OpenVZ VPS, check /proc/user_beancounters.

itmustbe · December 26, 2013, 7:00pm

It would appear email spam is up quite a bit today too, perhaps coincidentally?

@Locutus There is no miniserv.log to be found in that location, and I am not on OpenVZ VPS

Locutus · December 26, 2013, 11:07pm

I’m sorry, stupid me. The log is at /var/webmin/miniserv.log of course.

The increase in spam can cause more resources to be used if, for example, you’re running SpamAssassin in standalone mode (which spawns an SA process for each mail).

Please check the other things I mentioned (syslog, free). You can also use the tool atop, which records historical performance data such as how much memory and CPU each process uses, to find potential resource leaks.

itmustbe · December 27, 2013, 12:56am

@Locutus I think we may be off on the wrong track suspecting memory usage, as running “top” shows nothing unusual, and I checked my host’s admin panel (at Media Temple) and looked at my VPS memory and CPU usage live, today, over the week, and over the month, and there are no spikes at all. The server and all services appear to be running normally other than the fact Virtualmin – out of nowhere – no longer pulls up (not for more than a moment, anyway!) One user of three (this is a small server!) reported a surge of spam emails, but the other accounts have been fine (all use SSL and very strong passwords).

itmustbe · December 27, 2013, 2:03am

@Locutus I have found the miniserv.log, but there are no unusual or very recent entries. I can’t recall which mode I have SpamAssassin running in… it’s the mode that takes the least virtual server memory. To check “free”, do I just type “free”, like running “top”? And can you point me in the direction of my syslog? I’m accustomed to using Virtualmin and Webmin for most tasks; I can at least ssh in and su to root, but I’m a bit out of my depth after that.

The logwatch report (which I have set to detailed reporting) this morning showed nothing unusual other than 59 emails delivered to “root”, which usually receives no emails unless something is wrong. Unfortunately, I only know how to check those through Webmin (via the Read User Mail server), and of course I still can’t pull up Webmin.

I was tempted earlier to reboot the server via Media Temple’s VPS control panel, but I don’t want to have any Webmin-related bootup issues that take existing services (email/websites) offline.

itmustbe · December 27, 2013, 2:24am

Ah, something new in my troubleshooting! SpamAssassin is not running at all: I can see in my email headers that it stopped along with Webmin/Virtualmin. So that explains the sudden influx of spam to one email account. I know “/etc/init.d/webmin start” is the command to start Webmin (though in my case Webmin isn’t staying up for more than a second or two). I’m not sure what command to use to restart SpamAssassin, though, even after some research on the web. I believe it runs as a separate process, maybe spamc or spamd? Of course the main issue is still the sudden lack of Webmin/Virtualmin, but SpamAssassin is a pretty important process too, and it’s odd that they’re both down together while email and sites continue to run as normal…

Locutus · December 27, 2013, 10:49am

There might be resource issues even though you don’t see them in free, e.g. if processes have already been killed. But that’s just a guess of course. It’s certainly not normal that Webmin and SpamAssassin simply stop running.

The syslog is usually located in /var/log/syslog or /var/log/messages, depending on your distribution. Check that first and look for crash or OOM messages.
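A minimal sketch of that log check, assuming the interesting lines use either typical Linux OOM-killer wording or a "can't allocate memory" message. The pattern list and the function name `find_memory_errors` are illustrative assumptions, not a definitive set; the two sample lines are copied from the ClamAV output posted in this thread.

```python
import re

# Assumed patterns: common Linux OOM-killer wording plus allocation-failure
# text. "can(?:'|’|no)t" matches "can't", "can’t" and "cannot".
MEMORY_PATTERNS = re.compile(
    r"out of memory|oom-killer|killed process|can(?:'|’|no)t allocate memory",
    re.IGNORECASE,
)


def find_memory_errors(lines):
    """Return (line_number, text) pairs for lines that look memory-related."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if MEMORY_PATTERNS.search(line)
    ]


# Sample lines taken from the ClamAV update output in this thread.
SAMPLE_LINES = [
    "main.cvd is up to date (version: 55, sigs: 2424225, f-level: 60, builder: neo)",
    "WARNING: [LibClamAV] mpool_malloc(): Can't allocate memory (262144 bytes).",
]

hits = find_memory_errors(SAMPLE_LINES)
for number, text in hits:
    print(f"line {number}: {text}")
```

To scan a real log, pass an open file instead of the sample list, for example `find_memory_errors(open("/var/log/messages", errors="replace"))`, run as a user allowed to read the file.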

Also check those 59 emails sent to root. They should be located in /root/Maildir/new or /root/Maildir/cur.
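If reading dozens of messages one by one is impractical, a short script can group them by subject. This is only a sketch using Python's standard `mailbox` module; the helper name `summarize_maildir` is hypothetical, and the demo builds a throwaway Maildir with made-up placeholder messages so that it runs anywhere.

```python
import mailbox
import os
import tempfile
from collections import Counter


def summarize_maildir(path):
    """Count messages per Subject header; iteration covers new/ and cur/."""
    box = mailbox.Maildir(path, factory=None, create=False)
    subjects = Counter()
    for message in box:
        subjects[str(message.get("Subject", "(no subject)"))] += 1
    return subjects


# Demo on a temporary Maildir; these two messages are placeholders,
# not real system mail.
demo_path = os.path.join(tempfile.mkdtemp(), "Maildir")
demo_box = mailbox.Maildir(demo_path, create=True)
demo_box.add("Subject: placeholder alert\n\nfirst body\n")
demo_box.add("Subject: placeholder alert\n\nsecond body\n")
demo_box.close()

demo_counts = summarize_maildir(demo_path)
for subject, count in demo_counts.most_common():
    print(f"{count:4d}  {subject}")
```

On the server itself, the call would be `summarize_maildir("/root/Maildir")`, run as root and assuming root's mail really is stored in Maildir format at that path.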

itmustbe · December 27, 2013, 1:55pm

I’ll check the root emails in just a sec, and the logs, but quickly, the Logwatch this morning looked a lot stranger, with ClamAV in trouble:


--------------------- clam-update Begin ------------------------
The ClamAV update process was started 1 time(s)
Last ClamAV update process started at Thu Dec 26 03:31:06 2013
Last Status:
   main.cvd is up to date (version: 55, sigs: 2424225, f-level: 60, builder: neo)
   Downloading daily-18284.cdiff [100%]
   Downloading daily-18285.cdiff [100%]
   Downloading daily-18286.cdiff [100%]
   Downloading daily-18287.cdiff [100%]
   WARNING: [LibClamAV] mpool_malloc(): Can't allocate memory (262144 bytes).
   WARNING: [LibClamAV] cli_mpool_strdup(): Can't allocate memory (24 bytes).
   WARNING: [LibClamAV] cli_loadhash: Problem parsing database at line 52176
   WARNING: [LibClamAV] Can't load daily.mdb: Malformed database
   WARNING: [LibClamAV] cli_tgzload: Can't load daily.mdb
   WARNING: [LibClamAV] Can't load /var/lib/clamav/clamav-ba720c437667db49a41f36fdea54b7d8.tmp/clamav-00990949e1715fc6913f79e25927592d.cld: Malformed database
   ERROR: Failed to load new database: Malformed database
   ERROR: During database load : ERROR: Failed to load new database: Malformed database
   WARNING: Database load exited with status 55
   ERROR: Failed to load new database

The following ERRORS and/or WARNINGS were detected when
running the ClamAV update process.  If these ERRORS and/or
WARNINGS do not show up in the "Last Status" section above,
then their underlying cause has probably been corrected.

ERRORS:
   During database load : ERROR: Failed to load new database: Malformed database: 1 Time(s)
   Failed to load new database: 1 Time(s)
   Failed to load new database: Malformed database: 1 Time(s)

WARNINGS:
   [LibClamAV] Can't load /var/lib/clamav/clamav-ba720c437667db49a41f36fdea54b7d8.tmp/clamav-00990949e1715fc6913f79e25927592d.cld: Malformed database: 1 Time(s)
   [LibClamAV] cli_mpool_strdup(): Can't allocate memory (24 bytes).: 1 Time(s)
   [LibClamAV] mpool_malloc(): Can't allocate memory (262144 bytes).: 1 Time(s)
   [LibClamAV] cli_tgzload: Can't load daily.mdb: 1 Time(s)
   [LibClamAV] cli_loadhash: Problem parsing database at line 52176: 1 Time(s)
   Database load exited with status 55: 1 Time(s)
   [LibClamAV] Can't load daily.mdb: Malformed database: 1 Time(s)
---------------------- clam-update End -------------------------

--------------------- Clamav Begin ------------------------
Daemon check list:
   Database status OK: 144 Time(s)
---------------------- Clamav End -------------------------

itmustbe · December 27, 2013, 2:06pm

By the way, I’m running the latest CentOS 6.

So that bit I just pasted above about ClamAV I also found in my system log, but today’s entries show ClamAV is fine again, and loaded its database ok. I see nothing else in /var/log/messages other than these ClamAV entries over the last week, all of which looked normal except the one pasted above (the second-to-last one in the main system log for Dec. 26). Oddly, I’m only seeing ClamAV entries in this system messages log, at least over the last week, but perhaps that’s normal.

I will look at those root emails next, as I appeared to get 124 of them overnight in addition to the other 59!

itmustbe · December 27, 2013, 2:17pm

So the first system message:

postfix::is_postfix_running failed : Failed to query Postfix config command to get the current value of parameter process_id_directory: at …/web-lib-funcs.pl line 1376.

Actually it’s looking like all 59 + 124 messages are along those lines, though I’m just checking a few randomly right now.

Should I try just rebooting the server via Media Temple’s control panel? It’s been running well for a while now: it had about 90 days of uptime when I last looked the other week and ran some end-of-month backups. I’ve been running Virtualmin/Webmin very happily for over a year, with the server updating itself, and it gets very little usage, just a couple of up-to-date WordPress sites, some static sites, and a bit of email. I only hesitate to reboot because I can at least SSH in right now, and I’d hate for something to go terribly wrong and then wish I had spent more time troubleshooting while I still had a way in!

itmustbe · December 27, 2013, 2:19pm

I just typed “free” as well (and all those root emails do seem to be about Postfix):


             total       used       free     shared    buffers     cached
Mem:       3774872     930212    2844660          0          0     318108
-/+ buffers/cache:     612104    3162768
Swap:            0          0          0
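For readers unsure how to interpret these figures, here is a small worked check. It assumes the older `free` layout shown here, with values in KiB, where the "-/+ buffers/cache" row is derived from the "Mem:" row. The variable names are illustrative.

```python
# Values in KiB, copied from the "Mem:" row of the free output above.
total, used, free_kib, buffers, cached = 3774872, 930212, 2844660, 0, 318108

# Memory held by applications alone: "used" minus reclaimable buffers/cache.
used_by_apps = used - buffers - cached

# Memory applications could still obtain: "free" plus the reclaimable part.
available_to_apps = free_kib + buffers + cached

print("used by applications (KiB):", used_by_apps)
print("available to applications (KiB):", available_to_apps)
print(f"share of RAM still available: {available_to_apps / total:.0%}")
```

The two computed values reproduce the 612104 and 3162768 shown in the "-/+ buffers/cache" row, so by this measure most of the reported RAM is unused.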

Locutus · December 27, 2013, 3:08pm

Please enclose all screen listings in [code][/code] tags, otherwise monospace font and linebreaks are lost, making it unreadable.

Locutus · December 27, 2013, 3:13pm

The memory errors you receive from ClamAV are odd, considering your “free” shows enough memory. It might be a hardware issue with your server. Is it a physical or virtual machine?

You can try rebooting it. Using atop you can record historical memory usage data, to see if at the time of problems occurring there’s a memory issue.

About the Postfix error, Eric or someone else from the Virtualmin team would have to say something. You can try running postconf (if that’s the “Postfix config command” they’re talking about) and see if it works.

itmustbe · December 27, 2013, 3:35pm

The atop command doesn’t appear to be installed on my system.

It is a virtual machine on Media Temple’s VPS service. I shudder to say that it’s inside a Plesk/Parallels virtual container of some sort (I’m a refugee from Plesk’s control panel!)

Running postconf appears to work fine, I get a whole bunch of output in my terminal.

I thought those memory errors odd too, though they cleared up over the day as this morning ClamAV had no such trouble. Still no Spam Assassin or Webmin/Virtualmin running though! I’m a little concerned still about rebooting in case I have more trouble. Should I wait to hear from Eric on this forum before rebooting?

itmustbe · December 27, 2013, 3:36pm

Sorry, I just saw your note about enclosing listings in code tags. I knew I was doing something wrong there; I’ll do that with any future lines of code to make them more readable!

Locutus · December 27, 2013, 3:54pm

Eric might be able to say more, yeah, since I’m not familiar with CentOS or Plesk. He also has more experience with (resource) issues on several virtual machine hosters.

itmustbe · December 27, 2013, 4:17pm

I will await Eric’s feedback here then… just in case there’s something we’re missing to check before rebooting. Perhaps rebooting will cure everything magically, but in my experience (mostly with Plesk long ago!) rebooting while other things are going wrong is not always wise, as one can lose one’s access to the server, and it seems with Linux that most ailments can be cured over SSH and without a reboot.

The server throughout this period has been performing quite normally, I should add… no slowdown that typically accompanies memory issues. Just a lack of Virtualmin/Webmin and SpamAssassin these last few days, which is rather worrying of course, but you’d never know it from accessing the mailserver and websites.

Eric · December 27, 2013, 9:19pm

Do you have a /proc/user_beancounters file? If so, could you post its contents?

-Eric

itmustbe · December 28, 2013, 2:57pm

Here are the contents of the /proc/user_beancounters file:

Version: 2.5
       uid  resource                     held              maxheld              barrier                limit              failcnt
    90999:  kmemsize                 76751205             78077952            228000000            240000000                    0
            lockedpages                     0                    0                 1200                 1200                    0
            privvmpages                924582               926555               896536               943718               892528
            shmpages                     1206                 1206                90000                90000                    4
            dummy                           0                    0  9223372036854775807  9223372036854775807                    0
            numproc                        92                  120                  600                  600                    0
            physpages                  214748               218221               896536               943718                    0
            vmguarpages                     0                    0               524288           2147483647                    0
            oomguarpages               192788               192788               524288           2147483647                    0
            numtcpsock                     36                   36                 2000                 2000                    0
            numflock                       14                   14                 1000                 1100                    0
            numpty                          1                    1                  100                  100                    0
            numsiginfo                      0                   30                 1024                 1024                    0
            tcpsndbuf                  676192               676192             10000000             20000000                    0
            tcprcvbuf                  589824               589824             10000000             20000000                    0
            othersockbuf               339864               341152              5000000             10000000                    0
            dgramrcvbuf                     0                    0             10000000             10000000                    0
            numothersock                  279                  281                 2000                 2000                    0
            dcachesize               42796557             42918127             57000000             60000000                    0
            numfile                      3948                 4015                40000                40000                    0
            dummy                           0                    0  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0  9223372036854775807  9223372036854775807                    0
            numiptent                      34                   34                  500                  500                    0
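A minimal sketch for reading a table like this one: it parses the rows and lists every resource whose failcnt is nonzero, on the understanding that failcnt counts requests the container was refused. The function names are hypothetical, the 4 KiB page size is an assumption, and the sample holds only a few rows copied from the listing above with the whitespace collapsed.

```python
PAGE_KIB = 4  # assumption: 4 KiB memory pages, typical on x86 systems

# Abridged rows copied from the listing above (whitespace collapsed).
SAMPLE = """\
Version: 2.5
       uid  resource      held      maxheld   barrier    limit      failcnt
    90999:  kmemsize      76751205  78077952  228000000  240000000  0
            privvmpages   924582    926555    896536     943718     892528
            shmpages      1206      1206      90000      90000      4
            physpages     214748    218221    896536     943718     0
"""

COLUMNS = ("held", "maxheld", "barrier", "limit", "failcnt")


def parse_beancounters(text):
    """Map each resource name to its five integer counters."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # The first row of a container carries a leading "uid:" field.
        if len(fields) == 7 and fields[0].endswith(":"):
            fields = fields[1:]
        # Skip the version line, the header line, and anything malformed.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        rows[fields[0]] = dict(zip(COLUMNS, map(int, fields[1:])))
    return rows


def failing(rows):
    """Return only the resources whose failcnt is nonzero."""
    return {name: r for name, r in rows.items() if r["failcnt"] > 0}


rows = parse_beancounters(SAMPLE)
for name, r in failing(rows).items():
    note = ""
    if name.endswith("pages"):  # page-based counters, converted to MiB
        held_mib = r["held"] * PAGE_KIB // 1024
        barrier_mib = r["barrier"] * PAGE_KIB // 1024
        note = f" (held {held_mib} MiB, barrier {barrier_mib} MiB)"
    print(f"{name}: failcnt={r['failcnt']}{note}")
```

In this sample only privvmpages and shmpages report failures, and privvmpages is currently held above its barrier. That would be consistent with allocation errors appearing even while “free” reports plenty of unused RAM, though this is one reading of the numbers rather than a confirmed diagnosis. On the server, the same function could be fed `open("/proc/user_beancounters").read()`, which typically requires root.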