Redundancy, clustering and failovers

Hi,

I’ve currently got one server running Virtualmin Pro, and I’d like to get a second box to use as backup mail/DNS and possibly a hot spare web server.

Anyone got any advice on how this might work, best configurations, synchronisation etc…

Hey Chris,

The very first Virtualmin installation three years ago was a fully redundant configuration. It wasn’t easy, but it’s gotten easier since then. :wink:

There are at least three orthogonal issues you have to deal with:

1. Failure detection and takeover
2. Synchronization of configuration and data
3. Notification of problems

Webmin has help for you on a couple of these, though you need an extra software component.

The first issue is failover. For this, I used Heartbeat, and I have zero complaints about it. It works beautifully, simply, reliably, and it has a nice Webmin module. You can get it here:

http://linux-ha.org/download/index.html

Synchronization of configuration and data can be done in roundabout ways with Webmin, but I think the ideal choice is rsync. You can configure it to run over ssh with public-key authentication, so it can happen unattended via a cron job. On the system I set up, I ran rsync every hour. This is probably an area we ought to address in Webmin, but it’s a pretty wide-open task and there are dozens of ways to go about it, depending on your focus.

It’s also easy to screw things up on the secondary machine: many configuration files are specific to the box, even if the boxes are identically configured and are expected to behave identically after a failover. Each box’s own network details can’t change, for example. I believe I had two or three subdirectories of /etc that I excluded from my rsync process, and otherwise included all of /etc, all of /home, and the /var/named directory (depending on OS and configuration, this may be in /etc or elsewhere).

I’ll think on doing something like this for Webmin. For a very specific purpose (like a Virtualmin hosting server), it’s probably easy enough to select the right bits and then use the remote function API to copy the right stuff over on schedule, possibly using rsync to reduce the size of transmissions (now we’re getting fancy!).
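To make the rsync idea a bit more concrete, here is a minimal sketch in Python that only builds the rsync command line for pushing /etc, /home and /var/named to the spare. The excluded paths, the host name `spare.example.com` and the key file location are assumptions for illustration (the post above only says "two or three subdirectories of /etc"), and the sketch defaults to rsync's `--dry-run` so nothing is changed until you have checked the output.

```python
import shlex

# Hypothetical host-specific paths to leave alone on the spare. These are
# examples only; pick the ones that apply to your distribution.
HOST_SPECIFIC_EXCLUDES = [
    "/etc/sysconfig/network-scripts/",
    "/etc/network/",
    "/etc/ssh/",
]

# Trees named in the thread. No trailing slash, so rsync recreates each
# directory by name under the destination root.
SYNC_SOURCES = ["/etc", "/home", "/var/named"]


def build_rsync_command(dest_host, key_file, dry_run=True):
    """Return the rsync argument list for pushing SYNC_SOURCES to dest_host."""
    cmd = ["rsync", "-az", "--delete", "-e", f"ssh -i {key_file}"]
    if dry_run:
        # Report what would be transferred without changing the spare.
        cmd.append("--dry-run")
    cmd += [f"--exclude={path}" for path in HOST_SPECIFIC_EXCLUDES]
    cmd += SYNC_SOURCES
    cmd.append(f"root@{dest_host}:/")
    return cmd


if __name__ == "__main__":
    # Print the command so it can be reviewed before it goes into cron.
    print(shlex.join(build_rsync_command("spare.example.com", "/root/.ssh/sync_key")))
```

Once the dry run looks right, the same argument list could be handed to `subprocess.run` from an hourly cron job. Excluded paths are not deleted on the spare, because `--delete` skips excluded files unless `--delete-excluded` is also given.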

The final bit is notifications, since you need to know about failover events so you can fix whatever went wrong on the primary. I believe the System and Server Status module covers this nicely. Both servers can monitor each other and send notifications in the event of trouble. Once, when I didn’t want to use Heartbeat, I used this module all by itself to detect failure and take over an IP automagically. Worked just fine, but it seems like a “when all you’ve got is a hammer” kind of solution. :wink:
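For anyone curious what "both servers monitor each other" boils down to, here is a rough Python sketch of the idea. It is not how the System and Server Status module or Heartbeat is implemented; the function and class names and the threshold of three failed probes are all assumptions made for illustration.

```python
import socket


def service_is_up(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailureDetector:
    """Declare the peer failed only after `threshold` consecutive bad probes."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, probe_ok):
        """Record one probe result; return True when takeover is warranted."""
        if probe_ok:
            # Any successful probe resets the count, so one blip is ignored.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold
```

A cron job or loop on the spare could call `detector.record(service_is_up(peer, 80))` once a minute, then send mail and bring up the shared IP when it returns True. Requiring several failures in a row is a deliberate choice to avoid failing over on a single dropped packet.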

This would be a good area for a meta-module (like Virtualmin…stepping out of the traditional role of Webmin modules that address only one service or configuration file and actually perform actions on many modules in order to ease a specific task–more complicated to write and much more specific in function, but more useful in the circumstances for which it was written). Maybe some third party will step up to the plate and tackle this one. It’s a big job, though, and Jamie and I have our work scheduled out for at least two or three months.

Hi Joe,

Seems like Heartbeat plus DRBD (http://www.drbd.org/) might be a good combination for creating a failover pair.

I’ll have to find some time myself to set up a couple of test servers.

Hey Chris,

DRBD looks like a great tool. Scary, but really cool.

I’ve not yet needed 1-3 minute resync times, but that’s pretty darned impressive, and I’ll definitely look into it more. It’d be problematic to implement for Webmin/Virtualmin in a generic way, since it seems intimately tied to Linux (so we couldn’t ship it as a standard component for all Virtualmin users)…but it’s darned cool nonetheless.

So, yes, it looks like this might be a nice mechanism for synching your user data, especially if you need that kind of resync time. Careful of your network settings, and any other host-specific files, though, as it seems like it wouldn’t handle exclusions the way rsync does. Be sure to let us know how it turns out!

Chris/Joe,
Any movement on DRBD as a possible solution to mirror two Virtualmin servers? Just wondering.

Thanks!

…this thread is pushing 3 years old.

It would be ideal if fail-over and load-balancing could be achieved in the context of Virtualmin. Of course there are manual options as outlined in this post…but it seems virtualmin is very, very close to a clustered, load-balanced, fail-over solution.

A how-to a bit more fleshed out would be totally awesome and raise the bar even higher on an already outstanding piece of software.

I agree – I have one client that asked me the very same thing just the other day.

Yes, with DRBD and Heartbeat you can set up an HA solution for Virtualmin, including MySQL, mail, etc.

And it works well!

That sounds great - is there a "howto?" Or perhaps a rough outline?
Thanks!!!

I’m interested in this also. I’ve had a look at the DRBD page and it looks like Chinese to me.

You can start from here: http://www.linux-ha.org/DRBD

Ciao

I think I know the answer to this already but it might be worth getting an official answer…

If we’re running Virtualmin Pro in such an arrangement, how many server licenses are required?

I talked with Joe - he explained to me that for a “hot swap” the requirement is one license but for multiple servers with fail-over capability AND load-balancing you’d need a valid license for each load-balanced server.

Joe also mentioned that having a "hot-swap" feature was in consideration as a part of the set-up for virtualmin. I am really hoping that this sort of feature is added to virtualmin - as it would make itself all that more attractive to the enterprise market.

Besides - fail-over and load balancing are essentials for high-traffic sites IMHO. There have been several suggested methods in this thread (DRBD for one) and different techniques for fail-over (like installing different parts of the filesystem on NFS, making that NFS redundant, and having two Apache web servers serve the Virtualmin content).

I am not sure what the best practice is here - hopefully more interest in this kind of feature set will encourage the developers to enhance the already powerful cluster tools to this stage of functionality.

In any case, when I have the time, I plan on experimenting with this:
http://wiki.centos.org/HowTos/Ha-Drbd

and this:
http://howtoforge.com/high_availability_loadbalanced_apache_cluster

But my time resources are very limited…and there are considerations to deal with (like - ensuring that virtualmin is only accessed on the "master" machine(s), there may be issues with DNS/IP, and so on.)

Anyways - hopefully this thread will congeal into some sound solutions.

If Virtualmin Pro had an option to run redundant servers / load-balancing servers that could take over in case one machine crashed, I sure would invest in the needed number of licenses faster than you could say PayPal :wink:

I’ve talked about this a few times in the forums.

It’s definitely not the fear of not making any money on it that prevents us from tackling it.

The problem is that “load balancing” is meaningless without scaling of the whole stack, and it’s pretty much impossible to scale an application without the application being designed to scale. Hot spare is something that we can provide, and we will, in the not distant future…this is probably not more than two or three months away, actually.

But let’s talk load balancing and scaling, since it does come up so much:

Let’s first note that a simple HTML website doesn’t need load balancing. Apache, without any performance tweaks at all, can serve several hundred HTML requests per second on any modern hardware. Say, 500 requests/second, to be conservative…that’s 43 million requests per day. Google and Yahoo serve more than that…you and I don’t. Since that’s the case, we’re talking about scaling applications, not simple web data.
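The per-day figure is just the per-second rate multiplied by the number of seconds in a day. A quick check in Python (the function name is only for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_day(requests_per_second):
    """Convert a sustained requests/second rate into requests/day."""
    return requests_per_second * SECONDS_PER_DAY


print(f"{requests_per_day(500):,}")  # prints 43,200,000
```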

For example, if we needed to scale Virtualmin.com (which is a Joomla+FlySpray+a few custom components installation) to several machines, we would have many problems to solve, and load balancing would be the easiest of all of them.

We’d have to solve the database problem first, because nothing works without the database. So, we need multiple backend databases with replication. OK, so the simplest solution is one-write/many-read databases. So, now we have to make Joomla (and all of its wildly varying applications) use the database in a way that allows it to work in a one-write/many-read environment. We can inject memcached into the chain as a stopgap, but we still have to make the app know about the new database problems. So, three months’ worth of code spelunking, and we can make Joomla and its apps aware of multiple databases.
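To show why the application itself has to change, here is a minimal Python sketch of what "one-write/many-read" routing means inside an app. It is not taken from Joomla or Virtualmin; the class name and the server names in the usage note are hypothetical, and real code would also have to cope with replication lag (a read that follows a write may hit a replica that has not caught up yet).

```python
import itertools


class ReadWriteRouter:
    """Send SELECTs to replicas in round-robin order; send the rest to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._replicas)
        # Anything that is not clearly a read goes to the single writer.
        return self.primary
```

With `router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])`, every query the application issues would need to pass through `router.route(...)` to pick a connection, which is exactly the kind of change that cannot be bolted on from outside the application.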

But how would Virtualmin solve that? It simply can’t–no tool can. We can’t modify the 85 applications we install automatically, and we certainly can’t modify your custom applications to work against multiple databases. We can, and already do, manage cluster tables within MySQL–you can already create redundant databases in Virtualmin today. But a lot of the other stuff is going to be specific to your application and your deployment.

So, even if we add a load balancing component, which we might–as I’ve mentioned a few times in the past, my previous company built scalability tools (and some of the Squid module in Webmin was developed for my old company, along with a half dozen other modules related to scalability and fault tolerance)–scalability is a problem that no single product can solve. Your app has to be built from the ground up to scale, or refactored in significant ways in order to shoehorn scalability into it.

The links above talk about the simple part–they assume that the backends are identical and static. Throw a real application into the mix (the only thing slow enough to need load balancing) and suddenly things get a lot more complicated.

So, let’s break it into pieces when talking about these components, because they are very different problems…and some are much wider ranging than Virtualmin could address.

And, a hot spare of a bunch of simple websites is definitely something we will add to Virtualmin in the near future.

Just thinking out loud here…

If the only goal is on the apache side of it (web content, not mail, not dns, and not really database) where all you are doing is replicating home directories and databases directly…it seems like you could pull this off.

In my context, by far the primary need is a "hot spare." I am glad to hear that that is coming down the pipeline. DRBD seems like a great candidate for creating "spares" (and it sounds like users already have been doing this…)

Not to be too simplistic - but if a method exists to replicate a virtualmin server in the context of a “hot spare” it seems the logical first step in solidifying not only fail over but load balancing as well. That’s why conceptually I personally tend to group these together.

From a usage perspective - if database driven text content with lean/normal images are what you are serving, then load balancing is overkill except in the case of sites getting millions of hits (ala Yahoo and Google). In our case we are serving large (200+GB) .flv files, as well as many audio files from other hosted servers (10-20MB mp3 each). Having load-balanced servers handling requests for files of this size will improve performance - media rich environments (more and more common these days) will benefit from this.

If the Virtualmin team is going to tackle the “hot spare” feature (again - that is awesome), what is the next logical step to achieving a load-balanced environment for strictly port 80 requests? What time I can spare (read: sleep I can go without) I’d like to spend researching it and contributing. Maybe an itemized list of the issues for strictly port 80 web requests? If I am reading this correctly, it’s the database?

Sincerely,
-Jeremy

The hot spare option would in fact be good enough for me - just a shame it’s 2-3 months out in the future, because that means I have to work something out myself in the meantime.

But my entire setup consists of pairs of servers - for every server I set up, I set up a twin server. I’d really like to have a hot spare solution for the twin server, so it is automatically duplicated at certain time intervals and takes over if it needs to.

So, thinking about the hot spare solution, it would be nice to be able to set up "pairs" or "twins" in Virtualmin so you could have cluster control of the hot spares too.

I’ve been working on the “hot spare” concept.

Here is a general concept:
http://books.google.com/books?id=wNzltxkWiGAC&pg=PA75&lpg=PA75&dq="balance.front"+"rsync"&source=web&ots=p_MfXqyxU9&sig=tYG9B1qo9MfjMy0JB788KCvfEz4&hl=en&sa=X&oi=book_result&resnum=2&ct=result#PPA76,M1

…from the “Linux Server Hacks” book. Basically a bash script that uses an rsync configuration file to copy everything from one server (or servers) to the other. I’ve modified the files to work >somewhat< in Virtualmin.

The current problem I have is that the userIDs of serverA are different than the userIDs of serverB so files are copied over and the users do not exist correctly on serverB. It’s kind-of funky.

Right now I am running a backup on serverA and then restoring it on serverB, making sure that the users and group IDs are created with the original values. Then the rsync from serverA->serverB should, I think, work correctly…
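A quick way to see how far apart the two boxes are is to compare their /etc/passwd contents before trusting the rsync. The following is only a sketch of that check in Python, assuming you have copied both files somewhere readable; the function names and the sample accounts in the usage note are made up for illustration.

```python
def parse_passwd(text):
    """Map username -> (uid, gid) from /etc/passwd-formatted text."""
    accounts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed lines rather than guessing
        accounts[fields[0]] = (int(fields[2]), int(fields[3]))
    return accounts


def find_id_mismatches(passwd_a, passwd_b):
    """Return {user: (ids_on_a, ids_on_b)} where B differs from A.

    ids_on_b is None when the user does not exist on server B at all.
    """
    a, b = parse_passwd(passwd_a), parse_passwd(passwd_b)
    return {user: (ids, b.get(user)) for user, ids in a.items() if b.get(user) != ids}
```

For example, if serverA has `example:x:500:500::/home/example:/bin/bash` and serverB has the same user with `502:502`, the result contains `{"example": ((500, 500), (502, 502))}`. An empty result means file ownership copied by rsync will map to the same accounts on both machines.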

After I’ve perfected the process, I’ll post a few things if there are interested parties. In the meantime - it’s the 2-3 month mark since the initial posting of the “hot-swap” feature request - so my labor may be in vain. I would not mind frankly…
Cheers!
-merlynx

>>>Bump<<<
Can we get a status update on the “hot swap” feature?
It would be awesome for some of the community members who have configured failover and load balancing to build a document that got officially ratified in some way…we could refer to it here and let me tell you, I’d be willing to test implement it in a “heartbeat” as I’ve been directly banging my blockhead reading documents about this…

Hey there!

I’d be interested in a hot spare environment too because customers keep asking me about this feature.

DRBD is very nice, but you have to establish this kind of solution manually because there are no modules in Webmin/Virtualmin for active/passive "clustering" so far.

Any news on that?