Redundancy, clustering and failovers

merlynx · January 19, 2009, 2:48pm

I’ve not had the chance to spend the time working on this solution. I was told months ago that the development team was putting this feature on their radar to work on… that was nearly a year ago. There are tons of feature requests and bugs to iron out in such a complex tool as Virtualmin… I don’t know if there is any deadline on such a feature.

I’ve experimented with DRBD. This solution takes a lot of foresight IMHO, but it is an excellent one according to those implementing it.

I’ve gotten some great responses from this thread:

http://www.virtualmin.com/forums/virtualmin/clustering-virtualmin.html

That thread includes a sample implementation of fail-over that is live and working. It takes significant time and tweaking to make it work (at least, for me), and I’ve not had the time to really implement the feasible solution mentioned above. I believe either the DRBD solution or the shared directory solution(s) would work for anyone; IMHO they are simply not easy to set up for the average Linux admin. Clearly, there is more than one way to skin this cat, but the solutions are currently outside the context of Virtualmin and depend on your own methods and administrative ingenuity.

Good luck!

Joe · January 19, 2009, 5:00pm

There are a few things sort of simmering on this capability. VM2 is going public pretty soon, which gives an over-arching "one server manages many" capability–which is kind of necessary for marshaling resources for failover and other sorts of high availability.

Eric has been itching to work on documentation and processes for backup servers and such, and when the new website launches (meaning he’s finished with a bunch of other docs I’ve got him working on this month), I’ll set him loose on that, and he and I will work together on coming up with requirements for the major dev work that would need to go into it on Jamie’s side. The process of documenting what goes into it will make it more apparent what kinds of things Virtualmin and/or VM2 and/or Webmin can do to make those tasks simpler, more automatic, and more fool-proof.

As I’ve mentioned in several threads on the topic, the really hard work will never be within Virtualmin’s purview. If you want your applications to scale, then your applications have to be designed to scale…and Virtualmin can’t automatically make them scale.

The things we can do, however, include things like MySQL replication, shared data via ZFS or GFS, IP takeover, and possibly load balancing via mod_proxy_balancer–all of which are challenging in their own right, and I know a lot of folks who’ve had a hard time with them. Those are all big projects, however, and they kind of all have to be designed together, or they won’t fit together very well at the end of it all. (This is also a problem. If the infrastructure we design doesn’t fit exactly with the way people want to do things, they won’t use it. We always design a lot of flexibility into our products, but this is an area where the full stack has to be pretty precisely designed.)
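As a rough illustration of the decision behind IP takeover, a hot spare could poll the primary and only take over after several consecutive failed checks, so that one dropped packet does not trigger a failover. This is a minimal sketch under assumptions, not how Virtualmin, VM2 or Heartbeat does it: the `FailoverMonitor` class, the threshold of 3 and the choice of port 80 are all made up for the example.

```python
import socket


def primary_is_up(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverMonitor:
    """Hypothetical helper: signals takeover after N consecutive failed checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, is_up):
        """Record one check result; return True when takeover should happen."""
        if is_up:
            self.failures = 0  # any successful check resets the count
        else:
            self.failures += 1
        return self.failures >= self.threshold
```

The spare would call `monitor.record(primary_is_up(host))` from a loop or a cron job. The takeover step itself (bringing up the shared IP address) is left out because it depends entirely on the network setup.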

Also, I think I need to mention why this has remained a backburner project (beyond being really complicated to implement): You guys are way in the minority among our customers. There are maybe a dozen folks, out of a couple thousand, who have expressed an interest in clustering and failover and such…and none of our large hosting provider customers and potential customers have even mentioned it. You guys, and me and Eric and Jamie think this stuff is cool because it’s really interesting and fun to play with, and we talk about it quite a bit…but, the market isn’t demanding it. I’m afraid we may have to make this a proprietary option that costs extra in order to make it feasible to build and maintain it–would having to spend some extra money make this feature less appealing? And, how much extra money would make it less appealing? Right now, it’s looking like at bare minimum, it’s going to be a Virtualmin+VM2-only feature, so you’ll be buying at least one license of each (Virtualmin can be used on a hot spare at no extra cost–though if you begin doing load balancing, we’ll probably want you to buy another Virtualmin license). VM2 in this particular configuration will probably only be $198 (or free if you have more than five Virtualmin licenses). If an extra plugin, costing maybe another $98 or $198 or even more, were needed, would this be cost-prohibitive? (So, now we’re talking about $296 or $396 or more extra on top of your Virtualmin licenses, in order to perform high availability and load balancing.)

I don’t know if the math will work out, even at the higher price I mentioned…but since we do know that it’s a big project, requiring involvement of a lot of people and a lot of software (including a lot of packages not provided by a stock CentOS install), and the userbase is far more limited than for a lot of other capabilities, I do know we’re going to have to figure out how to make it pay for itself. For some reason it never really occurred to me that we could just say, “Hey guys, if you want it, we’ll build it, but we’re going to want you to pay for it.” We’ve always just thrown all new features into Virtualmin, for free, and assumed that the increased sales would make it worthwhile…but the more I do the math on this one, the more I realize this model is not going to work for clustering and high availability.

beat · January 29, 2009, 8:06pm

alessice wrote:

Yes, with DRBD and Heartbeat you can set up an HA solution for Virtualmin, including MySQL, mail, etc.

And works well!

Yes, that works very well, although it requires quite a sysadmin and webmaster skill set, a good test bed where you can try pulling out all the cables (preferably one at a time), and a very good datacenter capable of providing the required hardware. And lots of time to design, implement and test. But at the end of the trip, it works well.

I don’t think that it’s really Virtualmin Pro’s task to manage such a setup, as it’s highly dependent on the hardware configuration, and when you replicate/fail over redundantly, you are failing over Virtualmin Pro with the server anyway.

But when you start on this kind of setup, you start to balance loads, and quickly have many Virtualmin Pro instances (and licenses as well, which is really cool for the great Virtualmin team) to manage.

Now that clustering of servers is solved, our main hurdle is to manage and keep in sync all the configs of the Webmin and Virtualmin Pro instances around. We have only a few at this time, but if things go well, there might be… more… hopefully.

Before going into the clustering management, I believe that an architectural change is required in Virtualmin (not necessarily webmin in a first step, although that would make sense as well): Here is an idea that might work:

Having one (redundant/replicated through clustering) instance of VirtualMin Client, where all the web-frontend happens.
Having one instance of VirtualMin Server per host
"clustering" the VirtualMin Client with each of the VirtualMin Server that it manages

A little bit like you would do with X11 and NFS, to mention a few.

The single VirtualMin Client is the interface for admins and all customers of all servers, and keeps a single coherent database of sites. It also remotely administers the sites.

That would make migrating a single site from one server to another seamless, at least in the user interface. It would make it easier to balance load between servers in the cluster depending on each site’s load, would integrate well with cloud computing offerings, and would hugely simplify system administration.

Also, a tool in the main control panel to quickly see which site is suddenly eating up a lot of CPU or MySQL resources would be really nice. That would help load balancing as well…

Any hope that VM2 would solve that?

I don’t want to comment on the prices you mentioned at this stage, as I’m not sure I understand what the final bill would be and what’s required.

beat · January 29, 2009, 8:09pm

Can’t edit my previous message, as I hit another bug in FireBoard.

I just wanted to say that my quote was wrong as well: I initially meant to quote Joe on VM2 and clustering.

Anyway, my reply holds.

merlynx · January 29, 2009, 8:36pm

I would be very satisfied with a hot-swap solution.

A replication process that targets another virtualmin system and literally copies everything from server A to server B. Something you can set up as a cron job and configure in the cluster interface or something like that. The secondary server, in my case, does not even need to be "on-line" it is just a system ready to take the place of the primary when - not if - it goes down.
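A cron-driven copy like this could be sketched as a thin wrapper around rsync over ssh. This is only a sketch under assumptions: the function names and the host `spare.example.com` are made up, it presumes passwordless ssh from server A to server B, and it is not an existing Virtualmin feature. Plain file copies of a running database’s data directory may also be inconsistent, so databases would likely need a dump or replication instead.

```python
import subprocess


def build_rsync_command(source_dir, target_host, target_dir, delete=True):
    """Build an rsync-over-ssh command that mirrors source_dir to target_host."""
    cmd = ["rsync", "-a", "-z", "-e", "ssh"]
    if delete:
        # Remove files on the spare that no longer exist on the primary.
        cmd.append("--delete")
    # Trailing slash: copy the directory's contents, not the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append("%s:%s" % (target_host, target_dir))
    return cmd


def replicate(source_dir, target_host, target_dir):
    """Run the mirror and return rsync's exit code (0 means success)."""
    return subprocess.call(build_rsync_command(source_dir, target_host, target_dir))
```

A cron entry on server A could then run a small script that calls, for example, `replicate("/home", "spare.example.com", "/home")` once per directory that needs to exist on the spare.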

Including a load-balancing solution in this setup seems logical, but given the inner workings of it all, it is clearly not a simple matter.

I would like to try VM2 in this context, but in my testing, our older hardware is not really ready (IMHO) to handle a virtualized or para-virtualized (Xen) solution. It’s just not that powerful and most of it is around 5 years old. I don’t know, I’ve not looked into VM2 in detail, so perhaps I am wrong here.

I find it interesting that with an enterprise product, more people are not requesting this kind of functionality. I suppose, the systems administrators that have this on their minds are the same ones who have the skill set to implement such a solution without a tool like virtualmin making it simpler for them.

Thanks for your thoughts.

beat · January 29, 2009, 8:42pm

The client-server architecture could allow not only for easy migration of sites, but also for cloning between hosts and for replication/backup/archival purposes. In that case it could be very efficient even on old hardware, using rsync or rsync-diff for backup and archival.

Just some thoughts on how this client-server feature could appeal to more customers, namely any of those with more than one server to manage with Virtualmin Pro.

hizar · June 15, 2009, 3:58am

Hi all… I managed to get this working quite nicely. It’s been stable for a couple of weeks now, and the failover works well so far…

HowTo on my blog… http://safestream.net/blog/?p=1

Hope this is of use…

Eric · June 15, 2009, 4:04am

Wow, that’s a pretty thorough writeup.

Glad to hear you got it working, and thanks for sharing your work!
-Eric

hizar · June 19, 2009, 11:52pm

sorry, blog moved… here is the new link for the howto…

http://safestream.net/cms/

sageadmin · August 26, 2009, 1:59am

Hello Hizar and to others as well,
I have started on a highly available, load-balanced setup for Virtualmin. I am looking for guides or experience reports, so if you have any URLs or links, they would help me a lot.

Hizar, I tried to reach your blog, but am unable to view it at all. Can you please let me know how I can have a look at your howto?

Thanks in advance

with regards

xps · August 26, 2009, 12:32pm

Doesn’t work for me. Is the site still available?

jessec · September 10, 2009, 3:58pm

Hi,

Would it be too crazy just to do a find and replace of the IP address?

So first an rsync, and then:

for each file, replace originalip with hotspareip.

Or are there many other things to configure?

To me it looks like the only difference from the original server is the IP address.

Eric · September 10, 2009, 10:39pm

Well, it comes down to the specifics of how you intend to do all that, but something along those lines could certainly work, sure. It’s certainly worth pursuing; just make sure to test it before having to rely on it.

-Eric

jessec · September 14, 2009, 8:19am

I just dropped the idea that I could get a master-master setup, so I’m now focusing on a master-slave setup.

This ‘kfsmd’ http://www.linux.com/archive/feature/124903 could be the trigger for the rsync script.

Maybe there is a way for the user to have a master-master setup by a config file in their home dir.
To tell kfsmd what to monitor on the slave and a daemon on the master to process that.
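To make the "trigger" idea concrete, here is a polling stand-in that does not use kfsmd itself: it takes two snapshots of a directory tree and reports which paths differ, which is the list a sync script would then act on. The function names are made up for this sketch, and a real change monitor would react to events rather than rescanning the tree.

```python
import os


def snapshot(root):
    """Map each file path under root to its (mtime, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_mtime, st.st_size)
    return state


def changed_paths(old, new):
    """Return the sorted paths that were added, removed or modified."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))
```

A loop on the master could keep the previous snapshot, take a new one every few seconds, and start the rsync script only when `changed_paths` returns a non-empty list.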

So far I have only found a server-specific IP in the Apache config, but if someone has a list of all the files that contain IPs, or could potentially contain one, that would be nice. If I’m missing something, please let me know.

jessec · September 14, 2009, 8:25am

The main focus of this solution would be a very cheap and simple way to get an acceptable redundancy.
Again the main focus is to reduce cost and to use the cheap vps servers that are available.

So 100% reliable is not the aim but to get something acceptable for a good price.

iambacon · September 18, 2009, 6:08am

I recently started playing around with linodes. It’s been a lot of fun. I even bought a Cloudmin license to see how far it would go. I’d love to “run a cloud” with cloudmin but amazon seems kinda pricey compared to $200 for a quad core dedicated server. Linodes are great but webmin seems a better way to cluster them than cloudmin.

Just my $.02

jessec · September 20, 2009, 2:33pm

Hi,

Still after an easy solution.

I found http://code.google.com/p/lsyncd/ and a Python script for replacing IPs:

    import os, sys

    def getFiles(dir):
        """Recursively collect the paths of all regular files under dir."""
        foundFiles = []

        if dir[-1:] == "/":
            dir = dir[0:-1]

        for x in os.listdir(dir):
            if os.path.isdir(dir + "/" + x):
                foundFiles.extend(getFiles(dir + "/" + x))
            else:
                # Uncomment the two lines below (and drop the isfile check)
                # to restrict the search to "html" files only.
                #if x[-4:] == "html":
                #  foundFiles.append(dir + "/" + x)
                #print(x)
                if os.path.isfile(dir + "/" + x):
                    foundFiles.append(dir + "/" + x)

        return foundFiles

    def dryrun(file):
        """Report files that contain the search string, without changing them."""
        infile = open(file, "r")
        text = infile.read()
        infile.close()
        if text.find(sys.argv[2]) != -1:
            print("Would replace strings in " + file)

    def fixFile(file):
        """
        I would use "r+" here, but it didn't work right, it was acting
        to append, not to write over. So since I was far too lazy to
        do it right the script ends up with this hackish open/close
        cycle that is probably really inefficient. Oh well.
        """
        infile = open(file, "r")
        text = infile.read()
        if text.find(sys.argv[2]) != -1:
            infile.close()
            infile = open(file, "w")
            text = text.replace(sys.argv[2], sys.argv[3])
            print("Replaced strings in " + file)
            infile.write(text)
        infile.close()

    # This program doesn't have a concept of input checking, you better know how
    # to use this, don't look at me!
    if len(sys.argv) != 4:
        print("Usage: %s <directory> <starting_string> <replacing_string>" % sys.argv[0])
        sys.exit(1)

    files = getFiles(sys.argv[1])

    for file in files:
        # Dry run by default; swap the two lines below to actually rewrite files.
        dryrun(file)
        #fixFile(file)
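One caveat with a plain string replace like the one in the script: searching for 10.0.0.1 also matches the start of 10.0.0.10. A boundary-aware replacement could look like the sketch below. The function name `replace_ip` and the sample addresses are made up for illustration, and it only handles dotted IPv4 addresses.

```python
import re


def replace_ip(text, old_ip, new_ip):
    """Replace old_ip with new_ip only where it stands alone as a full address."""
    # Lookarounds reject matches glued to another digit or dotted octet,
    # so "10.0.0.1" does not match inside "10.0.0.10" or "110.0.0.1".
    pattern = r"(?<![\d.])" + re.escape(old_ip) + r"(?![\d])(?!\.\d)"
    # A function replacement keeps new_ip literal (no backslash escapes).
    return re.sub(pattern, lambda m: new_ip, text)
```

In the script above, this would take the place of the `text.replace(...)` call in `fixFile`, with the dry run still used first to review which files would change.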

jessec · September 20, 2009, 2:35pm

If anybody has any experience with XtreemFS or GlusterFS, I would be interested in that.

regards,

system · October 13, 2009, 10:15am

These are excellent tips, thanks for the inspiration!

hizar · October 16, 2009, 9:12am

Sorry, I’ve been away for a while… here is the link:

http://data-server.org/blog

Oh, and I solved the sync problem with system files using csync2, also provided by the developers of DRBD…