We had an outage of the software.virtualmin.com software repos a couple of nights ago, at around 1:30AM Central time. It didn’t last long (I was alerted when it failed, and it was back up in about an hour), but while sorting out what happened I found that one of the disks in the array where the VM runs is failing. It seems to be failing slowly, but we obviously don’t want to rely on a disk that could fail at any time and cause future outages.
So, I’ve been working on migrating the repos to a new server, and I’m taking the opportunity to overhaul how we do things. I’ve added some scripts to purge old packages more frequently (this was done periodically by hand in the past). I will also be implementing some redundancy (two servers in two data centers, load-balanced DNS with failure detection), and turning on DRPM (delta RPM) for the yum repositories, so updates will be smaller downloads. The benefit of purging old packages more aggressively is that spinning up new repo server clones will be much faster in the future, making it easier to recover from a failure, even a total system failure.
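For anyone curious what purging old packages can look like, here is a minimal sketch in Python. It is not the script we actually run. It assumes RPM files named in the usual `name-version-release.arch.rpm` pattern, uses file modification time as a stand-in for "newest" (a real script would compare RPM versions properly), and the function name `select_stale_packages` and the `keep` parameter are made up for illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: files follow the conventional name-version-release.arch.rpm layout.
RPM_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)\.rpm$"
)


def select_stale_packages(repo_dir, keep=2):
    """Return RPM paths beyond the `keep` newest builds of each (name, arch).

    Hypothetical helper: "newest" is judged by file mtime, not by RPM
    version comparison, so treat this as an illustration only.
    """
    groups = defaultdict(list)
    for path in Path(repo_dir).glob("*.rpm"):
        match = RPM_RE.match(path.name)
        if match is None:
            continue  # skip anything that does not look like an RPM build
        groups[(match["name"], match["arch"])].append(path)

    stale = []
    for paths in groups.values():
        # Newest first, then everything past the first `keep` entries is stale.
        paths.sort(key=lambda p: p.stat().st_mtime, reverse=True)
        stale.extend(paths[keep:])
    return sorted(stale)
```

The function only reports candidates and never deletes anything, so it can be run as a dry run first. After removing files, the repository metadata would still need to be regenerated before clients see a consistent repo.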
This is a large number of changes to push out at once, but now is a good time to do it, since Virtualmin 6 is coming soon and we’re beginning to automate a bunch of our other release infrastructure. Because it’s a large number of changes, though, I bet I’ve broken something. Maybe a lot of somethings. I’ll be testing, but I also welcome bug reports if you find you’re getting errors from our repos (anything hosted on software.virtualmin.com).
We’ve never really had problems with the performance of our repos (our colo provides very fast pipes), so I don’t expect this will do much to speed up installs or anything fun like that. But redundancy is nice from a reliability perspective, and it means I should never again have to be awake until 4AM poking at a server because it crashed. It can wait until morning, because it won’t interrupt service!
I expect to wrap up my changes by later tonight, but bugs might shake out over a few days.