How to restore software RAID 1 via webmin

fakemoth · November 16, 2009, 10:40am

Hello, not a very Webmin issue, but I rather ask it here. What is your recommended course of action for restoring a RAID 1 software array? My problem is that one of my RAID 1 Velociraptors HLFS (the second, new one, if you believe it!) has SMART (also the problems are visible in /var/log/messages) errors:

Nov 16 11:12:22 ns1 smartd[3247]: Device: /dev/sda, 5 Currently unreadable (pending) sectors

BTW the array is still healthy&working (mdadm is not marking it as failing and no reconstructions are being made), and the drive is still in working condition, but i’m being cautios. It’s /dev/sda. So I will get a another one, for replacement…

I have the following partitioning scheme:

2 x WD Velociraptor in RAID 1 for /boot , / , /var ,swap
2 x ST 1TB in RAID 1 for /home
Centos x86_64 5.4 with Virtualmin PRO and your recommended setup via script - working fine

Device name Active? RAID level Usable size Member disk devices

/dev/md0 Yes Mirrored (RAID1) 297.94 MB /dev/sda1 | /dev/sdb1

/dev/md1 Yes Mirrored (RAID1) 136.71 GB /dev/sda2 | /dev/sdb2

/dev/md2 Yes Mirrored (RAID1) 136.71 GB /dev/sda3 | /dev/sdb3

/dev/md3 Yes Mirrored (RAID1) 5.37 GB /dev/sda5 | /dev/sdb5

/dev/md4 Yes Mirrored (RAID1) 931.51 GB /dev/sdc1 | /dev/sdd1

Disk name Total size Make and model Partitions Actions

SCSI device A 279.46 GB ATA WDC WD3000HLFS-0 5 IDE parameters | SMART status

SCSI device B 279.46 GB ATA WDC WD3000HLFS-0 5 IDE parameters | SMART status

SCSI device C 931.51 GB ATA ST31000528AS 1 IDE parameters | SMART status

SCSI device D 931.51 GB ATA ST31000528AS 1 IDE parameters | SMART status

Can i do it something like:

format the new HDD exactly the same, software RAID 1 partitions, or clone the failing one (cloning is the best thing I guess)
shutdown the machine and replace the defective one;
power on, go to Webmin and add the new partitions to the RAID 1 array from the new drive (will they show, or it will begin automatically the background reconstruction, what’s going to happen)?

It can’t be that simple…

Pls post your experience with this kind of issues, wednesday it’ll be the day

PS: Those are the last crapy HLFS that I bought; never had a problem with GLFS series, or the Raptors! I’m begining to think the two Velociraptor series are not the same!

Jamie · November 17, 2009, 8:13pm

Yes, you can pretty much follow the steps above. As long as the new drive has the same partitions as the old one, when you put it in and boot the system Linux’s RAID drivers should detect that this is a replacement drive, and sync it up.

The only possible catch is if the failing drive also has some non-RAID partitions on it, such as whatever is used for /boot . If that is the case, it would need to be copied over as well. Also, if the drive is the primary boot disk, it might need to be marked as bootable, and GRUB installed…

BossHog · November 17, 2009, 8:50pm

Howdy,
just to reinforce Jamie’s post, GRUB install takes an extra step on a RAID array.
Here are my preferred links:
http://lists.us.dell.com/pipermail/linux-poweredge/2003-July/008898.html
and
http://aplawrence.com/Linux/rebuildraid.html
WARNING the links are not step for step instructions, but close.
They give the basics plus the “gotchas” to be on the alert for and a very clear explanation of installing GRUB on a recovered RAID.
Good luck
Joe

jonkristian · December 3, 2009, 1:27pm

This is something I’ve been thinking about aswell, as I’m kinda new to this scheme. So for future reference, if a drive under raid-1 fails and i replace it, i will still have to set up the same partition structure and re-install grub? I thought at least, if a drive has the same partition structure, everything will be synced automatically., am I wrong here?

fakemoth · December 3, 2009, 2:34pm

Quite wrong I hate to say it Never had a problem on the RAID array in Linux, till this one so it worked by itself mostly, but I done/saw a lot of rebuilding on Windows and Solaris and it’s automatic. Unfortunately there is no automatic partitioning and no syncing with mdadm - in my case at least. And the partitions must be exactly the same.

I tried also partitioning with a LiveCD, booted the system with the working drive and hot-connecting the new drive > no automation, no luck.

That part I understand - what I don’t is how it’s not possible in almost 2010 not to have your boot loader rebuilt in RAID 1 if /boot (where GRUB is in my case) it’s part of a RAID array… Separately, sure, but in a RAID array?

This is really stupid - I mean, i know it’s possible to autodetect a new unpartitioned drive, and, popup a question at least: what’s this, how do you want to use it, is it a spare drive for my destroyed array? and stuff… That would be so cool. Not all Microsoft ideeas are stupid; but it’s true those are just a few Except this one Linux beats the crap out of Windows when it comes to software RAID.

Anyone knows some nice, preferably gui, open source Linux software for RAID to be capable of something like this?

fakemoth · December 3, 2009, 2:39pm

Solved weeks ago - it is necessary to reinstall GRUB after such an operation. In my case on the /boot partiton > md0.

fakemoth · December 3, 2009, 2:40pm

Hello - job done!

Here it is:

-stopped the machine and removed the bad drive;

-grub is complaining and not starting so > root (hd0,0) and setup (hd0);

-rebooted, with only Velociraptor drive (md0, md1, md2, md3 - sdb, became sda) and the other two Seagates (md4 - sdc & sdd, became sdb & sdc);

-hot plugged the new partitioned Velociraptor drive (regular ext3 partitions, used for previous testing purposes) - sdd;

-formated the ext3 partitions via Webmin as Linux RAID;

-added the partitions to the array via Webmin, background reconstruction began;

-still hating the console and the black, empty screens

Now I don’t know much about grub; did the boot loader remain on the /boot partition - RAID 1? Webmin says it is but here is my grub.conf - I guess it isn’t… Seems like it’s on one of the Seagates hd3,0 ??? … Those are holding the /home partition.

grub.conf generated by anaconda

Note that you do not have to rerun grub after making changes to this file

NOTICE: You have a /boot partition. This means that

all kernel and initrd paths are relative to /boot/, eg.

root (hd3,0)

kernel /vmlinuz-version ro root=/dev/md1

initrd /initrd-version.img

#boot=/dev/md0
default=0
timeout=5
splashimage=(hd3,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-164.6.1.el5)
root (hd3,0)
kernel /vmlinuz-2.6.18-164.6.1.el5 ro root=/dev/md1 rhgb quiet crashkernel=128M@16M
initrd /initrd-2.6.18-164.6.1.el5.img
title CentOS (2.6.18-164.el5)
root (hd3,0)
kernel /vmlinuz-2.6.18-164.el5 ro root=/dev/md1 rhgb quiet crashkernel=128M@16M
initrd /initrd-2.6.18-164.el5.img