[SOLVED] RAID10 trouble - disk failure plus non-fresh disks
I have experienced a failure of one disk in my 4-disk software RAID10 setup, but a straightforward rebuild is thwarted by the fact that two of the other disks are considered non-fresh and hence get kicked out of the array. This of course prevents the array from starting up.
Here's more detail about my setup:
I have 4 2TB SATA disks in RAID10.
Device Boot Start End Blocks Id System
/dev/sda1 1 243201 1953512001 fd Linux raid autodetect
/dev/sdc1 1 243201 1953512001 fd Linux raid autodetect
/dev/sdd1 1 243201 1953512001 fd Linux raid autodetect
/dev/sde1 1 243201 1953512001 fd Linux raid autodetect
I have another disk (/dev/sdb) that hosts the operating system, so I can boot up and work on this machine directly.
$ sudo mdadm --detail /dev/md0
Version : 00.90
Creation Time : Thu Nov 5 16:44:06 2009
Raid Level : raid10
Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
Raid Devices : 4
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue Nov 16 10:24:11 2010
State : active, degraded, Not Started
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Layout : near=2, far=1
Chunk Size : 64K
UUID : e0c049f7:658d2514:6fcc1897:bafeb8ef (local to host xxx.xxx.xxx)
Events : 0.8646
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 0 0 1 removed
2 0 0 2 removed
3 0 0 3 removed
At first I thought this meant that I had experienced simultaneous disk failures in *three* of my 4 disks, but I've now run 'smartctl -t short' on all four disks, and it reported "Completed without error" for 3 of the 4. Only /dev/sdd1 gave a "read failure", which gives me hope that perhaps only that one disk is borked, and I can still rebuild my RAID array using a replacement disk.
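For reference, the tests were run along these lines (smartctl comes from the smartmontools package; the exact output format varies between versions):

# start a short SMART self-test on each member disk
$ sudo smartctl -t short /dev/sda
$ sudo smartctl -t short /dev/sdc
$ sudo smartctl -t short /dev/sdd
$ sudo smartctl -t short /dev/sde
# a few minutes later, read back the self-test log (this is where /dev/sdd showed the read failure)
$ sudo smartctl -l selftest /dev/sdd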
My problem is that in order to add the replacement disk and rebuild the array, I first need the array to come up in degraded but active mode. In /var/log/syslog I see 'md: kicking non-fresh sdc1 from array!' and the same for sde1. I've read in other threads that people typically just re-add the non-fresh disks, but I have two concerns:
a) I worry that by re-adding, say, /dev/sdc (i.e. one of the non-fresh disks), data on this disk that is necessary to rebuild /dev/sdd (the failed disk) will be overwritten. Is this a possibility?
b) I don't see how the data on /dev/sda (the only remaining active disk) can be sufficient to re-add /dev/sdc and /dev/sde. What I want to do is put /dev/sdc and /dev/sde back into the array *without* rebuilding them. It seems that --re-add (as opposed to --add) is supposed to do exactly that, but I'm not certain (see the sketch below).
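To make concern (b) concrete: as I understand it, md marks a member non-fresh when the event counter in its superblock has fallen behind the others, and those counters can be compared without writing anything to the disks. What I'm contemplating is something like this (read-only except for the last line):

# compare the Events counter across the members (read-only)
$ sudo mdadm --examine /dev/sda1 | grep Events
$ sudo mdadm --examine /dev/sdc1 | grep Events
$ sudo mdadm --examine /dev/sde1 | grep Events
# --re-add reuses the existing superblock and slot; --add would treat the disk as a brand-new spare
$ sudo mdadm /dev/md0 --re-add /dev/sdc1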
I'm proceeding very cautiously here, as I still harbor hopes of recovering all the data, and I don't want to mess anything up now. In fact, at the moment I'm creating an exact clone of /dev/sde using dd (it's taking forever; 17 hours in and only ~3/4 done), so I can experiment with --re-add on this disk and still have an intact copy of it in case something goes wrong.
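In case it matters, the clone is being made with a plain whole-disk dd along these lines, where /dev/sdX stands for the spare disk I'm copying onto (not one of the array members!):

# bit-for-bit copy of the non-fresh member onto a spare disk; double-check if= and of= before running
$ sudo dd if=/dev/sde of=/dev/sdX bs=1M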
Any advice greatly appreciated!
Crickets... Maybe I posted to the wrong forum?
For the record, I was able to fix my RAID. Here's what I did. After the backup cloning of /dev/sde finished (took close to 24 hours), I did
$ sudo mdadm /dev/md0 --re-add /dev/sde1
$ sudo mdadm /dev/md0 --run
and my RAID was back in business (in a degraded state, of course). I then rebooted, and the second non-fresh disk (/dev/sdc) was automatically added and started rebuilding. It's a bit odd that it needed to be rebuilt, since there was nothing wrong with that disk at all; maybe I should have re-added it together with /dev/sde, which would probably have saved the rebuild.
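While the rebuild was running, the progress could be followed with the usual commands (nothing specific to my setup):

# watch the resync/recovery progress and the array state
$ cat /proc/mdstat
$ sudo mdadm --detail /dev/md0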
I've now added the replacement disk (/dev/sdd) and it has successfully been rebuilt too. My RAID array is back in full working order. Yay!
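For anyone hitting the same problem, adding the replacement went roughly like this: partition the new disk the same way as the surviving members, then add the new partition to the array (the sfdisk pipe is just one way to copy the layout; double-check the target device before writing):

# copy the partition table from a healthy member to the new disk, then add it to the array
$ sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdd
$ sudo mdadm /dev/md0 --add /dev/sdd1
# the array then rebuilds onto the new member automatically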