Multiple RAID 10 + LVM Failure Risk
I have been searching and trying to piece together a good answer to this problem for a little while now, but have not been able to find a satisfactory one.
Summary:
What happens when an LVM volume that spans multiple RAID arrays has one (or more) of those RAID arrays fail (note that this would be the same if you were using single drives in the LVM too)? Do you lose everything, or can you recover?
Details:
I am upgrading my existing storage solution soon from a single RAID 5 array (4x1TB) with a single ext3 partition to a RAID 10 array (4x2TB). I then plan to move all the data from the old RAID 5 array to the new RAID 10 array, break the RAID 5 array and rebuild it as a second RAID 10 array. I was initially thinking of using LVM on top of the new RAID 10 array and then expanding the volume group with the second RAID 10 array, giving me a single 6TB logical "drive" on which I would place a single ext4 file system.
The problem is that I got to thinking about what happens when a RAID array fails (note: the whole array, not just a drive). I know I would lose all the data on the failed array (that is a given), but what happens to the remaining data? Suppose I build this as described: the initial VG is the full 4TB on the new array, I then expand the VG by adding the rebuilt 2TB array, and then expand the file system, resulting in a single 6TB file system. This is in effect RAID Linear over the 2 RAID arrays. Suppose the first array (the 4TB one) fails. All data on it would be lost, but would I be able to recover any data on the 2TB array?
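To make the "RAID Linear" picture concrete, here is a minimal Python sketch of how a spanned LV would map onto its two arrays. It assumes the LV is allocated across the PVs in the order they were added, ignores LVM metadata and extent rounding, and uses made-up names (md_new_4tb, md_old_2tb) for the arrays. It only models the layout, not what LVM or ext4 actually do after a failure.

```python
from dataclasses import dataclass

TB = 10**12  # decimal terabytes, as drive vendors count them


@dataclass
class Segment:
    pv: str     # hypothetical physical volume name
    start: int  # first LV byte stored on this PV
    end: int    # one past the last LV byte stored on this PV


def linear_layout(pvs):
    """Concatenate PVs in order, the way a linear (spanned) LV is laid out."""
    segments, offset = [], 0
    for name, size in pvs:
        segments.append(Segment(name, offset, offset + size))
        offset += size
    return segments


def surviving_ranges(segments, failed):
    """Byte ranges of the LV whose backing PV is still present."""
    return [(s.start, s.end) for s in segments if s.pv not in failed]


# Assumed layout: new 4TB array first, rebuilt 2TB array appended.
layout = linear_layout([("md_new_4tb", 4 * TB), ("md_old_2tb", 2 * TB)])
print(surviving_ranges(layout, failed={"md_new_4tb"}))
```

Under these assumptions only the last third of the LV's address space survives, and it does not include the start of the file system.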
I don't fully know how the ext file systems are implemented, but if it's like a FAT system (I say this very loosely) and the first sector of the file system goes down, how does the second array even know it has a file system on it? Basically, if I lose the data that tells the system there is a file system on this device, will I lose everything on that file system? Or is ext4 smart enough that if the first part of the file system gets corrupted, the later part can still be recovered?
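For what it's worth, ext2/3/4 do keep backup copies of the superblock: with the sparse_super feature they sit in block groups 0 and 1 and in groups whose number is a power of 3, 5 or 7. The Python sketch below lists those groups and checks which of them would land past the 4TB boundary in the layout described above. It assumes a 4096-byte block size, 8 * block_size blocks per group, decimal terabytes, and that the file system starts at the beginning of the first array; a file system made with other options (for example sparse_super2) would differ.

```python
BLOCK_SIZE = 4096                     # assumed block size in bytes
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE     # one bitmap block tracks 8 * block_size blocks
GROUP_BYTES = BLOCK_SIZE * BLOCKS_PER_GROUP


def superblock_groups(group_count):
    """Groups holding a superblock copy under sparse_super: 0, 1, powers of 3, 5, 7."""
    groups = {0, 1} if group_count > 1 else {0}
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    return sorted(groups)


def group_count_for(total_bytes):
    """Number of block groups needed to cover total_bytes (rounded up)."""
    return -(-total_bytes // GROUP_BYTES)


fs_bytes = 6 * 10**12   # the planned 6TB file system
boundary = 4 * 10**12   # where the second (2TB) array would begin

groups = superblock_groups(group_count_for(fs_bytes))
on_second_array = [g for g in groups if g * GROUP_BYTES >= boundary]
print("superblock copies in groups:", groups)
print("copies that start on the second array:", on_second_array)
```

Under these assumptions the highest group with a superblock copy is 19683, which starts well before the 4TB mark, so none of the copies would sit on the 2TB array. Even a surviving backup superblock would not by itself bring back the inode tables and directories stored on the failed array.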
There seem to be mixed opinions on this. Some say that LVM (even in spanning mode) introduces a RAID 0-like failure mode, while others claim that you can still recover data off the remaining devices (although it will take work).
I have no immediate need to migrate, and I have extra hardware lying around, so I could set up a test machine and simulate a failure, but I don't know how I would do this (I don't think just unplugging the drive would work, would it?).
Finally, is this even something to be worried about? If I were using single disks as the base hardware then this would certainly be something to consider, but I intend to use RAID 10 underneath the LVM. I also have a policy of replacing drives at 6 month intervals until all the drives in the array are offset by 6 months (since all the drives see the same wear level, this should minimize the chance of multiple failures during a rebuild, because none of the drives should be from the same assembly line and at the same wear level). I also do not intend to leave the array running in the event of a failure; I will shut the server down until I have the replacement hardware.
Other Details:
openSuse 11.3 x64
All RAID arrays are Linux kernel (software) RAID.
The OS/boot/home are on a separate RAID 1 array, the large file system is purely data storage.
I do have backups, but the best backup is the backup that is never used.
UPDATE:
I decided to try to simulate a failure by removing a drive. This is what happened:
/dev/sdb => 4GB
/dev/sdc => 2GB
/dev/sdd => 1GB
1) /dev/vg0 => +4GB
2) /dev/vg0/vol0 => 4GB
3) mounted vol0 to /data and added to fstab (so it remounts on boot)
4) Filled vol0 with ~3.9GB of data
5) vg0 => +2GB + 1GB
6) vol0 => resize to max (vol0 is now 7GB)
7) Filled vol0 with ~6.9GB of data
8) Shut down and unplugged the SATA cable for /dev/sdc (the 2GB partition in the "middle" of the volume group; it should contain part of the second data set and none of the first)
9) Booted the system; the OS failed to mount /dev/vg0/vol0 to /data
Error messages:
Activating LVM volume groups
Couldn't find device with uuid [ID]
Refusing activation of partial LV vol0, Use --partial to override
/dev/vg0/vol0: The superblock could not be read or does not describe a correct ext2 file system...
fsck failed for at least one file system (not /), repair and reboot
10) System dropped to the recovery console, asking for the root password
11) Activated vg0 using "vgchange -a y vg0 --partial". vg0 was activated in read-only mode
The problem now is that I cannot mount vol0 to /data: mount keeps demanding a file system type, and when I add "-t ext4" it claims wrong fs type.
So far it appears that LVM does indeed introduce a RAID 0-like failure mode when spanned over multiple devices. I'll keep digging around and see if I can find a way to do a partial mount of the LVM, but it doesn't look good for the data.
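As a rough picture of what the partially activated vol0 looks like, the Python sketch below prints device-mapper style table lines for the test setup, with the missing /dev/sdc replaced by an "error" region. My understanding is that this is how LVM fills missing segments by default (the missing_stripe_filler setting in lvm.conf), but treat that as an assumption. The sketch also assumes the segments follow the order sdb, sdc, sdd, that each device is used in full, and that data starts at offset 0 on each device. Real PVs reserve space at the start for LVM metadata, so the actual tables reported by the system will have different numbers.

```python
SECTOR = 512   # device-mapper tables count in 512-byte sectors
GIB = 2**30


def partial_table(pvs, missing):
    """Table lines for a linear LV, substituting 'error' for missing PVs."""
    lines, start = [], 0
    for dev, size_bytes in pvs:
        length = size_bytes // SECTOR
        if dev in missing:
            # Reads and writes in this range would fail with an I/O error.
            lines.append(f"{start} {length} error")
        else:
            # Offset 0 is a simplification; see the assumptions above.
            lines.append(f"{start} {length} linear {dev} 0")
        start += length
    return lines


# Test layout from the update, in the order the devices were added to vg0.
test_pvs = [("/dev/sdb", 4 * GIB), ("/dev/sdc", 2 * GIB), ("/dev/sdd", 1 * GIB)]
for line in partial_table(test_pvs, missing={"/dev/sdc"}):
    print(line)
```

In this model the first 4GB and the last 1GB of vol0 still map to real devices, while anything the file system stored in the middle 2GB is unreadable.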