- 01-17-2012 #1 Just Joined!
- Join Date: Jan 2012
- Posts: 5
Raid 5 State : clean, FAILED
I have an ssh session open to a remote Debian Squeeze server that has 3 drives configured as 2 separate software RAIDs. I believe it is very possible that 2 of the drives have failed.
Code:
# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Feb 21 17:01:43 2011
     Raid Level : raid1
     Array Size : 975860 (953.15 MiB 999.28 MB)
  Used Dev Size : 975860 (953.15 MiB 999.28 MB)
   Raid Devices : 2
  Total Devices : 3
    Persistence : Superblock is persistent
    Update Time : Tue Jan 17 12:33:52 2012
          State : clean, degraded
 Active Devices : 1
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 1
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1
       0       8        1        -      faulty spare   /dev/sda1
       2       8       33        -      spare   /dev/sdc1
and
Code:
# mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Mon Feb 21 17:01:49 2011
     Raid Level : raid5
     Array Size : 1951564800 (1861.16 GiB 1998.40 GB)
  Used Dev Size : 975782400 (930.58 GiB 999.20 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent
    Update Time : Tue Jan 17 13:32:40 2012
          State : clean, FAILED
 Active Devices : 1
Working Devices : 1
 Failed Devices : 2
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 512K
           Name : westlund2:1 (local to host westlund2)
           UUID : 705a71ba:dc848b5b:f64ee011:287606b1
         Events : 342
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       0        0        1      removed
       2       8       34        2      active sync   /dev/sdc2
       0       8        2        -      faulty spare   /dev/sda2
       1       8       18        -      faulty spare   /dev/sdb2
LVM is configured on top of the RAID setups. The server is running, but crippled. I'm pretty sure that the root filesystem is mounted on a logical volume on /dev/md0, but lvdisplay gives me "Input/output error" when I try to verify. Since /dev/md0 is degraded, it makes sense that the system is still able to run, but perhaps it has switched to read-only.
Code:
# mount
/dev/mapper/raid-root on / type ext4 (rw,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/md0 on /boot type ext3 (rw)
xenfs on /proc/xen type xenfs (rw)
/dev/mapper/raid-mail--disk on /mnt/mail type ext3 (rw)
mount: warning: /etc/mtab is not writable (e.g. read-only filesystem).
       It's possible that information reported by mount(8) is not up to date. For actual information about system mount points check the /proc/mounts file.
I do not have smartmontools installed to test the drives and cannot install them. I believe that if I can get /dev/md0 out of degraded mode, I could remount it read/write and install them?
Looking for advice on how to proceed carefully. This is getting a bit more low-level than I am used to working.
Thanks,
Steve
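For reading reports like the two mdadm --detail outputs above, a small filter can pull out just the array state and the members that are not in "active sync". This is a minimal sketch: md_problems is a made-up helper name, and it assumes the device table layout shown in this thread (a "Number Major Minor RaidDevice State" header followed by one row per member). It only parses text and never touches a disk.

```bash
#!/usr/bin/env bash
# md_problems: read `mdadm --detail` output on stdin and print the array
# State line plus every member row that is not "active sync".
# Usage (assumed): mdadm --detail /dev/md1 | md_problems
md_problems() {
  awk '
    # Header field, e.g. "State : clean, FAILED"
    /^ *State :/ { sub(/^ *State : */, ""); print "state: " $0; next }
    # The device table starts after this header row
    /Number +Major +Minor +RaidDevice +State/ { table = 1; next }
    table && NF >= 5 && $0 !~ /active sync/ {
      # Row: Number Major Minor RaidDevice State-words [/dev/...]
      state = $5
      for (i = 6; i <= NF && $i !~ /^\/dev\//; i++) state = state " " $i
      dev = ($NF ~ /^\/dev\//) ? $NF : "(no device)"
      print state ": " dev
    }
  '
}
```

On the /dev/md1 report above, it would print the FAILED state, the two removed slots, and /dev/sda2 and /dev/sdb2 as faulty spares.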
- 01-17-2012 #2 Just Joined!
- Join Date: Jan 2012
- Posts: 5
It seems that superblocks are missing for some reason

Code:
# mdadm -E /dev/sda1
mdadm: No md superblock detected on /dev/sda1.
# mdadm -E /dev/sdb1
mdadm: No md superblock detected on /dev/sdb1.
# mdadm -E /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 049d8c72:e669e0b3:4e427ff5:9c45fa70
           Name : server1:0 (local to host server1)
  Creation Time : Mon Feb 21 17:01:43 2011
     Raid Level : raid1
   Raid Devices : 2
 Avail Dev Size : 1951720 (953.15 MiB 999.28 MB)
     Array Size : 1951720 (953.15 MiB 999.28 MB)
    Data Offset : 24 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : cf258f35:b6c06ebb:375a74fe:b02c2983
    Update Time : Tue Jan 17 14:08:11 2012
       Checksum : 1e75b738 - correct
         Events : 362
    Device Role : spare
    Array State : .A ('A' == active, '.' == missing)
root@westlund2:~#
When I try to add /dev/sdc1 back to /dev/md0, I get an error.
Code:
# mdadm --add /dev/md0 /dev/sdc1
mdadm: cannot find valid superblock in this array - HELP
I believe this is because the only active device in /dev/md0 is /dev/sdb1, and it is missing its superblock:
Code:
# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Feb 21 17:01:43 2011
     Raid Level : raid1
     Array Size : 975860 (953.15 MiB 999.28 MB)
  Used Dev Size : 975860 (953.15 MiB 999.28 MB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent
    Update Time : Tue Jan 17 15:05:24 2012
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1
Is there a way to re-initialize the superblock for /dev/sdb1 based on the superblock from /dev/sdc1?
TIA
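When superblocks go missing, a common first step before any --assemble or --create attempt is to compare what mdadm --examine finds on each member: the Array UUID, the Device Role and the Events count (a member with a lower count has missed later updates). The sketch below condenses that output to one line per partition. sb_summary is a hypothetical helper; it only parses text, so it never writes to a disk.

```bash
# sb_summary: condense `mdadm --examine` output (stdin) to one line.
# mdadm prints "No md superblock detected" on stderr, so merge it with 2>&1:
#   for p in /dev/sda1 /dev/sdb1 /dev/sdc1; do
#     printf '%s: ' "$p"; mdadm -E "$p" 2>&1 | sb_summary
#   done
sb_summary() {
  awk '
    /No md superblock detected/ { none = 1; exit }
    /^ *Array UUID :/  { uuid = $NF }
    /^ *Device Role :/ { sub(/^ *Device Role : */, ""); role = $0 }
    /^ *Events :/      { events = $NF }
    END {
      # END still runs after "exit", so report the no-superblock case here
      if (none) print "no superblock"
      else printf "uuid=%s role=%s events=%s\n", uuid, role, events
    }
  '
}
```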
- 01-18-2012 #3 Just Joined!
- Join Date: Jan 2012
- Posts: 5
Could still use some advice on this. Am I not posting the information needed? Right now it seems to me I need to try to re-initialize the superblock for /dev/sdb1. Is there a way I could manually configure the RAID that would eliminate the need for a valid superblock? I would just like to try to save some data from the RAID 5 (which may be impossible at this point anyway?) before tearing this server down.
TIA
- 01-20-2012 #4 Linux Engineer
- Join Date: Apr 2006
- Location: Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
- Posts: 1,117
Hi.
I have been watching this thread, but I have no useful advice for your recovery issue. Here are some random thoughts.
I usually use only RAID1.
I do see a reference:
With larger drive capacities the odds of a drive failure during rebuild are not negligible. In that event, the difficulty of extracting data from a failed array must be considered. Only a RAID 1 (mirror) stores all data on each drive in the array. Although it may depend on the controller, some individual drives in a RAID 1 can be read as a single conventional drive; this means a damaged RAID 1 can often be easily recovered if at least one component drive is in working condition. If the damage is more severe, some or all data can often be recovered by professional data recovery specialists. However, other RAID levels (like RAID level 5) present much more formidable obstacles to data recovery.
...
Given a RAID with only one drive of redundancy (RAIDs 3, 4, and 5), a second failure would cause complete failure of the array. Even though individual drives' mean time between failure (MTBF) have increased over time, this increase has not kept pace with the increased storage capacity of the drives. The time to rebuild the array after a single drive failure, as well as the chance of a second failure during a rebuild, have increased over time.[38]
-- excerpt from RAID - Wikipedia, the free encyclopedia
Presumably the RAID5 is where your data resides, and the RAID1 is for the basic system. The excerpt above looks discouraging.
If there are 2 failed disks, I don't see how the data array could be recovered, because most of the interleaved blocks of data would be inaccessible: on a three-disk RAID 5, only one of the three chunks in each stripe survives. I suppose recovery would depend on how bad the failure really is; the data is not erased, just not easily accessible. If the blocks can still be read, an immediate copy to a good device would allow experimentation with tools or, in the worst case, the disk could go to a recovery service: expensive, but often necessary if there is no backup and no other way to reproduce the data.
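The parity arithmetic behind that point can be shown with a toy stripe. RAID 5 parity is the XOR of the data chunks in a stripe, so any single missing chunk is the XOR of the survivors; with two chunks missing, the one survivor is consistent with every possible pair. This sketch uses single bytes in place of real 512K chunks, and the values are arbitrary.

```bash
# Toy model of one stripe on a three-disk RAID 5: two data chunks plus
# their XOR parity. Single bytes stand in for the real 512K chunks.
parity()  { echo $(( $1 ^ $2 )); }   # parity = d0 XOR d1
rebuild() { echo $(( $1 ^ $2 )); }   # any lost chunk = XOR of the two survivors

d0=0x5A; d1=0x3C                     # arbitrary example data
p=$(parity "$d0" "$d1")              # 102, stored on the third disk
lost_d0=$(rebuild "$d1" "$p")        # one disk gone: d0 comes back as 90 (0x5A)

# Two disks gone: only p survives. Count the d0 guesses for which a valid
# d1 byte (d0 XOR p) exists, i.e. how many pairs fit the surviving parity.
candidates() {
  local p=$1 a b n=0
  for (( a = 0; a < 256; a++ )); do
    b=$(( a ^ p ))                   # the d1 that would pair with this d0
    (( b >= 0 && b < 256 )) && n=$(( n + 1 ))
  done
  echo "$n"
}
candidates "$p"                      # prints 256: the survivor cannot choose
```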
The addition of LVM on top of the RAID is another level of complication. I have one machine like that, and I am trying it as an experiment for live backup: the backup tool (rsnapshot) sees that it is backing up an LVM logical volume, so it creates a copy-on-write snapshot while doing the backup. So far it has seemed to work well.
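The snapshot-based backup described above boils down to roughly this sequence, which rsnapshot's LVM support automates: create a copy-on-write snapshot, mount it, copy from it, then unmount and remove it. The sketch only prints the commands. snapshot_backup_plan is a hypothetical helper; the volume group raid and volume root are taken from /dev/mapper/raid-root in this thread, while the snapshot size, mount point and destination in the example call are made-up values.

```bash
# snapshot_backup_plan VG LV SIZE MOUNTPOINT DEST (hypothetical helper)
# Prints, without running, the snapshot/mount/copy/cleanup sequence.
# SIZE only has to hold blocks that change while the snapshot exists.
snapshot_backup_plan() {
  local vg=$1 lv=$2 size=$3 mnt=$4 dest=$5
  local snap="${lv}-snap"            # made-up snapshot name
  cat <<EOF
lvcreate --snapshot --size $size --name $snap /dev/$vg/$lv
mount -o ro /dev/$vg/$snap $mnt
rsync -a $mnt/ $dest/
umount $mnt
lvremove -f /dev/$vg/$snap
EOF
}

# Example values: 1G, /mnt/snap and /backup/root are assumptions.
snapshot_backup_plan raid root 1G /mnt/snap /backup/root
```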
Regarding the mount-as-read-only aspect, I have one such RAID1 partition:
Code:
Personalities : [raid1]
md4 : active raid1 sda8[0]
      55649024 blocks [2/1] [U_]
However, it has not been remounted read-only:
Code:
% ls -ld /
drwxr-xr-x 25 root root 4096 Nov 10 17:08 //
I am waiting to replace this disk with a new disk that I already have on hand, but I also want to upgrade the OS release, so I am testing that on a separate machine, in a virtual machine.
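One caveat on checking for read-only: the permission bits shown by ls -ld / stay the same when a filesystem is remounted read-only, and mount may read a stale /etc/mtab (the warning in the first post says as much). The kernel's own view is in /proc/mounts, where the fourth field holds the current options. A minimal sketch, with mount_mode as a made-up helper name:

```bash
# mount_mode [MOUNTPOINT] [TABLE]: report "ro" or "rw" plus the options
# of MOUNTPOINT (default /) from TABLE (default /proc/mounts).
mount_mode() {
  local target=${1:-/} table=${2:-/proc/mounts}
  awk -v t="$target" '
    $2 == t { opts = $4 }            # last entry wins, e.g. after "rootfs /"
    END {
      if (opts == "") { print "not mounted"; exit 1 }
      n = split(opts, o, ",")
      mode = "rw"
      for (i = 1; i <= n; i++) if (o[i] == "ro") mode = "ro"
      print mode " (" opts ")"
    }
  ' "$table"
}
```

Note that errors=remount-ro is only the policy; the plain ro flag is what shows the filesystem has actually been switched.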
Using software RAID is more flexible than hardware RAID, and if your machine is fast enough for the parity calculations, software RAID 6 seems like a better solution for the future.
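The capacity trade-off is simple arithmetic: with N equal members, RAID 1 keeps one member's worth of space, RAID 5 gives N-1 members' worth and RAID 6 gives N-2. As a check, the /dev/md1 report in the first post shows a Used Dev Size of 975782400 KiB and an Array Size of 1951564800 KiB, exactly twice that, as expected for a three-disk RAID 5. usable_kib is a hypothetical helper for illustration.

```bash
# usable_kib LEVEL N SIZE: usable capacity of N equal members of SIZE KiB
usable_kib() {
  local level=$1 n=$2 size=$3
  case $level in
    1) echo "$size" ;;                   # every member is a full copy
    5) echo $(( (n - 1) * size )) ;;     # one member's worth of parity
    6) echo $(( (n - 2) * size )) ;;     # two members' worth of parity
    *) echo "unsupported level: $level" >&2; return 1 ;;
  esac
}

usable_kib 5 3 975782400   # 1951564800, the Array Size of /dev/md1
usable_kib 6 4 975782400   # same space, but survives any two failures
```

So four disks of that size in RAID 6 would give the same usable space as the current three-disk RAID 5 while tolerating any two failed disks.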
Good luck ... cheers, drl
Welcome - get the most out of the forum by reading forum basics and guidelines: click here.
90% of questions can be answered by using man pages, Quick Search, Advanced Search, Google search, Wikipedia.
We look forward to helping you with the challenge of the other 10%.
( Mn, 2.6.n, AMD-64 3000+, ASUS A8V Deluxe, 1 GB, SATA + IDE, Matrox G400 AGP )
- 01-20-2012 #5 Just Joined!
- Join Date: Jan 2012
- Posts: 5
I was actually able to get data off the RAID 5 yesterday. I had hoped that would be possible when I saw that the RAID 1 boot partition was running on one drive while the RAID 5 showed only one good disk left, but on a different disk.
I couldn't find any way to remount the root filesystem read-write, so eventually I gave up and rebooted. I ended up in BusyBox and was able to recreate the RAID 1 using:
Code:
mdadm --create --assume-clean --level=1 --raid-devices=2 /dev/md0 /dev/sdb1 /dev/sdb2
After finishing the boot and installing smartmontools, I was able to diagnose that sda was failing but sdb and sdc checked out OK. I ran the same mdadm --create for the RAID 5 and it seemed to work. After recreating /dev/md1, I was able to run lvchange -a y to activate the logical volumes, mount them, and grab the data off.
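Recreating an array with --assume-clean only preserves the data if the new superblocks describe the same geometry as the old ones: metadata version, level, chunk size, layout and, above all, device order, with the keyword missing holding the slot of a dead disk. Defaults such as the data offset can also differ between mdadm releases. The sketch below only prints a command built from values recorded earlier; recreate_cmd is a hypothetical helper, and the example arguments come from the /dev/md1 report in the first post (Version 1.2, Chunk Size 512K, left-symmetric), assuming its Number column still reflects each disk's original slot. It is not necessarily the exact command used in this thread.

```bash
# recreate_cmd MD LEVEL METADATA CHUNK LAYOUT MEMBER... (hypothetical helper)
# Prints an mdadm --create --assume-clean line; it never touches a disk.
# MEMBERs go in RaidDevice order; use "missing" for a dead slot. Pass ""
# for CHUNK and LAYOUT on RAID 1.
recreate_cmd() {
  local md=$1 level=$2 meta=$3 chunk=$4 layout=$5
  shift 5
  local cmd="mdadm --create $md --assume-clean --level=$level"
  cmd+=" --raid-devices=$# --metadata=$meta"
  [ -n "$chunk" ] && cmd+=" --chunk=$chunk"     # KiB, as in "Chunk Size : 512K"
  [ -n "$layout" ] && cmd+=" --layout=$layout"
  echo "$cmd $*"
}

# /dev/md1 from the first post: sda2 was slot 0 (now failing), sdb2 slot 1,
# sdc2 slot 2, assuming the Number column kept the original slots.
recreate_cmd /dev/md1 5 1.2 512 left-symmetric missing /dev/sdb2 /dev/sdc2
```

Checking the result read-only first (for example with fsck -n, or by mounting with -o ro) before anything writes to the array is the usual precaution.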
I'm not sure how the superblocks were destroyed on 4 of the 6 partitions. I can't believe --assume-clean worked on both RAID arrays. The only explanation I can think of is that having the root filesystem configured to remount read-only on errors (errors=remount-ro) allowed both RAID volumes to stay clean until I found them over a week later.
Note to self: when I set this server back up, monitor it with something like Zabbix!
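Whatever does the monitoring (mdadm also has its own mdadm --monitor mode, which can send mail), the basic check is whether any array's member map in /proc/mdstat contains an underscore, as in the [U_] line drl posted. A minimal sketch, with degraded_arrays as a hypothetical helper name; arrays that are inactive have no such map and are not reported by it.

```bash
# degraded_arrays [FILE]: print each md array whose member map in an
# mdstat-format FILE (default /proc/mdstat) contains "_" (a missing member).
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { name = $1; next }           # e.g. "md4 : active raid1 sda8[0]"
    name != "" && match($0, /\[[U_]+\]/) {      # e.g. "55649024 blocks [2/1] [U_]"
      if (substr($0, RSTART, RLENGTH) ~ /_/) print name
      name = ""
    }
  ' "${1:-/proc/mdstat}"
}

# A cron job or monitoring agent could raise an alert whenever this prints anything.
```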
- 01-20-2012 #6 Linux Engineer
- Join Date: Apr 2006
- Location: Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
- Posts: 1,117
Hi.
Thanks for posting what you did about this problem. It may help someone else with similar troubles in the future.
Are you also planning a backup as well as monitoring? ... cheers, drl
( edit 1: grammar )
Last edited by drl; 01-20-2012 at 03:36 PM.
- 01-20-2012 #7 Just Joined!
- Join Date: Jan 2012
- Posts: 5
no, a backup would just make entirely too much sense

ok, maybe a backup

