- 06-24-2011 #1 Just Joined!
- Join Date: Dec 2010
- Location: Texas
- Posts: 5
RAID 5 mdadm help.
Hello,
I've been having trouble with a failed RAID for the past couple days and I have tried everything I can think of to fix the issue to no avail. I was hoping to find someone who could help.
I have a CentOS 5.4 box with a RAID 5 array on /dev/md2 consisting of the following volumes.
Code:
hdd1[0] sdd1[4] sdc1[3] sdb1[2] sda1[1
I had an outage and, unknown to me, a bad UPS battery, so the server's power was interrupted. When the device was rebooted, I was met with the following error:
Code:
fsck.ext3: No such file or directory while trying to open /dev/lvm/lvm-raid
This device is created when /dev/md2 is created by mdadm and mounted with my fstab. I turned my attention to /proc/mdstat and found the following:
Code:
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : inactive hdd1[0] sdd1[4] sdc1[3] sdb1[2] sda1[1]
      3662859520 blocks
I attempted to assemble the raid and found that only 4/5 of the drives could be contacted. I rebooted the device and looked in the BIOS to find that all 5 drives in this array were showing up. I could --force the array to assemble and it would use 4/5 drives, but of course my data was nowhere to be found. The directory this creates, /data, contains my virtual machines. I saw the directory and its sub-directories, but there were no files.
I used mdadm --examine --scan and compared that information to my mdadm.conf and they were a match. The devices that mdadm.conf links to that array are in /dev. The system sees them. Anyway, I decided to examine the individual devices. I will provide you with the output of each device now.
Code:
/dev/hdd1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : df5a74c5:f9bcfbb8:d10cfed7:60400574
  Creation Time : Wed Jan 16 23:03:44 2008
     Raid Level : raid5
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
     Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 2
    Update Time : Wed Jun 22 07:40:42 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 75bb48fb - correct
         Events : 15504911
         Layout : left-symmetric
     Chunk Size : 256K
      Number   Major   Minor   RaidDevice State
this     0      22       65        0      active sync   /dev/hdd1
   0     0      22       65        0      active sync   /dev/hdd1
   1     1       0        0        1      faulty removed
   2     2       8        1        2      active sync   /dev/sda1
   3     3       8       17        3      active sync   /dev/sdb1
   4     4       8       33        4      active sync   /dev/sdc1
Code:
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : df5a74c5:f9bcfbb8:d10cfed7:60400574
  Creation Time : Wed Jan 16 23:03:44 2008
     Raid Level : raid5
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
     Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 2
    Update Time : Sun May 23 08:16:22 2010
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 72e0f6d6 - correct
         Events : 884248
         Layout : left-symmetric
     Chunk Size : 256K
      Number   Major   Minor   RaidDevice State
this     1       8        1        1      active sync   /dev/sda1
   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8        1        1      active sync   /dev/sda1
   2     2       8       65        2      active sync
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       33        4      active sync   /dev/sdc1
Code:
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : df5a74c5:f9bcfbb8:d10cfed7:60400574
  Creation Time : Wed Jan 16 23:03:44 2008
     Raid Level : raid5
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
     Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 2
    Update Time : Wed Jun 22 07:40:42 2011
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 75bb48b0 - correct
         Events : 15504911
         Layout : left-symmetric
     Chunk Size : 256K
      Number   Major   Minor   RaidDevice State
this     2       8        1        2      active sync   /dev/sda1
   0     0      22       65        0      active sync   /dev/hdd1
   1     1       0        0        1      faulty removed
   2     2       8        1        2      active sync   /dev/sda1
   3     3       8       17        3      active sync   /dev/sdb1
   4     4       8       33        4      active sync   /dev/sdc1
Code:
/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : df5a74c5:f9bcfbb8:d10cfed7:60400574
  Creation Time : Wed Jan 16 23:03:44 2008
     Raid Level : raid5
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
     Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 2
    Update Time : Wed Jun 22 07:40:42 2011
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 75bb48c2 - correct
         Events : 15504911
         Layout : left-symmetric
     Chunk Size : 256K
      Number   Major   Minor   RaidDevice State
this     3       8       17        3      active sync   /dev/sdb1
   0     0      22       65        0      active sync   /dev/hdd1
   1     1       0        0        1      faulty removed
   2     2       8        1        2      active sync   /dev/sda1
   3     3       8       17        3      active sync   /dev/sdb1
   4     4       8       33        4      active sync   /dev/sdc1
Code:
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : df5a74c5:f9bcfbb8:d10cfed7:60400574
  Creation Time : Wed Jan 16 23:03:44 2008
     Raid Level : raid5
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
     Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 2
    Update Time : Wed Jun 22 07:40:42 2011
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 75bb48d4 - correct
         Events : 15504911
         Layout : left-symmetric
     Chunk Size : 256K
      Number   Major   Minor   RaidDevice State
this     4       8       33        4      active sync   /dev/sdc1
   0     0      22       65        0      active sync   /dev/hdd1
   1     1       0        0        1      faulty removed
   2     2       8        1        2      active sync   /dev/sda1
   3     3       8       17        3      active sync   /dev/sdb1
   4     4       8       33        4      active sync   /dev/sdc1
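One way to read the five dumps above is to compare the Events counter and Update Time stored in each superblock: a member whose counter lags the others stopped receiving updates at that point. The sketch below is only an illustration under a few assumptions: it relies on the field labels exactly as printed above, the helper names `examine_summary` and `stale_members` are made up for this example, and reading the superblocks on a live system needs root.

```shell
#!/bin/sh
# Sketch: compare the Events counter across md member superblocks.
# Both function names are hypothetical helpers, not part of mdadm.

# Read `mdadm --examine` text on stdin and print "<events>|<update time>".
examine_summary() {
  awk -F' : ' '
    /^ *Events :/      { ev = $2 }
    /^ *Update Time :/ { ut = $2 }
    END { print ev "|" ut }
  '
}

# Read "<device> <events>" lines on stdin and print every device whose
# event count is lower than the highest count seen.
stale_members() {
  awk '
    { dev[NR] = $1; ev[NR] = $2 + 0; if (ev[NR] > max) max = ev[NR] }
    END { for (i = 1; i <= NR; i++) if (ev[i] < max) print dev[i] }
  '
}

# On a live system (as root), assuming the member names used in this thread:
#   for d in /dev/hdd1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1; do
#     printf '%s %s\n' "$d" "$(mdadm --examine "$d" | examine_summary | cut -d'|' -f1)"
#   done | stale_members
```

Applied to the values posted above, only /dev/sda1 lags (Events 884248, last updated in May 2010, against 15504911 on the other four), which would suggest that member dropped out long before the power loss. Note also that the "this" rows do not match the device each dump was read from (the dump of /dev/sdb1 records itself as /dev/sda1, for example), so the kernel device names appear to have shifted since the superblocks were last written.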
As best I could discern, this was telling me that /dev/sdd1 was faulty. So I replaced it with a drive of the same size and was surprised to find that the system now reported that only 3/5 of the devices in the array could be contacted. I took this to indicate that /dev/sdd1 was not my problem. I started removing drives one at a time and trying to rebuild the array; each time I removed one, it would say it could contact 3/5 devices and the array could not be assembled. That is, until I got down to /dev/sda1. When I removed it I was met with my original error: only 4/5 devices in the array could be contacted. This, contrary to what I had observed in mdadm --examine / mdadm --detail, suggested that /dev/sda1 was the problem. I replaced that drive, removed /dev/sda1 from the array, and re-added it, but the array still would not build. Normally you would expect cat /proc/mdstat to show a progress bar while the new device is being added, but there was nothing.
I am very confused and frustrated at this point and I desperately require those virtual machines to be intact! If there is anyone that could please help, I would greatly appreciate it.
,
Jared
- 06-24-2011 #2
OK, I am new to the whole software raid setup. I have only been playing with it for a few months. But I did notice something in your output above: /dev/sda1 is the only device that is not showing a faulty drive. I am just trying to figure out how you came to the conclusion that sdd was the faulty drive.
Also, you didn't say how you removed and added the device to the raid. The only thing I can think of right now is that you didn't remove it properly, and thus the new device cannot be added to the raid.
I know when I had to fail a drive it seemed that everything got confused and I had to rebuild my setup.
This is a page I use for Quick Reference. At any rate I am interested in hearing how this gets resolved.
- 06-24-2011 #3 Just Joined!
I failed the device and removed it using
Code:
mdadm --fail /dev/md2 /dev/sdxx
mdadm --remove /dev/md2 /dev/sdxx
As to how I came to the conclusion that /dev/sdd1 is the drive with the error, please refer to my last post where I said,
"As best I could discern, this was telling me that /dev/sdd1 was faulty. So I replaced it with a drive of the same size and was surprised to find that the system now reported that only 3/5 of the devices in the array could be contacted. I took this to indicate that /dev/sdd1 was not my problem. I started removing drives one at a time and trying to rebuild the array; each time I removed one, it would say it could contact 3/5 devices and the array could not be assembled. That is, until I got down to /dev/sda1. When I removed it I was met with my original error: only 4/5 devices in the array could be contacted. This, contrary to what I had observed in mdadm --examine / mdadm --detail, suggested that /dev/sda1 was the problem. I replaced that drive, removed /dev/sda1 from the array, and re-added it, but the array still would not build. Normally you would expect cat /proc/mdstat to show a progress bar while the new device is being added, but there was nothing."
Is there a way to use --create to make a new device, say /dev/md3, without trashing my data?
Thanks for your reply. Any advice/fresh eyes is appreciated!
- 06-25-2011 #4
I cannot truthfully answer this question as I do not know.
I was wondering if this command might help:
Code:
mdadm --assemble --force
Do you have a backup of your data?
"Thanks for your reply. Any advice/fresh eyes is appreciated!"
I can understand, and as I stated before, I would like to know how to fix this issue myself.
Here is another site I found while searching;
Linux Recover From A Lost Software RAID device (Rebuild RAID 5 Software Array)
- 06-25-2011 #5
I was doing some more searching on the web and have come to the conclusion you might have to do the following steps:
1. Reboot the system
2. mdadm --stop /dev/md2
3. mdadm --assemble --force /dev/md2 /dev/hdd1 /dev/sdd1 /dev/sdc1 /dev/sdb1 /dev/sda1
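The steps above can be sketched as a small helper that only prints the commands, so the member list can be reviewed before anything is run. This is a sketch under assumptions, not a confirmed fix: `recovery_plan` is a hypothetical name, the `--run` flag is added so mdadm starts the array even with a member missing, and the last two printed lines assume that /dev/md2 backs an LVM volume holding /data, which is only a guess from the /dev/lvm/lvm-raid path in the fsck error.

```shell
#!/bin/sh
# Sketch: print a forced, degraded assembly sequence for review.
# recovery_plan is a hypothetical helper; nothing is executed here.
# Usage: recovery_plan <md device> <member> [<member> ...]
recovery_plan() {
  md=$1
  shift
  echo "mdadm --stop $md"
  echo "mdadm --assemble --force --run $md $*"
  echo "cat /proc/mdstat"
  # Assumption: the filesystem sits on LVM on top of the array.
  echo "vgchange -ay"
  echo "mount -o ro /dev/lvm/lvm-raid /data"
}

# Example, assuming the four members whose superblocks agree:
#   recovery_plan /dev/md2 /dev/hdd1 /dev/sdb1 /dev/sdc1 /dev/sdd1
```

Listing only the members whose superblocks agree keeps an out-of-date disk out of the assembly, and mounting read-only avoids writing to the filesystem until the data has been checked or copied off. It also assumes the original disks are still connected under those names.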
- 06-27-2011 #6 Just Joined!
No
The array isn't active. Stopping it has no effect.
- 06-28-2011 #7
OK, have you tried the other commands?


