- 12-05-2010 #1 Just Joined!
- Join Date
- Dec 2010
- Posts
- 2
RAID 5 Problem
I've had a four-drive software RAID-5 array running for some time now. Recently, Ubuntu started complaining that one of the drives in the array was in danger of imminent failure. I tried running SpinRite on the drive, etc., but was not able to save it. Whenever the computer booted, I would receive an error message prior to the Ubuntu splash screen saying that the array had been started with 3 drives.
I purchased a new drive to replace the failing one and today tried to swap the drives.
First I marked the drive as failed:
Code:
mdadm --manage /dev/md0 --fail /dev/sdb1
Then I removed it from the array:
Code:
mdadm --manage /dev/md0 --remove /dev/sdb1
I then turned off the computer and replaced the bad drive with the new drive I just purchased (it's the same size drive). When I booted the computer, I again got the message that the array had been started with only three drives.
I then partitioned my new drive and added it to the array:
Code:
mdadm --manage /dev/md0 --add /dev/sdb1
At this point, I took a look at the /proc/mdstat file to check if the drive was synchronizing, and I noticed it was showing [__UU] as if two drives were having issues.
I examined /dev/md0 and it had somehow marked /dev/sda1 as Faulty and set it to be a spare. I didn't think the synch would work correctly if that drive was marked as Faulty, so I stopped the array.
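As a rough aid for reading that status string: each position inside the brackets stands for one array slot, with U for an in-sync member and _ for a missing one, and RAID-5 can only run with at most one slot missing. A minimal sketch follows; the helper names count_missing and raid5_can_run are made up for illustration and are not part of mdadm.

```shell
# Count the missing slots ("_") in an mdstat status string such as "[__UU]".
# Hypothetical helper, not an mdadm command.
count_missing() {
    underscores=$(printf '%s' "$1" | tr -cd '_')
    echo "${#underscores}"
}

# RAID-5 tolerates at most one missing member.
raid5_can_run() {
    [ "$(count_missing "$1")" -le 1 ]
}

if raid5_can_run '[__UU]'; then
    echo "can run"
else
    echo "too many members missing"
fi
```

With [__UU] this takes the else branch, which matches mdadm refusing to start the array.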
Now, if I try to restart it with:
Code:
mdadm --assemble --scan --force
I get:
mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.
Looking at the /proc/mdstat now, it appears that /dev/sda1 is not even in the array any more and /dev/sdb1 (the new drive) is synching with the other two "good" drives:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdc1[2](S) sdb1[4](S) sdd1[3](S)
1465151808 blocks
If I run:
Code:
mdadm --examine /dev/sda1
I get:
mdadm: No md superblock detected on /dev/sda1.
The other three drives look like:
/dev/sdb1: (new drive)
Magic : a92b4efc
Version : 00.90.00
UUID : 3c2f19bc:376ad7d9:ee2e6413:22e5808b
Creation Time : Sat Feb 23 11:49:28 2008
Raid Level : raid5
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Sun Dec 5 12:49:50 2010
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
Checksum : dfc2038d - correct
Events : 510970
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 8 17 4 spare /dev/sdb1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
4 4 8 17 4 spare /dev/sdb1
At this point, I think I might be screwed... I believe the data on sda1, sdc1 and sdd1 should still contain the necessary data to reconstruct the array, so I MIGHT be able to recover by just creating a new array (md1). However, the synch between sdb1, sdc1 and sdd1 is still in progress so it won't let me try that while the drives are in use.
Anyone have any ideas or am I totally screwed?
Here's the fdisk on the drives in the array:
Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00042c2d
Device Boot Start End Blocks Id System
/dev/sdb1 1 60801 488384001 fd Linux raid autodetect
Disk /dev/sdc: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Device Boot Start End Blocks Id System
/dev/sdc1 1 60801 488384001 fd Linux raid autodetect
Disk /dev/sdd: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Device Boot Start End Blocks Id System
/dev/sdd1 1 60801 488384001 fd Linux raid autodetect
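As a sanity check on the numbers in the --examine output, the "Array Size" of a RAID-5 should equal the per-member "Used Dev Size" multiplied by one less than "Raid Devices", since one member's worth of space goes to parity. A small sketch of that arithmetic, using the values printed above (raid5_array_size is a hypothetical helper name, not an mdadm feature):

```shell
# Usable RAID-5 capacity in 1 KiB blocks: (members - 1) * per-member size.
# Hypothetical helper for checking the figures mdadm reports.
raid5_array_size() {
    members=$1
    used_dev_size=$2
    echo $(( (members - 1) * used_dev_size ))
}

# Four members of 488383936 blocks each, as reported by --examine.
raid5_array_size 4 488383936   # prints 1465151808
```

The result matches both the "Array Size" field and the block count shown for md0 in /proc/mdstat.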
- 12-05-2010 #2
Are you sure you formatted and repartitioned the right disk? The device names in /dev/ might have changed after replacing the disks.
What you could try is
Code:
mdadm --assemble --force --scan
If that doesn't work you could be in trouble, because you can't recover RAID-5 if more than one disk has failed.
- 12-05-2010 #3 Just Joined!
- Join Date
- Dec 2010
- Posts
- 2
I'm pretty sure the right drive was formatted / partitioned. I looked at fdisk prior to doing anything and /dev/sdb was the drive that had no partitions on it.
After rebooting, I tried running:
Code:
mdadm --assemble --scan --force
This time it didn't return an error message. However, when I look at the /proc/mdstat file, I now see:
md0 : inactive sdb1[2] sda1[4](S) sdc1[3]
... so now it looks like it's lost sdd1 somehow? And it's attempting to synch sda1. Ugh
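One way to compare md0 lines from /proc/mdstat across reboots is to split each into device name, bracketed number and flag. As far as I understand it, the bracketed number is the member's number within the array and (S) marks a spare, so the same number showing up under a different device name would suggest the kernel renamed the disks rather than the array losing one. A minimal sketch, with mdstat_members being a made-up helper name:

```shell
# Split one "mdX : ..." line from /proc/mdstat into "device number flag" rows.
# Hypothetical helper; the flag column is "-" when the member has none.
mdstat_members() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep '\[' |
        sed 's/\[/ /; s/\]/ /' |
        awk '{ print $1, $2, (NF >= 3 ? $3 : "-") }'
}

mdstat_members 'md0 : inactive sdb1[2] sda1[4](S) sdc1[3]'
```

Running the same helper on the earlier line (sdc1[2](S) sdb1[4](S) sdd1[3](S)) makes the numbers easy to line up side by side.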
- 12-09-2010 #4 Just Joined!
- Join Date
- Dec 2010
- Posts
- 1
Hi all, I am new to the forum.
A few days ago, on one of my systems, disk 'b' failed, the spare kicked in, and at the same time disk 'a' failed, according to /var/log/messages.
At this stage I did a raidstat:
__UUUUU 7/5 cdefg Spare: H Failed: AB
Then I took both disks offline and back online, one by one, with disk 'a' going first.
Ran raidstat again:
__UUUUU 7/5 cdefg Spare: HAB
Then for some reason the system crashed and was not responding.
The system was rebooted via the power switch on the front panel.
While the system was powered off, I checked for any loose cables and/or non-working fans.
All checked out OK.
The system was brought back online.
Did
mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
OK, so did
lsraid -a /dev/md0
lsraid: md device [dev 9, 0] /dev/md0 is offline: Please specify a disk to query
So checked the status of the disks:
fdisk -l /dev/sd*
Disk /dev/sda1: 246.7 GB, 246758367744 bytes
255 heads, 63 sectors/track, 29999 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 30000 240974968+ fd Linux raid autodetect
Disk /dev/sdb1: 246.7 GB, 246758367744 bytes
255 heads, 63 sectors/track, 29999 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 * 1 30000 240974968+ fd Linux raid autodetect
and so on.......
Question: in the --examine output below, the State (under Update Time) for disk 'c' is clean while the other two are active. Why is that, and where does it get this info from?
Examined a disk:
mdadm --examine /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : ea44baca:39b2ee82:16c505ad:aba8b7a8
Creation Time : Wed Apr 19 17:37:23 2006
Raid Level : raid5
Used Dev Size : 240974848 (229.81 GiB 246.76 GB)
Array Size : 1445849088 (1378.87 GiB 1480.55 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Thu Dec 2 08:59:56 2010
State : active
Active Devices : 5
Working Devices : 8
Failed Devices : 0
Spare Devices : 3
Checksum : 2f2baa26 - correct
Events : 0.182
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 9 8 1 9 spare /dev/sda1
0 0 0 0 0 faulty removed
1 1 0 0 1 faulty removed
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 8 113 7 spare /dev/sdh1
8 8 8 17 8 spare /dev/sdb1
9 9 8 1 9 spare /dev/sda1
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : ea44baca:39b2ee82:16c505ad:aba8b7a8
Creation Time : Wed Apr 19 17:37:23 2006
Raid Level : raid5
Used Dev Size : 240974848 (229.81 GiB 246.76 GB)
Array Size : 1445849088 (1378.87 GiB 1480.55 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Thu Dec 2 08:59:56 2010
State : active
Active Devices : 5
Working Devices : 8
Failed Devices : 0
Spare Devices : 3
Checksum : 2f2baa34 - correct
Events : 0.182
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 8 8 17 8 spare /dev/sdb1
0 0 0 0 0 faulty removed
1 1 0 0 1 faulty removed
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 8 113 7 spare /dev/sdh1
8 8 8 17 8 spare /dev/sdb1
9 9 8 1 9 spare /dev/sda1
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : ea44baca:39b2ee82:16c505ad:aba8b7a8
Creation Time : Wed Apr 19 17:37:23 2006
Raid Level : raid5
Used Dev Size : 240974848 (229.81 GiB 246.76 GB)
Array Size : 1445849088 (1378.87 GiB 1480.55 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Thu Dec 2 08:59:56 2010
State : clean
Active Devices : 5
Working Devices : 8
Failed Devices : 0
Spare Devices : 3
Checksum : 2f2baa3f - correct
Events : 0.182
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 33 2 active sync /dev/sdc1
0 0 0 0 0 faulty removed
1 1 0 0 1 faulty removed
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 8 113 7 spare /dev/sdh1
8 8 8 17 8 spare /dev/sdb1
9 9 8 1 9 spare /dev/sda1
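The fields printed by mdadm --examine are read from the md superblock stored on that particular member, so each disk reports its own State and Events. Putting those two fields side by side per device can make differences easier to spot before forcing an assembly. A minimal sketch, assuming the "Field : value" layout shown above (examine_summary is a hypothetical helper name):

```shell
# Print the State and Events fields from `mdadm --examine` output on stdin.
# Hypothetical helper; assumes lines of the form "Field : value".
examine_summary() {
    awk -F' : ' '
        /^[ \t]*State :/  { state = $2 }
        /^[ \t]*Events :/ { events = $2 }
        END { print state, events }
    '
}

# Example use against a real member (needs root and an existing device):
#   mdadm --examine /dev/sdc1 | examine_summary
```

The "Number Major Minor RaidDevice State" header line is not matched because the pattern requires the line to begin with the field name.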
I have also tried
mdadm --assemble --force /dev/md0
mdadm: /dev/md0 assembled from 5 drives and 1 spare - not enough to start the array.
Also tried to take a disk offline:
mdadm /dev/md0 -f /dev/sda1
mdadm: cannot get array info for /dev/md0
There are a few things I can try:
mdadm --assemble --scan --force
or
--assume-clean, normally safe but not recommended.
or
--re-add, to re-add a recently removed device
Before I go command crazy... does anyone have any ideas on how to bring the RAID back online, or has anyone had this problem?

