  1. #1
    Just Joined!
    Join Date
    Dec 2010
    Posts
    2

    RAID 5 Problem

    I've had a four-drive software RAID-5 array running for some time now. Recently, Ubuntu started warning that one of the drives in the array was in danger of imminent failure. I tried running SpinRite on the drive, etc., but was not able to save it. Whenever the computer booted, I would get an error message before the Ubuntu splash screen saying that the array had been started with 3 drives.

    I purchased a new drive to replace the failing one and today tried to swap the drives.

    First I marked the drive as failed:

    Code:
    mdadm --manage /dev/md0 --fail /dev/sdb1
    Then I removed it from the array:

    Code:
    mdadm --manage /dev/md0 --remove /dev/sdb1
    I then turned off the computer and replaced the bad drive with the new drive I had just purchased (it's the same size). When I booted the computer, I again got the message that the array had been started with only three drives.

    I then partitioned my new drive and added it to the array:

    Code:
    mdadm --manage /dev/md0 --add /dev/sdb1
    At this point, I took a look at the /proc/mdstat file to check if the drive was synchronizing, and I noticed it was showing [__UU] as if two drives were having issues.

    I examined /dev/md0 and it had somehow marked /dev/sda1 as Faulty and set it to be a spare. I didn't think the synch would work correctly if that drive was marked as Faulty, so I stopped the array.
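For readers following along, the bracketed status string in /proc/mdstat can be read mechanically: each "U" is a slot with a working member and each "_" is a slot without one. The function below is only a small sketch of that reading; the name count_missing is made up for illustration and is not part of mdadm.

```shell
# Count the empty slots in a /proc/mdstat status string such as "[__UU]".
# "U" = member is up, "_" = slot has no working member.
# count_missing is a hypothetical helper name, not an mdadm command.
count_missing() {
    # Keep only the underscores, count them, and strip wc's padding.
    printf '%s' "$1" | tr -cd '_' | wc -c | tr -d ' '
}

count_missing "[__UU]"   # prints 2
```

With four slots and two of them empty, a RAID-5 array cannot run, which matches what happened next.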

    Now, if I try to restart it with:

    Code:
    mdadm --assemble --scan --force
    I get:

    mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.
    Looking at the /proc/mdstat now, it appears that /dev/sda1 is not even in the array any more and /dev/sdb1 (the new drive) is synching with the other two "good" drives:

    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md0 : inactive sdc1[2](S) sdb1[4](S) sdd1[3](S)
    1465151808 blocks
    If I run:

    Code:
    mdadm --examine /dev/sda1
    I get:

    mdadm: No md superblock detected on /dev/sda1.
    The other three drives look like:

    /dev/sdb1: (new drive)
    Magic : a92b4efc
    Version : 00.90.00
    UUID : 3c2f19bc:376ad7d9:ee2e6413:22e5808b
    Creation Time : Sat Feb 23 11:49:28 2008
    Raid Level : raid5
    Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
    Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
    Raid Devices : 4
    Total Devices : 4
    Preferred Minor : 0

    Update Time : Sun Dec 5 12:49:50 2010
    State : clean
    Active Devices : 2
    Working Devices : 3
    Failed Devices : 1
    Spare Devices : 1
    Checksum : dfc2038d - correct
    Events : 510970

    Layout : left-symmetric
    Chunk Size : 64K

    Number Major Minor RaidDevice State
    this 4 8 17 4 spare /dev/sdb1

    0 0 0 0 0 removed
    1 1 0 0 1 faulty removed
    2 2 8 33 2 active sync /dev/sdc1
    3 3 8 49 3 active sync /dev/sdd1
    4 4 8 17 4 spare /dev/sdb1

    At this point, I think I might be screwed... I believe the data on sda1, sdc1 and sdd1 should still contain the necessary data to reconstruct the array, so I MIGHT be able to recover by just creating a new array (md1). However, the synch between sdb1, sdc1 and sdd1 is still in progress so it won't let me try that while the drives are in use.

    Anyone have any ideas or am I totally screwed?
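The reasoning that three of the four members hold enough data comes from how RAID-5 parity works: within each stripe, the parity block is the XOR of the data blocks, so any one missing block can be recomputed from the others. Below is a toy sketch with single byte values chosen purely for illustration; raid5_xor is a hypothetical helper, and the sketch says nothing about whether the data on sda1 is actually still intact or current.

```shell
# XOR three values together, as RAID-5 does per stripe across its members.
# raid5_xor is an illustrative name; the byte values below are made up.
raid5_xor() {
    echo $(( $1 ^ $2 ^ $3 ))
}

d0=165; d1=60; d2=15
parity=$(raid5_xor "$d0" "$d1" "$d2")

# Pretend the member holding d1 is gone and rebuild it from the other three.
rebuilt=$(raid5_xor "$d0" "$d2" "$parity")
echo "original=$d1 rebuilt=$rebuilt"   # prints original=60 rebuilt=60
```

The same property is why a second missing member is fatal: with two unknowns per stripe, the XOR no longer determines either one.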


    Here's the fdisk on the drives in the array:

    Disk /dev/sdb: 500.1 GB, 500107862016 bytes
    255 heads, 63 sectors/track, 60801 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x00042c2d

    Device Boot Start End Blocks Id System
    /dev/sdb1 1 60801 488384001 fd Linux raid autodetect

    Disk /dev/sdc: 500.1 GB, 500107862016 bytes
    255 heads, 63 sectors/track, 60801 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x00000000

    Device Boot Start End Blocks Id System
    /dev/sdc1 1 60801 488384001 fd Linux raid autodetect

    Disk /dev/sdd: 500.1 GB, 500107862016 bytes
    255 heads, 63 sectors/track, 60801 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x00000000

    Device Boot Start End Blocks Id System
    /dev/sdd1 1 60801 488384001 fd Linux raid autodetect

  2. #2
    Linux User Manko10's Avatar
    Join Date
    Sep 2010
    Posts
    250
    Are you sure you formatted and repartitioned the right disk? The device names in /dev/ might have changed after replacing the disks.

    What you could try is
    Code:
    mdadm --assemble --force --scan
    If that doesn't work, you could be in trouble, because RAID-5 can't be recovered if more than one disk has failed.
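The rule behind the "not enough to start the array" message can be written down as a one-line check: RAID-5 needs at least one fewer in-sync members than it has slots, and spares do not count toward that number. This is a rough sketch of the rule only, not of how mdadm itself decides; raid5_can_start is a hypothetical name.

```shell
# Succeeds if a RAID-5 array with $1 slots can start with $2 in-sync members.
# RAID-5 tolerates exactly one missing member. Spares are not in-sync members.
# raid5_can_start is an illustrative helper, not an mdadm feature.
raid5_can_start() {
    raid_devices=$1
    in_sync=$2
    [ "$in_sync" -ge $(( raid_devices - 1 )) ]
}

# The situation from the first post: 4 slots, 2 drives plus 1 spare.
if raid5_can_start 4 2; then echo "can start"; else echo "not enough members"; fi
```

For the four-drive array in this thread, the check passes with three in-sync members and fails with two, however many spares are attached.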

  3. #3
    Just Joined!
    Join Date
    Dec 2010
    Posts
    2
    I'm pretty sure the right drive was formatted / partitioned. I looked at fdisk prior to doing anything and /dev/sdb was the drive that had no partitions on it.

    After rebooting, I tried running:

    Code:
    mdadm --assemble --scan --force
    This time it didn't return an error message. However, when I look at the /proc/mdstat file, I now see:

    md0 : inactive sdb1[2] sda1[4](S) sdc1[3]
    ... so now it looks like it's lost sdd1 somehow? And it's attempting to synch sda1. Ugh
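When the device names seem to move around between boots, it can help to list exactly which partitions md has attached and which of them carry the (S) spare or (F) faulty marker. The sketch below splits a /proc/mdstat array line into one entry per device. It assumes the line format quoted in this thread, and mdstat_members is a made-up helper name.

```shell
# Print one "device role" pair per member found on a /proc/mdstat array line.
# Tokens look like sdb1[2], sda1[4](S) or sdc1[0](F); other words are skipped.
# mdstat_members is a hypothetical helper, not part of mdadm.
mdstat_members() {
    printf '%s\n' "$1" | tr ' ' '\n' | while IFS= read -r tok; do
        case $tok in
            *\[*\]"(S)") echo "${tok%%\[*} spare" ;;
            *\[*\]"(F)") echo "${tok%%\[*} faulty" ;;
            *\[*\]*)     echo "${tok%%\[*} member" ;;
        esac
    done
}

mdstat_members "md0 : inactive sdb1[2] sda1[4](S) sdc1[3]"
# prints:
# sdb1 member
# sda1 spare
# sdc1 member
```

Comparing this list with the device names in each `mdadm --examine` output is one way to notice that a name such as /dev/sdb1 now refers to a different physical drive than before.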

  4. #4
    Just Joined!
    Join Date
    Dec 2010
    Posts
    1
    Hi all, I'm new to the forum.

    A few days ago, on one of my systems, disk 'b' failed, the spare kicked in, and at the same time disk 'a' failed,
    according to /var/log/messages.

    At this stage I ran raidstat:
    __UUUUU 7/5 cdefg Spare: H Failed: AB
    Then I took both disks offline and brought them back online, one by one,
    with disk 'a' going first.

    I ran raidstat again:
    __UUUUU 7/5 cdefg Spare: HAB
    Then, for some reason, the system crashed and stopped responding.
    The system was rebooted via the power switch on the front panel.
    While the system was powered off, I checked for loose cables and non-working fans.
    All checked out OK.

    The system was brought back online.

    I ran:
    mdadm --detail /dev/md0
    mdadm: md device /dev/md0 does not appear to be active.
    OK, so I ran:
    lsraid -a /dev/md0
    lsraid: md device [dev 9, 0] /dev/md0 is offline: Please specify a disk to query
    So I checked the status of the disks:
    fdisk -l /dev/sd*
    Disk /dev/sda1: 246.7 GB, 246758367744 bytes
    255 heads, 63 sectors/track, 29999 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot Start End Blocks Id System
    /dev/sda1 * 1 30000 240974968+ fd Linux raid autodetect

    Disk /dev/sdb1: 246.7 GB, 246758367744 bytes
    255 heads, 63 sectors/track, 29999 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot Start End Blocks Id System
    /dev/sdb1 * 1 30000 240974968+ fd Linux raid autodetect

    ...and so on for the other disks.
    I examined a disk:
    mdadm --examine /dev/sda1

    /dev/sda1:
    Magic : a92b4efc
    Version : 00.90.00
    UUID : ea44baca:39b2ee82:16c505ad:aba8b7a8
    Creation Time : Wed Apr 19 17:37:23 2006
    Raid Level : raid5
    Used Dev Size : 240974848 (229.81 GiB 246.76 GB)
    Array Size : 1445849088 (1378.87 GiB 1480.55 GB)
    Raid Devices : 7
    Total Devices : 8
    Preferred Minor : 0

    Update Time : Thu Dec 2 08:59:56 2010
    State : active
    Active Devices : 5
    Working Devices : 8
    Failed Devices : 0
    Spare Devices : 3
    Checksum : 2f2baa26 - correct
    Events : 0.182

    Layout : left-symmetric
    Chunk Size : 64K

    Number Major Minor RaidDevice State
    this 9 8 1 9 spare /dev/sda1

    0 0 0 0 0 faulty removed
    1 1 0 0 1 faulty removed
    2 2 8 33 2 active sync /dev/sdc1
    3 3 8 49 3 active sync /dev/sdd1
    4 4 8 65 4 active sync /dev/sde1
    5 5 8 81 5 active sync /dev/sdf1
    6 6 8 97 6 active sync /dev/sdg1
    7 7 8 113 7 spare /dev/sdh1
    8 8 8 17 8 spare /dev/sdb1
    9 9 8 1 9 spare /dev/sda1



    /dev/sdb1:
    Magic : a92b4efc
    Version : 00.90.00
    UUID : ea44baca:39b2ee82:16c505ad:aba8b7a8
    Creation Time : Wed Apr 19 17:37:23 2006
    Raid Level : raid5
    Used Dev Size : 240974848 (229.81 GiB 246.76 GB)
    Array Size : 1445849088 (1378.87 GiB 1480.55 GB)
    Raid Devices : 7
    Total Devices : 8
    Preferred Minor : 0

    Update Time : Thu Dec 2 08:59:56 2010
    State : active
    Active Devices : 5
    Working Devices : 8
    Failed Devices : 0
    Spare Devices : 3
    Checksum : 2f2baa34 - correct
    Events : 0.182

    Layout : left-symmetric
    Chunk Size : 64K

    Number Major Minor RaidDevice State
    this 8 8 17 8 spare /dev/sdb1

    0 0 0 0 0 faulty removed
    1 1 0 0 1 faulty removed
    2 2 8 33 2 active sync /dev/sdc1
    3 3 8 49 3 active sync /dev/sdd1
    4 4 8 65 4 active sync /dev/sde1
    5 5 8 81 5 active sync /dev/sdf1
    6 6 8 97 6 active sync /dev/sdg1
    7 7 8 113 7 spare /dev/sdh1
    8 8 8 17 8 spare /dev/sdb1
    9 9 8 1 9 spare /dev/sda1


    /dev/sdc1:
    Magic : a92b4efc
    Version : 00.90.00
    UUID : ea44baca:39b2ee82:16c505ad:aba8b7a8
    Creation Time : Wed Apr 19 17:37:23 2006
    Raid Level : raid5
    Used Dev Size : 240974848 (229.81 GiB 246.76 GB)
    Array Size : 1445849088 (1378.87 GiB 1480.55 GB)
    Raid Devices : 7
    Total Devices : 8
    Preferred Minor : 0

    Update Time : Thu Dec 2 08:59:56 2010
    State : clean
    Active Devices : 5
    Working Devices : 8
    Failed Devices : 0
    Spare Devices : 3
    Checksum : 2f2baa3f - correct
    Events : 0.182

    Layout : left-symmetric
    Chunk Size : 64K

    Number Major Minor RaidDevice State
    this 2 8 33 2 active sync /dev/sdc1

    0 0 0 0 0 faulty removed
    1 1 0 0 1 faulty removed
    2 2 8 33 2 active sync /dev/sdc1
    3 3 8 49 3 active sync /dev/sdd1
    4 4 8 65 4 active sync /dev/sde1
    5 5 8 81 5 active sync /dev/sdf1
    6 6 8 97 6 active sync /dev/sdg1
    7 7 8 113 7 spare /dev/sdh1
    8 8 8 17 8 spare /dev/sdb1
    9 9 8 1 9 spare /dev/sda1
    Question: the State (under Update Time) for disk 'c' is clean, while the other two are active. Why is that? And where does mdadm get this information from?
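On the "where does it get this info" part: each `mdadm --examine` block is read from the md superblock stored on that particular device, so the State and Events values are per-device records and can differ from one member to the next. To compare them side by side, something like the sketch below may help. It assumes the 0.90-style field layout shown in this thread, and examine_summary is a hypothetical helper name.

```shell
# Read `mdadm --examine` output on stdin and print "device state events"
# for each superblock. Assumes the "Field : value" layout quoted above.
# examine_summary is an illustrative helper, not part of mdadm.
examine_summary() {
    awk '
        /^[ \t]*\/dev\/[^ ]*:/      { dev = $1; sub(/:$/, "", dev) }
        $1 == "State" && $2 == ":"  { state = $3 }
        $1 == "Events" && $2 == ":" { print dev, state, $3 }
    '
}

# Trimmed sample input taken from the values posted above.
printf '%s\n' '/dev/sda1:' 'State : active' 'Events : 0.182' \
              '/dev/sdc1:' 'State : clean'  'Events : 0.182' | examine_summary
# prints:
# /dev/sda1 active 0.182
# /dev/sdc1 clean 0.182
```

On a live system the input would come from something like `mdadm --examine /dev/sd[a-h]1 | examine_summary`. Members whose Events values match were last updated together, which is the main thing to look at before attempting any forced assembly.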


    I have also tried:
    mdadm --assemble --force /dev/md0
    mdadm: /dev/md0 assembled from 5 drives and 1 spare - not enough to start the array.
    I also tried to take a disk offline:
    mdadm /dev/md0 -f /dev/sda1
    mdadm: cannot get array info for /dev/md0
    There are a few things I can try:

    mdadm --assemble --scan --force

    or

    --assume-clean (an option to --create), normally safe but not recommended.

    or

    --re-add, to re-add a recently removed device to the array.


    Before I go command crazy: does anyone have any ideas on how to bring the RAID back online, or has anyone had this problem?
