  1. #1
    mqk (Just Joined! | Join Date: Aug 2010 | Posts: 2)

    [SOLVED] RAID10 trouble - disk failure plus non-fresh disks


    Hi

    I have experienced a failure in one disk of my 4-disk software RAID10
    setup, but a straightforward rebuild is thwarted by the fact that two
    of the other disks are considered "non-fresh" and hence get kicked out
    of the array. This, of course, prevents the array from starting up.
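
    For reference, here is roughly how I have been checking what the kernel
    thinks of the array (the exact output will of course differ elsewhere):

    $ cat /proc/mdstat
    $ grep -i 'non-fresh' /var/log/syslog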

    Here's more detail about my setup:

    I have 4 2TB SATA disks in RAID10.

    Device Boot      Start       End      Blocks   Id  System
    /dev/sda1            1    243201  1953512001   fd  Linux raid autodetect
    /dev/sdc1            1    243201  1953512001   fd  Linux raid autodetect
    /dev/sdd1            1    243201  1953512001   fd  Linux raid autodetect
    /dev/sde1            1    243201  1953512001   fd  Linux raid autodetect
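
    That listing is just the RAID members pulled out of fdisk; on my box it
    came from something along these lines:

    $ sudo fdisk -l /dev/sda /dev/sdc /dev/sdd /dev/sde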

    I have another disk (/dev/sdb) that hosts the operating system, so I can boot up and work on this machine directly.


    $ sudo mdadm --detail /dev/md0
    /dev/md0:
    Version : 00.90
    Creation Time : Thu Nov 5 16:44:06 2009
    Raid Level : raid10
    Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
    Raid Devices : 4
    Total Devices : 1
    Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Nov 16 10:24:11 2010
    State : active, degraded, Not Started
    Active Devices : 1
    Working Devices : 1
    Failed Devices : 0
    Spare Devices : 0

    Layout : near=2, far=1
    Chunk Size : 64K

    UUID : e0c049f7:658d2514:6fcc1897:bafeb8ef (local to host xxx.xxx.xxx)
    Events : 0.8646

    Number   Major   Minor   RaidDevice   State
       0       8       1        0         active sync   /dev/sda1
       1       0       0        1         removed
       2       0       0        2         removed
       3       0       0        3         removed
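
    As I understand it, each member's superblock carries an event counter,
    and members whose counter lags behind the newest one are the ones that
    get flagged as non-fresh. Comparing the counters member by member looks
    roughly like this (the real output has many more fields):

    $ sudo mdadm --examine /dev/sda1 | grep Events
    $ sudo mdadm --examine /dev/sdc1 | grep Events
    $ sudo mdadm --examine /dev/sde1 | grep Events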


    At first I thought that this meant that I had experienced simultaneous
    disk failures in *three* of my 4 disks, but I've now run 'smartctl -t
    short' on all four disks, and it "Completed without error" for 3 out
    of the 4 disks. Only /dev/sdd1 gave a "read failure", which gives me
    hope that perhaps it's only that one disk that is borked, and I can
    still rebuild my RAID array using a replacement disk.
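
    For completeness, the short self-tests were run roughly like this, with
    the verdicts read back from each drive's self-test log:

    $ sudo smartctl -t short /dev/sdd
    $ sudo smartctl -l selftest /dev/sdd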

    My problem is that in order to add the replacement disk and rebuild
    the array, I first need the array to come up in degraded, but active
    mode. In /var/log/syslog I see 'md: kicking non-fresh sdc1 from
    array!' and the same for sde1. I've read in other threads that people
    typically just re-add the non-fresh disks, but I have two concerns:

    a) I worry that by re-adding, say, /dev/sdc (i.e. one of the non-fresh
    disks), data on that disk which is needed to rebuild /dev/sdd (the
    failed disk) will be overwritten. Is this a possibility?

    b) I don't see how the data on /dev/sda (the only remaining active
    disk) could be sufficient to rebuild /dev/sdc and /dev/sde from
    scratch. What I want to do is add /dev/sdc and /dev/sde back into the
    array *without* rebuilding them. It seems that --re-add (as opposed to
    --add) is supposed to do exactly that, but I'm not sure about it; see
    the sketch just below.
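
    For concreteness, the two invocations I'm weighing look like this
    (whether --re-add really skips a full rebuild is exactly the part I'm
    unsure about):

    $ sudo mdadm /dev/md0 --re-add /dev/sdc1
    $ sudo mdadm /dev/md0 --add /dev/sdc1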

    I'm proceeding very cautiously here, as I still harbor hopes of
    recovering all the data, and I don't want to mess something up now. In
    fact, at the moment I'm creating an exact clone of /dev/sde (using dd
    -- it's taking forever, 17 hours and only ~3/4 done...), so I can
    experiment with the --re-add on this disk and still have an intact
    copy of it, in case something goes wrong.
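
    The clone itself is just a plain dd along these lines, where /dev/sdf
    stands in for the spare disk I'm copying onto:

    $ sudo dd if=/dev/sde of=/dev/sdf bs=1M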


    Any advice greatly appreciated!


    Mike

  2. #2
    mqk (Just Joined! | Join Date: Aug 2010 | Posts: 2)

    Crickets... Maybe I posted to the wrong forum?

    For the record, I was able to fix my RAID. Here's what I did. After the backup cloning of /dev/sde finished (took close to 24 hours), I did

    $ sudo mdadm /dev/md0 --re-add /dev/sde

    followed by

    $ sudo mdadm /dev/md0 --run

    and my RAID was back in business (in a degraded state, of course). I then rebooted, and the second non-fresh disk (/dev/sdc) was automatically added and started rebuilding. It's a bit odd that it needed to be rebuilt, since there was nothing wrong with that disk at all. Maybe I should have just re-added it together with /dev/sde; that would probably have avoided the rebuild.
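
    For anyone following along, the rebuild progress can be watched with
    something like:

    $ watch cat /proc/mdstat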

    I've now added the replacement disk (/dev/sdd) and it has successfully been rebuilt too. My RAID array is back in full working order. Yay!
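
    In case it helps someone else, adding the replacement boiled down to
    giving the new disk a matching partition table and then adding the new
    partition to the array, roughly as follows (sfdisk is just the tool I
    reached for to copy the partition table; any partitioner would do):

    $ sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdd
    $ sudo mdadm /dev/md0 --add /dev/sdd1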

    Mike
