  #1 - Just Joined! (Jan 2012, 5 posts)

    Raid 5 State : clean, FAILED


    I have an ssh session open to a remote debian squeeze server that has 3 drives configured as 2 separate software raids. I believe it is very possible that 2 of the drives have failed.

    Code:
    # mdadm --detail /dev/md0
    /dev/md0:
            Version : 1.2
      Creation Time : Mon Feb 21 17:01:43 2011
         Raid Level : raid1
         Array Size : 975860 (953.15 MiB 999.28 MB)
      Used Dev Size : 975860 (953.15 MiB 999.28 MB)
       Raid Devices : 2
      Total Devices : 3
        Persistence : Superblock is persistent
    
        Update Time : Tue Jan 17 12:33:52 2012
              State : clean, degraded
     Active Devices : 1
    Working Devices : 2
     Failed Devices : 1
      Spare Devices : 1
    
        Number   Major   Minor   RaidDevice State
           0       0        0        0      removed
           1       8       17        1      active sync   /dev/sdb1
    
           0       8        1        -      faulty spare   /dev/sda1
           2       8       33        -      spare   /dev/sdc1
    and

    Code:
    # mdadm --detail /dev/md1
    /dev/md1:
            Version : 1.2
      Creation Time : Mon Feb 21 17:01:49 2011
         Raid Level : raid5
         Array Size : 1951564800 (1861.16 GiB 1998.40 GB)
      Used Dev Size : 975782400 (930.58 GiB 999.20 GB)
       Raid Devices : 3
      Total Devices : 3
        Persistence : Superblock is persistent
    
        Update Time : Tue Jan 17 13:32:40 2012
              State : clean, FAILED
     Active Devices : 1
    Working Devices : 1
     Failed Devices : 2
      Spare Devices : 0
    
             Layout : left-symmetric
         Chunk Size : 512K
    
               Name : westlund2:1  (local to host westlund2)
               UUID : 705a71ba:dc848b5b:f64ee011:287606b1
             Events : 342
    
        Number   Major   Minor   RaidDevice State
           0       0        0        0      removed
           1       0        0        1      removed
           2       8       34        2      active sync   /dev/sdc2
    
           0       8        2        -      faulty spare   /dev/sda2
           1       8       18        -      faulty spare   /dev/sdb2
    LVM is configured on top of the raid setups. The server is running, but crippled. I'm pretty sure the root filesystem is mounted on a logical volume on /dev/md0, but lvdisplay gives me "Input/output error" when I try to verify. Since /dev/md0 is only degraded, it makes sense that the system is still able to run, but perhaps it has switched to read-only.

    Code:
    # mount
    /dev/mapper/raid-root on / type ext4 (rw,errors=remount-ro)
    tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
    proc on /proc type proc (rw,noexec,nosuid,nodev)
    sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
    udev on /dev type tmpfs (rw,mode=0755)
    tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
    devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
    /dev/md0 on /boot type ext3 (rw)
    xenfs on /proc/xen type xenfs (rw)
    /dev/mapper/raid-mail--disk on /mnt/mail type ext3 (rw)
    
    mount: warning: /etc/mtab is not writable (e.g. read-only filesystem).
           It's possible that information reported by mount(8) is not
           up to date. For actual information about system mount points
           check the /proc/mounts file.
    I do not have smartmontools installed to test the drives, and I cannot install it. I believe that if I can get /dev/md0 out of degraded mode I could remount read/write and install it?

    Looking for advice on how to proceed carefully. This is getting a bit more low-level than I am used to working at.

    Thanks,

    Steve
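A State line like the ones above lends itself to a scripted check. A minimal sketch, run against text pasted from the mdadm --detail output quoted in this post rather than against a live array (the md1 label is only for the printout):

```shell
# Sample pasted from the mdadm --detail output above, not read from a live array.
detail_md1='          State : clean, FAILED
 Active Devices : 1
Working Devices : 1
 Failed Devices : 2'

# Pull out everything after "State : " and classify it.
state=$(printf '%s\n' "$detail_md1" | awk -F' : ' '/ State / { print $2 }')
case $state in
  *FAILED*)   echo "md1: FAILED - more members lost than the redundancy covers" ;;
  *degraded*) echo "md1: degraded - running, but a member needs replacing" ;;
  *)          echo "md1: OK" ;;
esac
```

The same pattern works for `clean, degraded`, which is what /dev/md0 reports above.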

  #2 - Just Joined! (Jan 2012, 5 posts)
    It seems that the superblocks are missing for some reason.

    Code:
    # mdadm -E /dev/sda1
    mdadm: No md superblock detected on /dev/sda1.
    
    # mdadm -E /dev/sdb1
    mdadm: No md superblock detected on /dev/sdb1.
    
    # mdadm -E /dev/sdc1
    /dev/sdc1:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 049d8c72:e669e0b3:4e427ff5:9c45fa70
               Name : server1:0  (local to host server1)
      Creation Time : Mon Feb 21 17:01:43 2011
         Raid Level : raid1
       Raid Devices : 2
    
     Avail Dev Size : 1951720 (953.15 MiB 999.28 MB)
         Array Size : 1951720 (953.15 MiB 999.28 MB)
        Data Offset : 24 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : cf258f35:b6c06ebb:375a74fe:b02c2983
    
        Update Time : Tue Jan 17 14:08:11 2012
           Checksum : 1e75b738 - correct
             Events : 362
    
    
       Device Role : spare
       Array State : .A ('A' == active, '.' == missing)
    root@westlund2:~#
    When I try to add /dev/sdc1 back to /dev/md0 I get an error.

    Code:
    # mdadm --add /dev/md0 /dev/sdc1
    mdadm: cannot find valid superblock in this array - HELP
    I believe this is because the only active device in /dev/md0 is /dev/sdb1, and it is missing its superblock.

    Code:
    # mdadm --detail /dev/md0
    /dev/md0:
            Version : 1.2
      Creation Time : Mon Feb 21 17:01:43 2011
         Raid Level : raid1
         Array Size : 975860 (953.15 MiB 999.28 MB)
      Used Dev Size : 975860 (953.15 MiB 999.28 MB)
       Raid Devices : 2
      Total Devices : 1
        Persistence : Superblock is persistent
    
        Update Time : Tue Jan 17 15:05:24 2012
              State : clean, degraded
     Active Devices : 1
    Working Devices : 1
     Failed Devices : 0
      Spare Devices : 0
    
        Number   Major   Minor   RaidDevice State
           0       0        0        0      removed
           1       8       17        1      active sync   /dev/sdb1
    Is there a way to re-initialize the superblock for /dev/sdb1 based on the superblock from /dev/sdc1?

    TIA
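On the question of which member to trust: when mdadm metadata survives on more than one member, the usual tie-breaker for reassembly is the Events counter reported by `mdadm -E`, with the highest count marking the freshest metadata. A sketch over pasted excerpts (the 362 comes from the /dev/sdc1 output above; the 358 for /dev/sdb1 is invented for illustration, since that superblock is unreadable here):

```shell
# Hypothetical mdadm -E excerpts; only the Events line is used.
examine_sdc1='  State : clean
 Events : 362'
examine_sdb1='  State : clean
 Events : 358'

# Extract the event counter from an -E excerpt.
events_of() { printf '%s\n' "$1" | awk '/Events/ { print $NF }'; }

# Highest Events count wins: that member is the safest basis for assembly.
printf 'sdc1 %s\nsdb1 %s\n' \
  "$(events_of "$examine_sdc1")" "$(events_of "$examine_sdb1")" \
  | sort -k2,2 -rn | head -n1
```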

  #3 - Just Joined! (Jan 2012, 5 posts)
    Could still use some advice on this. Am I not posting the information needed? Right now it seems I need to re-initialize the superblock for /dev/sdb1. Is there a way to manually assemble the raid that would eliminate the need for a valid superblock? I would just like to save some data from the raid 5 (which may be impossible at this point anyway?) before tearing this server down.

    TIA

  #4 - drl, Linux Engineer (Saint Paul, MN, USA; joined Apr 2006; 1,294 posts)
    Hi.

    I have been watching this thread, but I have no useful advice for your recovery issue. Here are some random thoughts.

    I usually use only RAID1.

    I do see a reference:
    With larger drive capacities the odds of a drive failure during rebuild are not negligible. In that event, the difficulty of extracting data from a failed array must be considered. Only a RAID 1 (mirror) stores all data on each drive in the array. Although it may depend on the controller, some individual drives in a RAID 1 can be read as a single conventional drive; this means a damaged RAID 1 can often be easily recovered if at least one component drive is in working condition. If the damage is more severe, some or all data can often be recovered by professional data recovery specialists. However, other RAID levels (like RAID level 5) present much more formidable obstacles to data recovery.

    ...

    Given a RAID with only one drive of redundancy (RAIDs 3, 4, and 5), a second failure would cause complete failure of the array. Even though individual drives' mean time between failure (MTBF) have increased over time, this increase has not kept pace with the increased storage capacity of the drives. The time to rebuild the array after a single drive failure, as well as the chance of a second failure during a rebuild, have increased over time.[38]

    -- excerpt from RAID - Wikipedia, the free encyclopedia
    Presumably the RAID5 is where your data resides, and the RAID1 is for the basic system. The excerpt above looks discouraging.

    If there are 2 failed disks, I don't see how recovery of the data array could happen, because half of the interleaved blocks of data would be inaccessible. I suppose that recovery would depend on how bad the failure really is: the data is not erased, just not easily accessible. If the blocks can still be read, an immediate copy to a good device would allow experimentation with tools, or, in the worst case, you could provide the disk to a recovery service. That is expensive, but often necessary if there is no backup extant and no other way to reproduce the data.

    The addition of LVM on top of the RAID is another level of complication. I have one machine like that, and I am trying that as an experiment for live backup -- the backup tool [ rsnapshot ] sees that it is backing up an LVM partition, so it creates a shadow copy-on-write partition while doing the backup. So far it has seemed to be working well.

    Regarding the mount-as-read-only aspect, I have one such RAID1 partition:
    Code:
    Personalities : [raid1] 
    md4 : active raid1 sda8[0]
          55649024 blocks [2/1] [U_]
    However, it has not been remounted read-only:
    Code:
    % ls -ld /
    drwxr-xr-x 25 root root 4096 Nov 10 17:08 //
    I am waiting to replace this disk with a new disk that I already have on hand, but I want to upgrade the level of the OS, so I am testing on a separate machine as a virtual machine.

    Using software RAID is more flexible than hardware RAID, and if your machine is fast enough for the calculations, software RAID 6 seems like a better solution for the future.

    Good luck ... cheers, drl
    Welcome - get the most out of the forum by reading forum basics and guidelines: click here.
    90% of questions can be answered by using man pages, Quick Search, Advanced Search, Google search, Wikipedia.
    We look forward to helping you with the challenge of the other 10%.
    ( Mn, 2.6.n, AMD-64 3000+, ASUS A8V Deluxe, 1 GB, SATA + IDE, Matrox G400 AGP )

  #5 - Just Joined! (Jan 2012, 5 posts)
    I was actually able to get data off the raid 5 yesterday. I was hoping that would be the case when I saw that the raid 1 boot partition was running on one drive, while the raid 5 showed it only had one good disk left, but a different disk.

    I couldn't find any way to remount the root filesystem read/write, so eventually I gave up and rebooted. I ended up in busybox and was able to recreate the raid 1 using

    Code:
    mdadm --create --assume-clean --level=1 --raid-devices=2 /dev/md0 /dev/sdb1 /dev/sdc1
    After finishing the boot and installing smartmontools, I was able to determine that sda was failing but that sdb and sdc checked out OK. I ran the same mdadm create but for the raid 5, and it seemed to work. After recreating /dev/md1 I was able to run lvchange -ay to activate the logical volumes, mount them, and grab the data off.

    I'm not sure how the superblocks were destroyed on 4 of the 6 partitions. I can't believe --assume-clean worked on both raid configurations. The only thing I can think of is that having the filesystem configured to go read-only on errors allowed both raid volumes to stay clean until I found them over a week later.

    Note to self, when I set this server back up, monitor it with something like zabbix!
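For anyone retracing this later, here is the whole sequence as an annotated checklist that only prints the commands rather than running them. Several details are my reconstruction, not commands quoted from the thread: the raid 5 line (with `missing` standing in for the failed sda2), the use of vgchange in place of the poster's lvchange, the volume-group name `raid` (inferred from the /dev/mapper/raid-root path), and the mount target. Note that `mdadm --create` with `--assume-clean` overwrites existing metadata, so nothing like this should be run without raw images of the drives first:

```shell
# Print, do not execute: a reconstruction of the recovery steps described above.
recovery_steps() {
  cat <<'EOF'
mdadm --create /dev/md0 --assume-clean --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm --create /dev/md1 --assume-clean --level=5 --raid-devices=3 missing /dev/sdb2 /dev/sdc2
vgchange -ay raid
mount /dev/mapper/raid-root /mnt/recover
EOF
}
recovery_steps
```

For `--assume-clean` to succeed, member order, metadata version, and chunk size must match the original array exactly, which is presumably why it matters that these arrays stayed clean.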

  #6 - drl, Linux Engineer (Saint Paul, MN, USA; joined Apr 2006; 1,294 posts)
    Hi.

    Thanks for having posted your action on this problem. It may help someone else in the future with similar troubles.

    Are you also planning a backup as well as monitoring? ... cheers, drl


  #7 - Just Joined! (Jan 2012, 5 posts)
    no, a backup would just make entirely too much sense

    ok, maybe a backup
