  1. #1 jaredmtucker (Just Joined!, Join Date: Dec 2010, Location: Texas, Posts: 5)

    RAID 5 mdadm help


    Hello,

    I've been having trouble with a failed RAID for the past couple of days, and I have tried everything I can think of to fix the issue, to no avail. I was hoping to find someone who could help.

    I have a CentOS 5.4 box with a RAID 5 array on /dev/md2 consisting of the following members.

    Code:
    hdd1[0] sdd1[4] sdc1[3] sdb1[2] sda1[1]
    There was a power outage and, unknown to me, a bad UPS battery, so the server lost power. When the machine was rebooted, I was met with the following error:
    Code:
    fsck.ext3: No such file or directory while trying to open /dev/lvm/lvm-raid
    This device appears once /dev/md2 has been assembled by mdadm, and it is mounted via my fstab.
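    For reference, the relevant fstab entry looks roughly like this (the device path is the one from the error above, the mount point is the /data directory mentioned below, and the options are my guess at the usual defaults):

    Code:
    /dev/lvm/lvm-raid   /data   ext3   defaults   1 2

    I then turned my attention to /proc/mdstat and found the following: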

    Code:
    Personalities : [raid1] [raid6] [raid5] [raid4]
    
    md2 : inactive hdd1[0] sdd1[4] sdc1[3] sdb1[2] sda1[1]
          3662859520 blocks
    I attempted to assemble the RAID and found that only 4/5 of the drives could be contacted. I rebooted the machine and checked the BIOS, where all 5 drives in this array were showing up. I could --force the array to assemble and it would start with 4/5 drives, but of course my data was nowhere to be found. The mount point this creates, /data, contains my virtual machines. I saw the directory and its subdirectories, but there were no files.
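    For reference, the forced assembly went something like this (reconstructed from memory; the device list is the one shown in mdstat above):

    Code:
    mdadm --stop /dev/md2
    mdadm --assemble --force /dev/md2 /dev/hdd1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1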

    I ran mdadm --examine --scan and compared that information to my mdadm.conf, and they were a match (roughly as sketched below). The devices that mdadm.conf maps to that array are all present in /dev; the system sees them.
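    The comparison went something like this (the ARRAY line is paraphrased from memory, but the UUID is the one in the superblocks below):

    Code:
    # what the superblocks report
    mdadm --examine --scan
    #   ARRAY /dev/md2 level=raid5 num-devices=5 UUID=df5a74c5:f9bcfbb8:d10cfed7:60400574
    # what the config file says
    grep ^ARRAY /etc/mdadm.conf

    Anyway, I decided to examine the individual devices. Here is the output for each one: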

    Code:
    /dev/hdd1:
              Magic : a92b4efc
            Version : 0.90.00
               UUID : df5a74c5:f9bcfbb8:d10cfed7:60400574
      Creation Time : Wed Jan 16 23:03:44 2008
         Raid Level : raid5
      Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
         Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
       Raid Devices : 5
      Total Devices : 4
    Preferred Minor : 2
    
        Update Time : Wed Jun 22 07:40:42 2011
              State : clean
     Active Devices : 4
    Working Devices : 4
     Failed Devices : 1
      Spare Devices : 0
           Checksum : 75bb48fb - correct
             Events : 15504911
    
             Layout : left-symmetric
         Chunk Size : 256K
    
          Number   Major   Minor   RaidDevice State
    this     0      22       65        0      active sync   /dev/hdd1
    
       0     0      22       65        0      active sync   /dev/hdd1
       1     1       0        0        1      faulty removed
       2     2       8        1        2      active sync   /dev/sda1
       3     3       8       17        3      active sync   /dev/sdb1
       4     4       8       33        4      active sync   /dev/sdc1
    Code:
    /dev/sda1:
              Magic : a92b4efc
            Version : 0.90.00
               UUID : df5a74c5:f9bcfbb8:d10cfed7:60400574
      Creation Time : Wed Jan 16 23:03:44 2008
         Raid Level : raid5
      Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
         Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
       Raid Devices : 5
      Total Devices : 5
    Preferred Minor : 2
    
        Update Time : Sun May 23 08:16:22 2010
              State : clean
     Active Devices : 5
    Working Devices : 5
     Failed Devices : 0
      Spare Devices : 0
           Checksum : 72e0f6d6 - correct
             Events : 884248
    
             Layout : left-symmetric
         Chunk Size : 256K
    
          Number   Major   Minor   RaidDevice State
    this     1       8        1        1      active sync   /dev/sda1
    
       0     0       8       17        0      active sync   /dev/sdb1
       1     1       8        1        1      active sync   /dev/sda1
       2     2       8       65        2      active sync
       3     3       8       49        3      active sync   /dev/sdd1
       4     4       8       33        4      active sync   /dev/sdc1
    Code:
    /dev/sdb1:
              Magic : a92b4efc
            Version : 0.90.00
               UUID : df5a74c5:f9bcfbb8:d10cfed7:60400574
      Creation Time : Wed Jan 16 23:03:44 2008
         Raid Level : raid5
      Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
         Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
       Raid Devices : 5
      Total Devices : 4
    Preferred Minor : 2
    
        Update Time : Wed Jun 22 07:40:42 2011
              State : active
     Active Devices : 4
    Working Devices : 4
     Failed Devices : 1
      Spare Devices : 0
           Checksum : 75bb48b0 - correct
             Events : 15504911
    
             Layout : left-symmetric
         Chunk Size : 256K
    
          Number   Major   Minor   RaidDevice State
    this     2       8        1        2      active sync   /dev/sda1
    
       0     0      22       65        0      active sync   /dev/hdd1
       1     1       0        0        1      faulty removed
       2     2       8        1        2      active sync   /dev/sda1
       3     3       8       17        3      active sync   /dev/sdb1
       4     4       8       33        4      active sync   /dev/sdc1
    Code:
    /dev/sdc1:
              Magic : a92b4efc
            Version : 0.90.00
               UUID : df5a74c5:f9bcfbb8:d10cfed7:60400574
      Creation Time : Wed Jan 16 23:03:44 2008
         Raid Level : raid5
      Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
         Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
       Raid Devices : 5
      Total Devices : 4
    Preferred Minor : 2
    
        Update Time : Wed Jun 22 07:40:42 2011
              State : active
     Active Devices : 4
    Working Devices : 4
     Failed Devices : 1
      Spare Devices : 0
           Checksum : 75bb48c2 - correct
             Events : 15504911
    
             Layout : left-symmetric
         Chunk Size : 256K
    
          Number   Major   Minor   RaidDevice State
    this     3       8       17        3      active sync   /dev/sdb1
    
       0     0      22       65        0      active sync   /dev/hdd1
       1     1       0        0        1      faulty removed
       2     2       8        1        2      active sync   /dev/sda1
       3     3       8       17        3      active sync   /dev/sdb1
       4     4       8       33        4      active sync   /dev/sdc1
    Code:
    /dev/sdd1:
              Magic : a92b4efc
            Version : 0.90.00
               UUID : df5a74c5:f9bcfbb8:d10cfed7:60400574
      Creation Time : Wed Jan 16 23:03:44 2008
         Raid Level : raid5
      Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
         Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
       Raid Devices : 5
      Total Devices : 4
    Preferred Minor : 2
    
        Update Time : Wed Jun 22 07:40:42 2011
              State : active
     Active Devices : 4
    Working Devices : 4
     Failed Devices : 1
      Spare Devices : 0
           Checksum : 75bb48d4 - correct
             Events : 15504911
    
             Layout : left-symmetric
         Chunk Size : 256K
    
          Number   Major   Minor   RaidDevice State
    this     4       8       33        4      active sync   /dev/sdc1
    
       0     0      22       65        0      active sync   /dev/hdd1
       1     1       0        0        1      faulty removed
       2     2       8        1        2      active sync   /dev/sda1
       3     3       8       17        3      active sync   /dev/sdb1
       4     4       8       33        4      active sync   /dev/sdc1

    As best I could discern, this was telling me that /dev/sdd1 was faulty. So I replaced it with a drive of the same size and was surprised to find the system now reporting that only 3/5 of the devices in the array could be contacted. I took this to mean that /dev/sdd1 was not my problem. I started removing drives one at a time and trying to rebuild the array; each time I removed one, it said it could contact only 3/5 devices and the array could not be assembled. That is, until I got down to /dev/sda1: when I removed it, I was back to my original state of only 4/5 devices being contactable. This, contrary to what I had observed in mdadm --examine / mdadm --detail, suggested that /dev/sda1 was the problem. I replaced that drive, removed /dev/sda1 from the array, and re-added it, but the array still would not build. Normally, after adding a device, cat /proc/mdstat shows a rebuild progress bar, but there was nothing.
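    The re-add and the check went roughly like this (reconstructed from memory):

    Code:
    mdadm --add /dev/md2 /dev/sda1
    # expected a "recovery = x.x%" progress line here after the add, but saw nothing
    cat /proc/mdstat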

    I am very confused and frustrated at this point and I desperately require those virtual machines to be intact! If there is anyone that could please help, I would greatly appreciate it.


    Thanks,

    Jared

  2. #2 Lazydog (Linux Guru, Join Date: Jun 2004, Location: The Keystone State, Posts: 2,677)
    OK, I am new to the whole software RAID setup; I've only been playing with it for a few months. But I did notice something in your output above: /dev/sda1 is the only device whose superblock does not show a faulty drive. So I'm trying to figure out how you came to the conclusion that sdd was the faulty drive.

    Also, you didn't say how you removed and added the device to the RAID. The only thing I can think of right now is that you didn't remove it properly, so the new device cannot be added to the array.
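    For what it's worth, the sequence I've used when swapping a member is along these lines (substitute the real partition for /dev/sdX1):

    Code:
    mdadm /dev/md2 --fail /dev/sdX1      # mark the member as failed
    mdadm /dev/md2 --remove /dev/sdX1    # detach it from the array
    mdadm /dev/md2 --add /dev/sdX1       # add the replacement; a resync should start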

    I know that when I had to fail a drive, everything seemed to get confused and I had to rebuild my setup.

    This is a page I use for Quick Reference. At any rate, I am interested in hearing how this gets resolved.

    Regards
    Robert


  3. #3 jaredmtucker (Just Joined!, Join Date: Dec 2010, Location: Texas, Posts: 5)


    I failed the device and removed it using:

    Code:
    mdadm --fail /dev/md2 /dev/sdxx
    
    mdadm --remove /dev/md2 /dev/sdxx
    As to how I came to the conclusion that /dev/sdd1 was the drive with the error, please refer to my last post, where I said:

    As best I could discern, this was telling me that /dev/sdd1 was faulty. So I replaced it with a drive of the same size and was surprised to find the system now reporting that only 3/5 of the devices in the array could be contacted. I took this to mean that /dev/sdd1 was not my problem. I started removing drives one at a time and trying to rebuild the array; each time I removed one, it said it could contact only 3/5 devices and the array could not be assembled. That is, until I got down to /dev/sda1: when I removed it, I was back to my original state of only 4/5 devices being contactable. This, contrary to what I had observed in mdadm --examine / mdadm --detail, suggested that /dev/sda1 was the problem. I replaced that drive, removed /dev/sda1 from the array, and re-added it, but the array still would not build. Normally, after adding a device, cat /proc/mdstat shows a rebuild progress bar, but there was nothing.
    Is there a way to use --create to make a new device, say /dev/md3, without trashing my data?
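    What I have in mind is something along these lines; I haven't dared run it, and I gather that --assume-clean and matching the original parameters and device order exactly (as reported by --examine above) would be critical:

    Code:
    # DANGEROUS sketch - this writes new superblocks and is only safe if
    # every parameter matches the original array. The order comes from the
    # RaidDevice column above; "missing" stands in for the stale slot-1 member.
    mdadm --create /dev/md3 --assume-clean --metadata=0.90 \
          --level=5 --raid-devices=5 --chunk=256 --layout=left-symmetric \
          /dev/hdd1 missing /dev/sdb1 /dev/sdc1 /dev/sdd1
    # then reactivate LVM and check the filesystem read-only before mounting:
    vgchange -ay
    fsck.ext3 -n /dev/lvm/lvm-raid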

    Thanks for your reply. Any advice/fresh eyes is appreciated!

  4. #4 Lazydog (Linux Guru, Join Date: Jun 2004, Location: The Keystone State, Posts: 2,677)
    Quote Originally Posted by jaredmtucker:
    Is there a way to use --create to make a new device say.. /dev/md3 without trashing my data?
    I cannot truthfully answer this question as I do not know.

    I was wondering if this command might help:

    Code:
    mdadm --assemble --force
    Do you have a backup of your data?

    Quote Originally Posted by jaredmtucker:
    Thanks for your reply. Any advice/fresh eyes is appreciated!
    I can understand, and as I stated before, I would like to know how to fix this issue myself.

    Here is another site I found while searching:

    Linux Recover From A Lost Software RAID device (Rebuild RAID 5 Software Array)

    Regards
    Robert


  5. #5 Lazydog (Linux Guru, Join Date: Jun 2004, Location: The Keystone State, Posts: 2,677)
    I was doing some more searching on the web, and I have come to the conclusion that you might have to do the following steps:

    1. Reboot the system.

    2. Stop the array:
    Code:
    mdadm --stop /dev/md2

    3. Force the assembly with all five members:
    Code:
    mdadm --assemble --force /dev/md2 /dev/hdd1 /dev/sdd1 /dev/sdc1 /dev/sdb1 /dev/sda1
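
    One more thought before forcing anything: it might be worth comparing the Events counters that --examine reports, since the member with the lowest count is the stale one (in your dumps above, /dev/sda1 shows 884248 against 15504911 for the rest). Something like:

    Code:
    for d in /dev/hdd1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1; do
        echo -n "$d: "
        mdadm --examine $d | grep -E 'Update Time|Events'
    done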

    Regards
    Robert


  6. #6 jaredmtucker (Just Joined!, Join Date: Dec 2010, Location: Texas, Posts: 5)

    No

    The array isn't active. Stopping it has no effect.

  7. #7 Lazydog (Linux Guru, Join Date: Jun 2004, Location: The Keystone State, Posts: 2,677)
    OK, have you tried the other commands?

    Regards
    Robert

