  1. #1 by McDaveX (Just Joined, Jun 2011)

    spare not rebuilding?


    Hi All,

    I've had a couple of goes at this, and decided it might be smarter to ask around a bit before I do anything drastic.

    I've had some recent issues with a server: Ubuntu 9.10, running in a hosting center (so I have no direct hardware access).

    smartctl (smartmontools) showed some issues, so I triggered a short self-test, and the drive bugged out, taking down the whole server (which worried me a bit, as this is supposed to be RAID 1).

    Rebooting saw only one of the two drives, and the main data partition was in state "spare" and not available for service in its degraded state.

    A hardware reset brought the original drive back to life, so I checked with mdadm --detail and --examine, and it was in the rebuilding state:

    Code:
    /dev/md2:
            Version : 00.90
      Creation Time : Wed Feb 17 00:03:20 2010
         Raid Level : raid1
         Array Size : 1458830400 (1391.25 GiB 1493.84 GB)
      Used Dev Size : 1458830400 (1391.25 GiB 1493.84 GB)
       Raid Devices : 2
      Total Devices : 2
    Preferred Minor : 2
        Persistence : Superblock is persistent
    
        Update Time : Mon Jun 27 17:25:19 2011
              State : active, degraded
     Active Devices : 1
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 1
    
               UUID : e5e65dac:b388bfd1:776c2c25:004bd7b2
             Events : 0.2946605
    
        Number   Major   Minor   RaidDevice State
           0       8        3        0      active sync   /dev/sda3
           2       8       19        1      spare rebuilding   /dev/sdb3
    
    ========================
    /dev/sda3:
              Magic : a92b4efc
            Version : 00.90.00
               UUID : e5e65dac:b388bfd1:776c2c25:004bd7b2
      Creation Time : Wed Feb 17 00:03:20 2010
         Raid Level : raid1
      Used Dev Size : 1458830400 (1391.25 GiB 1493.84 GB)
         Array Size : 1458830400 (1391.25 GiB 1493.84 GB)
       Raid Devices : 2
      Total Devices : 2
    Preferred Minor : 2
    
        Update Time : Mon Jun 27 17:25:19 2011
              State : active
     Active Devices : 1
    Working Devices : 2
     Failed Devices : 1
      Spare Devices : 1
           Checksum : aaf72889 - correct
             Events : 2946605
    
    
          Number   Major   Minor   RaidDevice State
    this     0       8        3        0      active sync   /dev/sda3
    
       0     0       8        3        0      active sync   /dev/sda3
       1     1       0        0        1      faulty removed
       2     2       8       19        2      spare   /dev/sdb3
    ========================
    
    /dev/sdb3:
              Magic : a92b4efc
            Version : 00.90.00
               UUID : e5e65dac:b388bfd1:776c2c25:004bd7b2
      Creation Time : Wed Feb 17 00:03:20 2010
         Raid Level : raid1
      Used Dev Size : 1458830400 (1391.25 GiB 1493.84 GB)
         Array Size : 1458830400 (1391.25 GiB 1493.84 GB)
       Raid Devices : 2
      Total Devices : 2
    Preferred Minor : 2
    
        Update Time : Mon Jun 27 17:25:19 2011
              State : active
     Active Devices : 1
    Working Devices : 2
     Failed Devices : 1
      Spare Devices : 1
           Checksum : aaf72897 - correct
             Events : 2946605
    
    
          Number   Major   Minor   RaidDevice State
    this     2       8       19        2      spare   /dev/sdb3
    
       0     0       8        3        0      active sync   /dev/sda3
       1     1       0        0        1      faulty removed
       2     2       8       19        2      spare   /dev/sdb3
    ========================
    
    Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [raid10] [linear] [multipath]
    md2 : active raid1 sdb3[2] sda3[0]
          1458830400 blocks [2/1] [U_]
    
    md1 : active raid1 sda2[0] sdb2[1]
          2104448 blocks [2/2] [UU]
    
    md0 : active raid1 sda1[0] sdb1[1]
          4200896 blocks [2/2] [UU]
    
    unused devices: <none>
    Now, when I first started this, it began to rebuild the spare, fairly slowly, but the ETA was about 15 hours, which I could live with. However, after about 6% of the rebuild, the rebuild counter in /proc/mdstat vanished and mdadm --monitor logged the rebuild as complete, which it clearly wasn't.
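
    For reference, here is roughly how I've been watching the resync (a sketch; the md2 paths match the output above):

    Code:
    # Overall array status, including the resync progress bar when one is running
    cat /proc/mdstat

    # What the md layer thinks it is doing right now:
    # "recover" or "resync" during a rebuild, "idle" otherwise
    cat /sys/block/md2/md/sync_action

    # Detailed view; shows a "Rebuild Status : N% complete" line while recovering
    mdadm --detail /dev/md2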

    Any clues? I'm trying not to have this box down too long (it's a live server), but here are the things I've thought of trying:

    1) fsck on /dev/md2 from a rescue CD, then re-attempt the rebuild

    2) dd /dev/sda3 --> /dev/sdb3

    3) fail and remove sdb3; build a new, deliberately degraded array on sdb3, mount both it and md2 (via the rescue CD), and copy everything from md2 to the new volume (rough sketch of this below)

    I have taken offsite backups, but would rather not have to rebuild this machine from scratch to restore them... any suggestions or thoughts on my plans above?
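
    For clarity, here's roughly what I mean by option 3 (an untested sketch; device names as above, and I'm assuming ext3 on md2):

    Code:
    # Drop the half-rebuilt spare out of the array and wipe its md superblock
    mdadm /dev/md2 --fail /dev/sdb3
    mdadm /dev/md2 --remove /dev/sdb3
    mdadm --zero-superblock /dev/sdb3

    # Build a new RAID 1 on sdb3 with the second member deliberately missing
    mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sdb3 missing
    mkfs.ext3 /dev/md3

    # From the rescue CD: mount both and copy everything across
    # (cp -a rather than plain -r, to preserve ownership and timestamps)
    mkdir -p /mnt/old /mnt/new
    mount /dev/md2 /mnt/old
    mount /dev/md3 /mnt/new
    cp -a /mnt/old/. /mnt/new/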
    Last edited by McDaveX; 06-28-2011 at 03:04 PM. Reason: missed word "ubuntu". duh...

  2. #2 by Rubberman (Linux Guru, Apr 2009)
    Well, I would verify the file system with fsck first, and then decide what to do. It may be that the rebuild didn't need to resync all of the disk. In any case, this may be indicative of a drive failure, and the drive may need to be replaced. Also bear in mind that a lot of RAID sets use drives from the same manufacturing batch, which tend to fail together; one bit of advice when building RAID devices is to make sure the drives come from different batches.
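    Something along these lines from the rescue CD should do it (a sketch, assuming an ext2/3 filesystem on md2; adjust device names as needed):

    Code:
    # Assemble the array degraded, with only the known-good member,
    # so no rebuild kicks off during the check
    mdadm --assemble --run /dev/md2 /dev/sda3

    # Dry run first: -n answers "no" to every repair prompt, changes nothing
    fsck -n /dev/md2

    # If that looks sane, run a real forced check
    fsck -f /dev/md2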
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  3. #3 by McDaveX
    Well, I am pretty sure a drive is failing, but it is the drive marked as synced that is the "bad apple" here, not the spare; the spare tests out fine in smartctl. If I even START the short self-test on the "live" disk, I need to do a hard power cycle before the server can see the drive again, so it looks like some sort of firmware crash.
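
    For the record, the test that triggers the hang is just the standard SMART short self-test (a sketch; sda is the suspect "live" disk here):

    Code:
    # Kick off the short self-test on the suspect disk
    smartctl -t short /dev/sda

    # Results would normally show up a couple of minutes later via:
    smartctl -l selftest /dev/sda

    # Overall health plus attributes (reallocated/pending sectors, etc.)
    smartctl -H -A /dev/sda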
