Hi Everyone,

First, I hope this is the right forum to post in. It's an mdadm issue, but running under CentOS5 (RHEL sources), and I hope someone here can help as I'm having a serious issue with a RAID5 array.

Some background information:

It's CentOS5 with 5 external USB 320GB drives set up in a RAID5 configuration with no spare disks.

The list of hard drives from dmesg is below, along with the mdadm --examine output for each drive.

Everything was working fine until I started to have intermittent problems with /dev/sdj1. It turned out that the power brick for it was a little bit flaky and would occasionally turn off, causing the drive to appear faulty in mdadm and be removed from the array. I'd restart the drive, re-add it to the array, and mdadm would rebuild the array on it and things would be good again for a while (as in days, if not weeks). I've been planning to get a replacement drive for it, but life tends to get in the way sometimes...

Anyway, last week I needed to shut down the server, so I did a graceful shutdown, brought the server back online and am now unable to start the array due to inconsistencies with what mdadm is reporting.

If I let mdadm start the array by itself, mdadm --detail /dev/md0 shows this:

mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Mon Apr 27 19:20:05 2009
Raid Level : raid5
Device Size : 312568576 (298.09 GiB 320.07 GB)
Raid Devices : 5
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Tue Jul 21 15:24:46 2009
State : active, degraded, Not Started
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
0 0 0 0 removed
1 0 0 1 removed
2 0 0 2 removed
3 0 0 3 removed
4 8 225 4 active sync



Which seems to be consistent with what mdadm --examine /dev/sdk1 reports (see below).


If I stop the array, power off /dev/sdk1, then try to re-assemble the array, I get:

mdadm --assemble /dev/md0 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
mdadm: /dev/md0 assembled from 3 drives and 1 spare - not enough to start the array.


I've never had a spare in the array, yet one of the drives is now reporting as spare (drive sdg1, information below).

Is there any way I can get /dev/sdk1 to play nice? Or is there a way I can get /dev/sdg1 to stop thinking it's a spare?

Any help would be greatly appreciated.

Thanks in advance,

SJF








SCSI device sdg: 625138352 512-byte hdwr sectors (320071 MB)
sdg: Write Protect is off
sdg: Mode Sense: 03 00 00 00
sdg: assuming drive cache: write through
SCSI device sdg: 625138352 512-byte hdwr sectors (320071 MB)
sdg: Write Protect is off
sdg: Mode Sense: 03 00 00 00
sdg: assuming drive cache: write through
sdg: sdg1
sd 17:0:0:0: Attached scsi disk sdg
sd 17:0:0:0: Attached scsi generic sg6 type 0
SCSI device sdh: 625138352 512-byte hdwr sectors (320071 MB)
sdh: Write Protect is off
sdh: Mode Sense: 03 00 00 00
sdh: assuming drive cache: write through
SCSI device sdh: 625138352 512-byte hdwr sectors (320071 MB)
sdh: Write Protect is off
sdh: Mode Sense: 03 00 00 00
sdh: assuming drive cache: write through
sdh: sdh1
sd 18:0:0:0: Attached scsi disk sdh
sd 18:0:0:0: Attached scsi generic sg7 type 0
SCSI device sdi: 625138352 512-byte hdwr sectors (320071 MB)
sdi: Write Protect is off
sdi: Mode Sense: 03 00 00 00
sdi: assuming drive cache: write through
SCSI device sdi: 625138352 512-byte hdwr sectors (320071 MB)
sdi: Write Protect is off
sdi: Mode Sense: 03 00 00 00
sdi: assuming drive cache: write through
sdi: sdi1
sd 19:0:0:0: Attached scsi disk sdi
sd 19:0:0:0: Attached scsi generic sg8 type 0
SCSI device sdj: 625138352 512-byte hdwr sectors (320071 MB)
sdj: Write Protect is off
sdj: Mode Sense: 03 00 00 00
sdj: assuming drive cache: write through
SCSI device sdj: 625138352 512-byte hdwr sectors (320071 MB)
sdj: Write Protect is off
sdj: Mode Sense: 03 00 00 00
sdj: assuming drive cache: write through
sdj: sdj1
sd 20:0:0:0: Attached scsi disk sdj
sd 20:0:0:0: Attached scsi generic sg9 type 0
SCSI device sdk: 625138352 512-byte hdwr sectors (320071 MB)
sdk: Write Protect is off
sdk: Mode Sense: 03 00 00 00
sdk: assuming drive cache: write through
SCSI device sdk: 625138352 512-byte hdwr sectors (320071 MB)
sdk: Write Protect is off
sdk: Mode Sense: 03 00 00 00
sdk: assuming drive cache: write through
sdk: sdk1
sd 21:0:0:0: Attached scsi disk sdk
sd 21:0:0:0: Attached scsi generic sg10 type 0


mdadm --examine /dev/sdg1
/dev/sdg1:
Magic : a92b4efc
Version : 00.90.00
UUID : e1c3bbf8:8afda0ca:799bd861:c330f915
Creation Time : Mon Apr 27 19:20:05 2009
Raid Level : raid5
Device Size : 312568576 (298.09 GiB 320.07 GB)
Array Size : 1250274304 (1192.35 GiB 1280.28 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Mon Jul 13 17:02:30 2009
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : f9b3a762 - correct
Events : 0.181270

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 5 8 161 5 spare /dev/sdk1

0 0 0 0 0 removed
1 1 8 97 1 active sync /dev/sdg1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 129 3 active sync /dev/sdi1
4 4 8 145 4 active sync /dev/sdj1
5 5 8 161 5 spare /dev/sdk1


*** NOTE: I created this RAID array as RAID5 with 5 disks and no spares, yet sdk1 shows as a spare drive according to sdg1, and sdg1 thinks it's sdk1. Also note that the drive list shows a total of 6 drives in the array (0 to 5), yet there were never 6 drives, and none of the other drives report more than 5 drives in the array.



mdadm --examine /dev/sdh1
/dev/sdh1:
Magic : a92b4efc
Version : 00.90.00
UUID : e1c3bbf8:8afda0ca:799bd861:c330f915
Creation Time : Mon Apr 27 19:20:05 2009
Raid Level : raid5
Device Size : 312568576 (298.09 GiB 320.07 GB)
Array Size : 1250274304 (1192.35 GiB 1280.28 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Tue Jul 21 15:22:33 2009
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : f9c03576 - correct
Events : 0.250104

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 1 8 97 1 active sync /dev/sdg1

0 0 0 0 0 removed
1 1 8 97 1 active sync /dev/sdg1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 129 3 active sync /dev/sdi1
4 4 8 145 4 active sync /dev/sdj1


*** NOTE: sdh1 shows no sdk1 and drive 0 as removed


mdadm --examine /dev/sdi1
/dev/sdi1:
Magic : a92b4efc
Version : 00.90.00
UUID : e1c3bbf8:8afda0ca:799bd861:c330f915
Creation Time : Mon Apr 27 19:20:05 2009
Raid Level : raid5
Device Size : 312568576 (298.09 GiB 320.07 GB)
Array Size : 1250274304 (1192.35 GiB 1280.28 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Tue Jul 21 15:22:33 2009
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : f9c03588 - correct
Events : 0.250104

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 2 8 113 2 active sync /dev/sdh1

0 0 0 0 0 removed
1 1 8 97 1 active sync /dev/sdg1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 129 3 active sync /dev/sdi1
4 4 8 145 4 active sync /dev/sdj1



*** NOTE: sdi1 shows no sdk1 and drive 0 removed


mdadm --examine /dev/sdj1
/dev/sdj1:
Magic : a92b4efc
Version : 00.90.00
UUID : e1c3bbf8:8afda0ca:799bd861:c330f915
Creation Time : Mon Apr 27 19:20:05 2009
Raid Level : raid5
Device Size : 312568576 (298.09 GiB 320.07 GB)
Array Size : 1250274304 (1192.35 GiB 1280.28 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Tue Jul 21 15:22:33 2009
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : f9c0359a - correct
Events : 0.250104

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 3 8 129 3 active sync /dev/sdi1

0 0 0 0 0 removed
1 1 8 97 1 active sync /dev/sdg1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 129 3 active sync /dev/sdi1
4 4 8 145 4 active sync /dev/sdj1


*** NOTE: sdj1 shows no sdk1 and drive 0 removed


mdadm --examine /dev/sdk1
/dev/sdk1:
Magic : a92b4efc
Version : 00.90.00
UUID : e1c3bbf8:8afda0ca:799bd861:c330f915
Creation Time : Mon Apr 27 19:20:05 2009
Raid Level : raid5
Device Size : 312568576 (298.09 GiB 320.07 GB)
Array Size : 1250274304 (1192.35 GiB 1280.28 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Tue Jul 21 15:24:46 2009
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 3
Spare Devices : 0
Checksum : f9c0366c - correct
Events : 0.250108

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 4 8 145 4 active sync /dev/sdj1

0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 0 0 3 faulty removed
4 4 8 145 4 active sync /dev/sdj1


*** NOTE: drive sdk1 thinks it's /dev/sdj1, shows no sdk1 and drive 0 removed
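
In case it helps anyone compare the superblocks at a glance, here is a minimal Python sketch that pulls the update time, event counter and "this" row out of the --examine text pasted above. It only reads text; it does not call mdadm or touch the drives. The names summarize_examine and stale_members are my own (hypothetical), the sample strings are trimmed copies of three of the dumps above, and treating the event counter as a "high.low" pair of integers is my assumption about how the 0.90 superblock prints it.

```python
import re

# Trimmed copies of the --examine output above (only the lines this sketch reads).
EXAMINE = {
    "/dev/sdg1": """\
Update Time : Mon Jul 13 17:02:30 2009
Events : 0.181270
this 5 8 161 5 spare /dev/sdk1
""",
    "/dev/sdh1": """\
Update Time : Tue Jul 21 15:22:33 2009
Events : 0.250104
this 1 8 97 1 active sync /dev/sdg1
""",
    "/dev/sdk1": """\
Update Time : Tue Jul 21 15:24:46 2009
Events : 0.250108
this 4 8 145 4 active sync /dev/sdj1
""",
}


def summarize_examine(text):
    """Extract update time, event counter and the 'this' row from one dump."""
    summary = {}
    for raw in text.splitlines():
        line = raw.strip()
        field = re.match(r"(Update Time|Events)\s*:\s*(.+)", line)
        if field:
            summary[field.group(1)] = field.group(2).strip()
        elif line.startswith("this "):
            # Columns: this, Number, Major, Minor, RaidDevice, State..., device
            parts = line.split()
            summary["slot"] = int(parts[4])
            summary["role"] = " ".join(parts[5:-1])
            summary["recorded_as"] = parts[-1]
    # Assumption: the counter is printed as "high.low"; compare as an int tuple.
    summary["events"] = tuple(int(p) for p in summary["Events"].split("."))
    return summary


def stale_members(dumps):
    """Return devices whose event counter is behind the newest one seen."""
    parsed = {dev: summarize_examine(text) for dev, text in dumps.items()}
    newest = max(info["events"] for info in parsed.values())
    return sorted(dev for dev, info in parsed.items() if info["events"] < newest)


for dev, text in sorted(EXAMINE.items()):
    info = summarize_examine(text)
    print(dev, info["Update Time"], info["events"],
          info["slot"], info["role"], info["recorded_as"])
print("behind the newest superblock:", stale_members(EXAMINE))
```

Laid out this way, the mismatch I described is easier to see: the superblock on sdg1 is over a week older than the others and records itself as the spare /dev/sdk1, while sdk1 carries the newest event counter and records itself as /dev/sdj1.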