- 11-03-2008 #1 Just Joined!
- Join Date
- Jan 2005
- Posts
- 54
Configure MDADM to use DEVICES more gracefully
Hey there anyone interested in RAID talk:
Based on the man pages:
* mdadm(8): manage MD devices aka Software Raid - Linux man page
* mdadm.conf(5) - Linux man page
I can't seem to figure out how to add devices to an array by something other than their raw device identifier (i.e. /dev/sda1).
With a system using multiple SATA/IDE controllers, it's very possible when adding a new drive that the IDs will be shifted around.
Simple Example:
We have three drives on IDE controllers:
PRIMARY MASTER - /dev/hda
PRIMARY SLAVE - CD-ROM
SECONDARY MASTER - /dev/hdb
SECONDARY SLAVE - /dev/hdc
If we add another IDE drive, (and kill the CD-ROM):
PRIMARY MASTER - /dev/hda
PRIMARY SLAVE - /dev/hdb
SECONDARY MASTER - /dev/hdc
SECONDARY SLAVE - /dev/hdd
Now as you can see, the IDs are shifted, and this would screw up any configuration in /etc/mdadm/mdadm.conf.
How can we manage disks in an array more gracefully to accommodate for this possible change?
I know there is something called "superminor" ... ? But every time I've tried to configure for this the array fails to start.
Here's my /etc/mdadm/mdadm.conf file:
Code:
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#
# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions
# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes
# automatically tag new arrays as belonging to the local system
HOMEHOST <system>
# instruct the monitoring daemon where to send mail alerts
MAILADDR address@mydomain.com
# definitions of existing MD arrays
# This file was auto-generated on Sun, 20 Jan 2008 14:31:03 -0500
# by mkconf $Id: mkconf 261 2006-11-09 13:32:35Z madduck $
ARRAY /dev/md500 devices=/dev/sda1,/dev/sdb1,/dev/hdc1,/dev/hdd1 uuid=fb89b2c2:7e682bd7:01f5a1db:50a22640
Here's the details of the array:
Code:
/dev/md500:
Version : 00.90.03
Creation Time : Mon Nov 3 10:28:18 2008
Raid Level : raid5
Array Size : 1465007040 (1397.14 GiB 1500.17 GB)
Device Size : 488335680 (465.71 GiB 500.06 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 500
Persistence : Superblock is persistent
Update Time : Mon Nov 3 18:13:18 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : fb89b2c2:7e682bd7:01f5a1db:50a22640 (local to host myhostname)
Events : 0.10
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 22 1 2 active sync /dev/hdc1
3 22 65 3 active sync /dev/hdd1
Here's an examine on one of the drives:
Code:
/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : fb89b2c2:7e682bd7:01f5a1db:50a22640 (local to host myhostname)
Creation Time : Mon Nov 3 10:28:18 2008
Raid Level : raid5
Device Size : 488335680 (465.71 GiB 500.06 GB)
Array Size : 1465007040 (1397.14 GiB 1500.17 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 500
Update Time : Mon Nov 3 18:16:17 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 24f00103 - correct
Events : 0.10
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 1 0 active sync /dev/sda1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 22 1 2 active sync /dev/hdc1
3 3 22 65 3 active sync /dev/hdd1
PS: Running Ubuntu Server 7.04.
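For anyone reading along: one way to keep member device names out of the config is to identify the array by UUID only in the ARRAY line. `mdadm --detail --scan` prints ARRAY lines of that kind on a live system. The sketch below shows the same idea on captured text so it can be tried without an array; `array_line_from_detail` is a hypothetical helper name, not part of mdadm, and the sample input is trimmed from the --detail output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): build a UUID-only ARRAY line
# from the text that `mdadm --detail /dev/mdX` prints, read from stdin.
# Usage: array_line_from_detail /dev/md500 < detail.txt
array_line_from_detail() {
    md_dev=$1
    # Keep only the UUID token from the first "UUID : ..." line.
    uuid=$(sed -n 's/^[[:space:]]*UUID : \([0-9a-f:]*\).*$/\1/p' | head -n 1)
    [ -n "$uuid" ] || return 1
    printf 'ARRAY %s uuid=%s\n' "$md_dev" "$uuid"
}

# Sample input trimmed from the --detail output posted above.
sample_detail='/dev/md500:
Raid Level : raid5
UUID : fb89b2c2:7e682bd7:01f5a1db:50a22640 (local to host myhostname)
Events : 0.10'

printf '%s\n' "$sample_detail" | array_line_from_detail /dev/md500
```

The resulting line names no /dev/sdX or /dev/hdX members, so it should not depend on which names the kernel hands out at boot.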
- 11-04-2008 #2 Just Joined!
- Join Date
- Nov 2008
- Posts
- 2
Hello friend!
You don't need to use mdadm.conf. mdadm will automatically rebuild your arrays on boot based on their UUID, and the UUID of the array they belong to. If you are trying to track devices based on their device node after shuffling things around or adding / removing drives, then I would suggest either using udev to assign static nodes to the drives, or using the hdparm utility to get the serial number of a drive which has failed so you know which one to unplug and replace.
If you insist on using a config file for mdadm, then you should use udev so that your devices will not be renamed regardless of what is added to or removed from the system.
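A rough illustration of the udev suggestion: on most udev-based systems the stock rules already maintain persistent symlinks under /dev/disk/by-id, named after model and serial number. That directory being present is an assumption about the installed rules, and `list_stable_names` is a hypothetical helper name. The sketch just prints each symlink next to the kernel node it currently points at.

```shell
#!/bin/sh
# Print each persistent symlink and the device node it currently
# resolves to. The directory argument defaults to /dev/disk/by-id
# (assumed to be maintained by the system's udev rules).
list_stable_names() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -e "$link" ] || [ -L "$link" ] || continue
        printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
    done
}

list_stable_names
```

To match a failed member to a physical drive, `hdparm -I /dev/sdX` prints the serial number, which can then be compared with the label on the disk.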
- 11-04-2008 #3 Just Joined!
- Join Date
- Jan 2005
- Posts
- 54
Hai there!
Thanks for the comments RaceTM. Very useful, now I know to not control this sucker ;-0 (counter-intuitive to Linux in general though eh? DON'T use the conf file? lol)
--- REMARK: This has apparently turned into a long reply/story, read/reply at your leisure please, no huge rush ;-0
----
I'm still partially confused though.
What do you think about this setup guide?
bfish.xaedalus.net Software RAID 5 in Ubuntu with mdadm
They say to actually use the config file.
However, if I have the line "DEVICE partitions" in there, the system takes FOREVER to load (from the looks of HDD activity on all controller cards, it looks like it's rebuilding one of the big 4x500GB arrays ...). As you'll notice by reading further: if my BIG partitions have md superblocks defined and "DEVICE partitions" is there, it takes forever; if only the SMALL partitions have md superblocks defined, it takes 'longer' (about 10 mins); if "DEVICE partitions" isn't there, it boots fast; and if there are no superblocks defined, the system boots very quickly.
Anyways.
Here's what I did since your comments.
1. Killed all arrays:
- My 'test array' 4x48MB - used because it's WAY faster when fiddling/testing/learning this stuff
* sudo mdadm --stop /dev/md501
* sudo mdadm --zero-superblock /dev/sdf2
* sudo mdadm --zero-superblock /dev/sdg2
* sudo mdadm --zero-superblock /dev/hdc2
* sudo mdadm --zero-superblock /dev/hdd2
- My 'real array' 4x500GB
* sudo mdadm --stop /dev/md500
* sudo mdadm --zero-superblock /dev/sdf1
* sudo mdadm --zero-superblock /dev/sdg1
* sudo mdadm --zero-superblock /dev/hdc1
* sudo mdadm --zero-superblock /dev/hdd1
2. Recreated the test array:
Code:
sudo mdadm --create --verbose /dev/md501 --level=5 --raid-devices=4 /dev/hdc2 /dev/hdd2 /dev/sdf2 /dev/sdg2
Code:
cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4]
md501 : active raid5 hdc2[0] sdg2[3] sdf2[2] hdd2[1]
144384 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
* it worked dandy, 'yay'
3. Added this line to /etc/mdadm.conf:
ARRAY /dev/md501 level=raid5 num-devices=4 UUID=8c963769:a5f92a9a:01f5a1db:50a22640
* NOTE: I verified using mdadm (--detail and --examine) that both ARRAY (/dev/md501) and devices all have the correct UUID.
* Added this line cause I figured it wouldn't hurt .. I'm not specifying devices ... so it should do what you say.
4. Restart mdadm-raid daemon to 'simulate' a reboot.
Code:
sudo /etc/init.d/mdadm-raid restart
* Stopping MD array md501...
...done.
* Assembling MD array md501...
...done.
* Generating udev events for MD arrays...
...done.
* The array was still there. good stuff.
decided to test a reboot
5. Rebooted (sudo reboot) ooooo
the system came back really fast (good sign), BUT ... the array was busted.
Code:
fermulator@myhostname:~$ cat /proc/mdstat
Personalities : [raid0]
md501 : inactive hdc2[0](S) hdd2[1](S)
96256 blocks
It only saw hdc2 and hdd2. I checked fdisk ... for some reason the two SATA devices got shuffled from /dev/sdf and /dev/sdg to /dev/sda and /dev/sdb! wtf. (determined this by noticing the partition tables of course .. including the "Linux raid autodetect" fs type) - actually, they were ORIGINALLY sda and sdb, but at some point went to sdf and sdg ... it's beyond me as to why this happens, but it's a good test at least for mdadm. ;-0
At this point I'm confused. Because I KNOW I can simply "re-add" the shuffled sda and sdb devices ... but I thought you said mdadm was smart and would find its own devices no matter where they went to?
SO.... I thought maybe, because I was missing the "DEVICE partitions" line ... it wasn't autosearching.
added this line to /etc/mdadm.conf:
DEVICE partitions
Rebooted. The thing took about 10 minutes to boot this time (I'm not physically there so I dunno what it was doing ... I know it was stuck in single user mode for the whole 10 minutes though because SSH connections were refused.).
K then I decided to fully trust mdadm ... and completely clear my mdadm.conf file. (no DEVICE or ARRAY definitions)
rebooted.
again, took a while to reboot. again, about 10 minutes.
It came back though (thankfully), and now it's still a 'busted' array
again, I can add/re-add the 'missing' devices ... but I just don't understand why it didn't find them itself. Every time I reboot I don't want to have to make sure the arrays came back and manually fix them, you know? And I also don't understand how you're saying that even if devices shuffle, mdadm will find them. Because in all the raid device superblocks, they seem to have defined 'each other'.
notice:
Code:
fermulator@myhostname:~$ sudo mdadm --examine /dev/hdc2
/dev/hdc2:
Magic : a92b4efc
Version : 00.90.00
UUID : 8c963769:a5f92a9a:01f5a1db:50a22640 (local to host myhostname)
Creation Time : Tue Nov 4 09:40:44 2008
Raid Level : raid5
Device Size : 48128 (47.01 MiB 49.28 MB)
Array Size : 144384 (141.02 MiB 147.85 MB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 501
Update Time : Tue Nov 4 09:40:48 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : c074f6dd - correct
Events : 0.4
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 22 2 0 active sync /dev/hdc2
0 0 22 2 0 active sync /dev/hdc2
1 1 22 66 1 active sync /dev/hdd2
2 2 8 82 2 active sync /dev/sdf2
3 3 8 98 3 active sync /dev/sdg2
Of course, sdf and sdg don't exist anymore as raid devices! So ... how is MDADM to know that they've moved to sda and sdb?
Turns out I can't even add the sda and sdb devices like I thought I should be able to do ...
Code:
sudo mdadm --detail /dev/md501
mdadm: md device /dev/md501 does not appear to be active.
what the crunch? Your bottomless pit of raid experience is lovely!
INFO: - sample of partitions table (all four devices we're talking about here are the same), applying to sda/f, sdb/g, hdc, and hdd
Code:
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 60795 488335806 fd Linux raid autodetect
/dev/sda2 60796 60801 48195 fd Linux raid autodetect
- 11-04-2008 #4 Just Joined!
- Join Date
- Jan 2005
- Posts
- 54
OK this is also interesting, which may negate my previous post really.
After rebooting a few times, /dev/sda,b have gone back to /dev/sdf,g.
Now, I went to try to force a reassemble, and have shed some light on to why the array is broken:
Code:
sudo mdadm --assemble /dev/md501 --uuid=8c963769:a5f92a9a:01f5a1db:50a22640 /dev/hdc2 /dev/hdd2 /dev/sdf2 /dev/sdg2
mdadm: /dev/sdf2 has wrong uuid.
mdadm: /dev/sdg2 has wrong uuid.
mdadm: /dev/md501 assembled from 2 drives - not enough to start the array.
OH REALLY:
Code:
sudo mdadm --examine /dev/sdg2
/dev/sdg2:
Magic : a92b4efc
Version : 00.90.00
UUID : 8c963769:a5f92a9a:1550c48f:5b470e25
Creation Time : Tue Nov 4 09:40:44 2008
Raid Level : raid5
Device Size : 48128 (47.01 MiB 49.28 MB)
Array Size : 144384 (141.02 MiB 147.85 MB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 501
Update Time : Tue Nov 4 09:40:48 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : de7501ce - correct
Events : 0.4
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 98 3 active sync /dev/sdg2
0 0 22 2 0 active sync /dev/hdc2
1 1 22 66 1 active sync /dev/hdd2
2 2 8 82 2 active sync /dev/sdf2
3 3 8 98 3 active sync /dev/sdg2
How the heck would the UUID in the superblock change???
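A small sketch for spotting this kind of mismatch across all members at once, rather than reading each --examine dump by eye. `examine_uuid` is a hypothetical helper that pulls the UUID field out of `mdadm --examine` text; the two input lines are copied from the hdc2 and sdg2 outputs in this thread.

```shell
#!/bin/sh
# Hypothetical helper: extract the UUID field from `mdadm --examine`
# text supplied on stdin.
examine_uuid() {
    sed -n 's/^[[:space:]]*UUID : \([0-9a-f:]*\).*$/\1/p' | head -n 1
}

# UUID lines copied from the --examine outputs posted in this thread.
uuid_hdc2=$(printf 'UUID : 8c963769:a5f92a9a:01f5a1db:50a22640 (local to host myhostname)\n' | examine_uuid)
uuid_sdg2=$(printf 'UUID : 8c963769:a5f92a9a:1550c48f:5b470e25\n' | examine_uuid)

if [ "$uuid_hdc2" = "$uuid_sdg2" ]; then
    echo "UUIDs match"
else
    echo "UUID mismatch: $uuid_hdc2 vs $uuid_sdg2"
fi

# On a live system the same helper could be fed directly, for example:
#   for d in /dev/hdc2 /dev/hdd2 /dev/sdf2 /dev/sdg2; do
#       printf '%s ' "$d"; sudo mdadm --examine "$d" | examine_uuid
#   done
```

Comparing the two values side by side shows that the first two groups agree and only the last two differ.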
- 11-04-2008 #5 Linux Guru
- Join Date
- Nov 2007
- Posts
- 1,722
If you code md to use a /dev/sdX device, you're going to have problems if the HDD changes /dev identifier.
As has been stated:
1. If you don't want it to move, use udev rules.
2. Create your mdadm with UUIDs
Example mdadm.conf:
Code:
cat /etc/mdadm.conf
DEVICE partitions
ARRAY /dev/md0 level=raid0 UUID=27f05570:cb2e17e0:a9f64f17:72d50f93
ARRAY /dev/md1 level=raid0 UUID=71016344:e9aaf5e1:e5cbeab4:3c4dba52
ARRAY /dev/md2 level=raid0 UUID=c3fb3d71:8c295705:0dec84e1:6ee720dc
ARRAY /dev/md3 level=raid1 UUID=139e6465:a37e75fe:a7a0e1bc:a66f52c5
** Edit: A better question is WHY are your dev paths changing? I think that answers why your UUIDs are getting messed up.
- 11-04-2008 #6 Just Joined!
- Join Date
- Nov 2008
- Posts
- 2
Hi!
That doesn't make sense. I see four possible scenarios.
1) Your mdadm.conf wasn't actually blank, like you thought it was. mdadm tried to re-assemble an old array, and fubared your existing ones in the process.
2) mdadm is configured to use a config file other than /etc/mdadm.conf. Thus, it is still finding information about your old arrays somewhere.
3) There was an extra step you performed (perhaps by mistake), which screwed up or changed the array's UUID, or changed the array UUID of one of your component partitions.
4) The drives have been added / removed to so many arrays that their superblocks are fubared (despite zero-ing them... is this possible? I have no idea, but let's say it is).
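Scenario 2 is quick to rule in or out. The sketch below assumes the Debian/Ubuntu behaviour described in mdadm.conf(5), where /etc/mdadm/mdadm.conf takes precedence and /etc/mdadm.conf is only read if the former does not exist. `active_mdadm_conf` is a hypothetical helper name, and the optional root argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: report which config file would take effect,
# assuming the Debian rule that /etc/mdadm/mdadm.conf wins over
# /etc/mdadm.conf. An optional root prefix allows testing in a sandbox.
active_mdadm_conf() {
    root=${1:-}
    if [ -e "$root/etc/mdadm/mdadm.conf" ]; then
        printf '%s\n' "$root/etc/mdadm/mdadm.conf"
    elif [ -e "$root/etc/mdadm.conf" ]; then
        printf '%s\n' "$root/etc/mdadm.conf"
    else
        return 1
    fi
}

active_mdadm_conf || echo "no mdadm.conf found"
```

If the file it reports is not the one that was edited or blanked, that would explain mdadm still acting on old ARRAY definitions.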
My recommendation would be the following:
- double check your mdadm configuration and be sure you know where it is looking for array information.
- Make sure none of your arrays are mounted, and stop them all.
- Delete all of your raid partitions (these are all empty since you're still playing with the setup, right?). If possible, delete all partitions on all discs which you want to belong to an array. I would also suggest zeroing out the partition table (zero the first 512 bytes of the drives)
- Delete your mdadm device nodes (/dev/md500, etc)
- Get rid of your mdadm.conf file altogether (sudo mv /etc/mdadm.conf /etc/mdadm.conf.back)
- Re-create your partitions the way you want them.
- Create your device nodes (if necessary, unless your version of mdadm does this for you), then create your arrays.
- cat /proc/mdstat until they are sync'd and ready to go.
- create a filesystem on the new array
- fsck the filesystem
- reboot
See what happens. If this still causes problems, then post back and let us know what's happening, and also google the link to the mdadm distribution list. Fire them off a detailed email and see if you get some replies.
btw, you mentioned that you are locked out of your system for 10 mins when the arrays fail? This is weird as well... check your logs and see what's up with that. When you go home, sit in front of the pc and see what is happening. mdadm is supposed to fork into the background. Unless your root filesystem is on a raid 5 array, I see no reason your entire boot process should be halted until mdadm sorts out its crap. This could be the sign of another issue, not related to mdadm. For example, if one of your drives is near failure, perhaps it is taking a long time to get detected by the system. Perhaps mdadm is finding the other drives right away, assembling your array in a degraded state, then the other drive(s) are finally detected but the array has already been built. This would not explain the uuid changing on you and I have no idea why that is happening.
Good luck!
- 11-06-2008 #7 Just Joined!
- Join Date
- Jan 2005
- Posts
- 54
[solved]
Thanks RaceTM and HROAdmin26 for both of your replies.
So I'll post everything I did (i.e. another story) below, but if anyone is interested in the ROOT cause of the problem (without reading a huge chunk of text), please see: https://bugs.launchpad.net/ubuntu/+s...dm/+bug/188392
----
Now, the steps I performed to get where I got. Basically following RaceTM's instructions to get a 'clean' RAID state.
First, I confirmed mdadm configuration behaviour in Ubuntu:
http://manpages.ubuntu.com/manpages/...adm.conf.html
The specificism of Ubuntu (from what I could tell) was:
Upstream’s configuration file is /etc/mdadm.conf by default. On Debian systems, this file is only read if /etc/mdadm/mdadm.conf does not exist.
Next I killed the mdadm.conf configuration file:
Code:
cd /etc/mdadm
sudo mv mdadm.conf mdadm.conf.bak
Then I unmounted all arrays:
Code:
sudo umount /dev/md500
umount: /dev/md500: not found
sudo umount /dev/md501
umount: /dev/md501: not mounted
Then I stopped all arrays:
Code:
sudo mdadm --stop /dev/md500
mdadm: error opening /dev/md500: No such file or directory
sudo mdadm --stop /dev/md501
mdadm: stopped /dev/md501
Next we want to erase all partition tables from all disks meant to be part of the array(s). Luckily for me I was still in trial phase and not losing any data.
Example:
Code:
sudo fdisk /dev/sdg
- Used the commands and erased all partitions and wrote the new partition table:
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
Now to be sure, I zeroed out the partition table (which is stored in the first 512 bytes of the drive):
Code:
sudo dd if=/dev/zero of=/dev/sdg bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000523514 s, 978 kB/s
This was repeated for all drives in question. (For my case, /dev/hdc, /dev/hdd, /dev/sda, /dev/sdb).
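Since that dd command is destructive, here is a way to see what it does on a scratch file instead of a real disk. This is only a sketch: the 2048-byte size is arbitrary, and `conv=notrunc` is added because the target is a regular file (without it dd would truncate the file, which cannot happen to a block device).

```shell
#!/bin/sh
# Practice run on a scratch file rather than a real disk.
img=$(mktemp)

# Fill four 512-byte "sectors" with the letter x so changes are visible.
head -c 2048 /dev/zero | tr '\0' 'x' > "$img"

# Zero only the first sector; conv=notrunc keeps the rest of the file.
dd if=/dev/zero of="$img" bs=512 count=1 conv=notrunc 2>/dev/null

# Count bytes in the first sector that are NOT zero.
first_sector=$(head -c 512 "$img" | tr -d '\0' | wc -c | tr -d ' ')
# Count bytes after the first sector that are no longer x.
rest=$(tail -c 1536 "$img" | tr -d 'x' | wc -c | tr -d ' ')
size=$(wc -c < "$img" | tr -d ' ')

echo "non-zero bytes left in first sector: $first_sector"
echo "altered bytes after first sector: $rest"
echo "file size: $size"
rm -f "$img"
```

Only the first sector is touched, which is why this wipes the partition table but leaves everything else on the disk, including any md superblocks, in place.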
Now, kill all md node devices that might be leftover. To find these devices, first I ran:
Code:
cd /etc
ls -ral | grep md
This showed me which node devices still were around, so then I removed them:
Code:
sudo rm /etc/md0
sudo rm /etc/md501
sudo rm /etc/md/501
Now it was time to restart the mdadm daemon:
Code:
sudo /etc/init.d/mdadm-raid restart
At this point it complained (FAILED to start) ... so I then figured out that this particular 'flavour' REQUIRES the config file to at least exist. Created a SKELETON config file:
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#
# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes
# automatically tag new arrays as belonging to the local system
HOMEHOST <system>
# instruct the monitoring daemon where to send mail alerts
MAILADDR mailaddress@domain.com
# definitions of existing MD arrays
# This file was auto-generated on Sun, 20 Jan 2008 14:31:03 -0500
# by mkconf $Id: mkconf 261 2006-11-09 13:32:35Z madduck $
I was then able to start mdadm daemon without a problem!
Code:
sudo /etc/init.d/mdadm-raid start
[OK]
Now it was time to configure my partitions! This was performed on all drives, here's an example:
Code:
sudo fdisk -l /dev/hdd
Disk /dev/hdd: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/hdd1 1 60795 488335806 fd Linux raid autodetect
/dev/hdd2 60796 60801 48195 fd Linux raid autodetect
Got a warning though on one of the drives:
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
OK fine (since I didn't want to fiddle with this stuff, I rebooted the system).
WHOA: Here's where I had some massive light shed upon my situation. As mentioned at the top of the post, here's two links explaining a major issue with this configuration:
So at this point I still hadn't figured out the solution to the previous issue, and wasn't even sure if it was the root cause. Decided to fiddle some more...
Killed all raid partitions (again), above problem goes away. If there were NO partitions on the drives, the above problem didn't occur. Restarted the system a few times just to be sure things were 'hunky dory'.
Next I created only basic partitions (left default type as 'linux'), and rebooted.
WOW - I thought. As SOON as there were RAID partitions, the system experienced those error messages.
With the help of RaceTM over some telephone conversations and 'paired sshing', he found:
/etc/udev/rules.d/85-mdadm.rules
After some discussion, we decided to try this and commented out line:
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", RUN+="watershed -i udev-mdadm /sbin/mdadm -As"
Here goes the process over and over again!
Killed all partitions and zeroed out superblocks, and first 512 bytes (again) on all drives in question. Restarted the system MANY times and had problems!
Reconfigured the partitions to "Linux raid autodetect" again.
IT WORKS!
Again, tested with several more restarts ... still no problems restarting.
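For reference, commenting out that rule can be scripted. The sketch below works on a scratch copy so it is safe to run anywhere; on the real system the target would be the 85-mdadm.rules file named above, editing it needs root, and keeping a backup first is a sensible precaution. The sed pattern (any active line containing "mdadm -As") is my own choice for illustration, not something taken from the bug report.

```shell
#!/bin/sh
# Work on a scratch copy of the rule quoted in the thread.
rules=$(mktemp)
cat > "$rules" <<'EOF'
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", RUN+="watershed -i udev-mdadm /sbin/mdadm -As"
EOF

# Prefix every not-yet-commented line that runs "mdadm -As" with "#".
commented=$(sed '/^[^#].*mdadm -As/ s/^/#/' "$rules")

printf '%s\n' "$commented"
rm -f "$rules"
```

Running the same sed expression on an already commented file leaves it unchanged, so it can be reapplied without stacking up extra "#" characters.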
Now it was time to actually create the array:
Code:
sudo mdadm --create --verbose /dev/md501 --level=5 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/hdc2 /dev/hdd2
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: /dev/sda2 appears to contain an ext2fs file system
size=144384K mtime=Mon Nov 3 04:02:30 2008
mdadm: /dev/hdc2 appears to contain an ext2fs file system
size=144576K mtime=Mon Nov 3 04:02:30 2008
mdadm: size set to 48128K
Continue creating array?
Code:
yes
mdadm: array /dev/md501 started.
Let's make a filesystem:
Code:
sudo mkfs.ext3 /dev/md501
mke2fs 1.40-WIP (14-Nov-2006)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
36144 inodes, 144384 blocks
7219 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
18 block groups
8192 blocks per group, 8192 fragments per group
2008 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 27 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
Let's check it for errors just for fun:
Code:
sudo fsck /dev/md501
fsck 1.40-WIP (14-Nov-2006)
e2fsck 1.40-WIP (14-Nov-2006)
/dev/md501: clean, 11/36144 files, 10231/144384 blocks
Show me the money!
Code:
sudo mdadm --detail /dev/md501
/dev/md501:
Version : 00.90.03
Creation Time : Tue Nov 4 21:14:08 2008
Raid Level : raid5
Array Size : 144384 (141.02 MiB 147.85 MB)
Device Size : 48128 (47.01 MiB 49.28 MB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 501
Persistence : Superblock is persistent
Update Time : Tue Nov 4 21:16:21 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : f13ab85e:af6c2bc3:01f5a1db:50a22640 (local to host fermmy-server)
Events : 0.4
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
2 22 2 2 active sync /dev/hdc2
3 22 66 3 active sync /dev/hdd2
Restarted the system again many times, and experienced no issues! Looks like my problem was fixed with https://bugs.launchpad.net/ubuntu/+s...dm/+bug/188392.
Thanks all for your help!



