- 04-28-2011 #1Just Joined!
- Join Date
- Apr 2011
- Posts
- 3
Unable to replace drive in RAID 5 after Filesystem check freezes
I'm a bit of a Linux newbie, but I did manage to set up the following RAID-5 system:
1x 500GB system drive on ATA IDE
4x 1TB SATA drives in software RAID
Linux = Fedora 13
So here's what happened. I set up the system to send me an email every time the mdadm stat file changed, so it would email me whenever it periodically ran a self-test. While I was away, I noticed that the self-test was going incredibly slowly (it usually took 8 hours, but was on course to take 250 days!). A colleague decided to just reboot the system.
Afterwards, the system would not boot and, while all 5 drives were connected, would stop at an endlessly scrolling error message of:
Code:
ata4.01: exception Emask 0x0 SErr 0x0 action 0x0
ata4.01: BMDMA stat 0x64
ata4.01: failed command: READ DMA
ata4.01: (a bunch of hex numbers)
ata4.01: (a bunch of hex numbers, again)
ata4.01: status {DRDY ERR}
ata4.01: error: {UNC}
I worked out that a single drive was causing this error. With just the system drive and the other 3 RAID drives connected, the boot would get past that error, but then stop at a filesystem check error, want to drop me to a recovery terminal, and go no further. When I tried to run an fsck scan, it kept reporting a bad superblock on the failed drive, and none of the suggestions it gave would work.
When attempting to boot to a Live-CD version of Fedora (and Ubuntu) with all 4 RAID drives attached, the same error occurs.
With only the other 3 drives attached, it boots into Live-CD Linux just fine.
In Palimpsest, it shows the 3 drives as healthy and as parts of a RAID array.
However, when I try to start the array through Palimpsest, it says that there are not enough disks to start the array....even though there are 3, which is what RAID 5 was supposed to be about. (The drives contain backups of important research data)
I have sent the defective drive back to WD and received the new one today.
I have put the new drive in and it boots up (with the /dev/md127 line commented out in fstab).
When I start up the machine with the new drive installed (and /dev/md127 NOT commented out in fstab), the system still boots to an error:
Code:
Checking Filesystems...
/dev/md127: The Superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem, then the superblock is corrupt, and you might try running e2fsck with an alternate superblock
*** An error occurred during the file system check.
*** Dropping you to a shell: the system will reboot
*** When you leave the shell.
And then the filesystem is read-only.
Thing is, the RAID filesystem is ext4, so I'm not sure why it is saying ext2.
Do I need to stop this filesystem check from happening automatically in order to start the array degraded and add the new drive??
Thanks,
Ta-mater
Last edited by Ta-mater; 04-28-2011 at 03:06 PM.
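On the closing question above about starting the array degraded and then adding the new drive: here is a minimal sketch of the usual mdadm sequence, written as a dry run that only prints the commands. The member names (/dev/sdb1, /dev/sdc1, /dev/sdd1), the replacement (/dev/sde1) and the helper name degraded_recovery_plan are all assumptions for illustration, since the thread never lists the actual devices. An array that was shut down uncleanly while degraded may still refuse to start, so treat this as a starting point and not a guaranteed fix.

```python
import shlex


def degraded_recovery_plan(array, surviving, replacement):
    """Return mdadm commands (as argument lists) for a degraded start
    followed by a rebuild. Nothing is executed here."""
    return [
        # Release any half-assembled, inactive array left over from boot.
        ["mdadm", "--stop", array],
        # --run asks mdadm to start the array even though a member is missing.
        ["mdadm", "--assemble", "--run", array, *surviving],
        # Adding the new disk lets md begin rebuilding onto it.
        ["mdadm", "--manage", array, "--add", replacement],
    ]


if __name__ == "__main__":
    # Hypothetical device names; substitute the real ones before use.
    plan = degraded_recovery_plan(
        "/dev/md127",
        ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1"],
        "/dev/sde1",
    )
    for cmd in plan:
        print(shlex.join(cmd))
```

Keeping the /dev/md127 line commented out in fstab, as described above, avoids the boot-time filesystem check while this is done. The rebuild progress can then be followed in /proc/mdstat.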
- 04-30-2011 #2Linux Guru
- Join Date
- Apr 2009
- Location
- I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
- Posts
- 8,974
Not much help here, except that ext3/4 file systems are ext2 with journaling added on. When your "colleague" rebooted the system in the middle of the system scan, I think things got munged. Rebooting was not what should have been done. There are other means to shut down or stop the scan that would have left the file system intact. Doh! Not sure, but you may be fubar. Recover what you can, and restore the rest from backups (you have some, right?).
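To illustrate why the check reports "ext2" on an ext4 volume: ext2, ext3 and ext4 share the same superblock layout and the same magic number, and e2fsck words its message the same way for all three. Below is a rough sketch that reads the primary superblock from a device or image file and reports the magic number and the journal flag. The function name read_ext_superblock is made up for this example; the offsets are those of the standard ext2/3/4 on-disk layout. Reading a real device needs root, and this only inspects, never writes.

```python
import struct

SUPERBLOCK_OFFSET = 1024       # primary superblock starts 1024 bytes in
MAGIC_OFFSET = 0x38            # s_magic, 16-bit little-endian
FEATURE_COMPAT_OFFSET = 0x5C   # s_feature_compat, 32-bit little-endian
EXT_MAGIC = 0xEF53             # shared by ext2, ext3 and ext4
COMPAT_HAS_JOURNAL = 0x0004    # set on ext3/ext4 filesystems with a journal


def read_ext_superblock(path):
    """Return a small dict describing the superblock, or None if the
    ext2/3/4 magic number is not found at the expected place."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        sb = f.read(1024)
    if len(sb) < FEATURE_COMPAT_OFFSET + 4:
        return None  # file too short to hold a superblock
    (magic,) = struct.unpack_from("<H", sb, MAGIC_OFFSET)
    if magic != EXT_MAGIC:
        return None
    (compat,) = struct.unpack_from("<I", sb, FEATURE_COMPAT_OFFSET)
    return {
        "magic": hex(magic),
        "has_journal": bool(compat & COMPAT_HAS_JOURNAL),
    }
```

A None result for /dev/md127 would be consistent with the "superblock could not be read" message, although on an array that has not been assembled the device may simply not contain readable data at all.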
Sometimes, real fast is almost as good as real time.
Just remember, Semper Gumbi - always be flexible!
- 05-03-2011 #3Just Joined!
- Join Date
- Apr 2011
- Posts
- 3
Well, that makes sense....nothing I've tried has done any good....
I guess I'll just have to wipe it and make a new array.
Just for peace of mind, if that filesystem check getting stuck happens again, what are these steps to stop it of which you speak? What should I have done?
- 05-03-2011 #4Linux Guru
- Join Date
- Apr 2009
- Location
- I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
- Posts
- 8,974
From another terminal window, as root, you can run 'kill <pid>', where <pid> is the process ID of the fsck scanning process. That sends a signal (SIGTERM by default) to fsck, which will catch it and then shut down when it is safe to do so, without munging the file system.
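The same step can be sketched in Python, for anyone scripting it. A plain kill sends SIGTERM, and os.kill with signal.SIGTERM does the same thing. The helper name request_stop is made up for this example, and the demonstration targets a harmless stand-in process and not a real fsck. Whether the target exits cleanly depends on how that program handles the signal.

```python
import os
import signal
import subprocess
import sys


def request_stop(pid):
    """Send SIGTERM to pid, the same signal a plain `kill <pid>` sends."""
    os.kill(pid, signal.SIGTERM)


if __name__ == "__main__":
    # Stand-in for the long-running scan: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    request_stop(child.pid)
    child.wait()
    # On Linux, a negative return code means the child died from that signal.
    print("child return code:", child.returncode)
```

To find the real process ID first, something like 'ps aux | grep fsck' or 'pgrep fsck' from the root shell should list it.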

