  1. #1
    Linux Newbie Mad Professor's Avatar
    Join Date
    May 2006
    Posts
    128

    Ext3, Data corrupting?

    Hello all, I don't know exactly where a filesystem issue should be posted in the forum, but in case it matters, I'm using CentOS 5.3.

    My fileserver is not working as it should. It has two PCI RocketRAID 1740 controllers, each with four SATA II ports.

    RAID 5 is on the first controller, hooked up to four 500GB Seagate drives, and is working normally.

    RAID 1 is on the second controller, hooked up to two 1.5TB Seagate drives, model ST31500341AS. Both are from the plagued Seagate batch with firmware CC1h.
    This particular array is not working as intended.

    Half the stuff I copy to this array becomes corrupt. The same shows up during playback from my mythbox, which records to this array: it skips through scenes, closed captions flash by rapidly, and the audio skips along with the video. If I copy a game or software .iso file via Windows/Samba to a share on the RAID 1, then copy the same file from the RAID 1 back to the other computer and try to use it, I get CRC errors during installation, or the file is corrupted or has no data. The same thing happens if I run it directly from the fileserver across the network, including from the original computer the files came from.

    But I know the original files are not corrupt: I have tested them from a hard copy on both machines. I also verified it is not a network issue, because I can copy the same files to the RAID 5 array across the network and they work fine.
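
A round-trip test like the one described above can be made repeatable by hashing the file before and after the copy. The following is a minimal Python sketch, not something used in the thread; `file_sha256` and `copies_match` are illustrative names, and the choice of SHA-256 is an assumption (any strong checksum would do).

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large .iso files do not fill memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """True if both files have identical contents according to SHA-256."""
    return file_sha256(original) == file_sha256(copy)
```

Calling `copies_match` on the original file and on the copy read back from the RAID 1 share shows whether the data changed in transit or at rest. Note that a file read back immediately after writing may be served from the fileserver's page cache rather than from the disks, so it is safer to re-check after a remount or reboot.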

    I went to the web GUI for my RAID controllers, went to maintenance and clicked verify on the RAID 1. Ten minutes later it came back with "data not consistent"; it didn't even make it past 3%. It takes about 3 days to rebuild an array. I tried again: deleted the corrupted files/directories as root on the fileserver, copied the files across the network again, tried to use them, and got errors.

    I'm currently verifying the array again.

    Edit: Data not consistent at 20%

    I have to determine whether the drives or the controller need to be replaced.

    I'm wondering what the chances are that the filesystem is damaged or broken?
    Both arrays use Ext3.

    I want to say it's a filesystem issue on raid 1 array, because the raid controller reports S.M.A.R.T. from both drives and they are healthy and I see no abnormalities in attribute data from both drives.

    It could be the raid controller, I could test it on the first controller but I really do not want to disturb a working array.

    Finally, the Seagate drives. This isn't the first time I've had a problem with these drives. A long time ago I set up a software RAID 1 with mdadm, with email notification, on a basic SATA controller. One of the drives dropped out of the array and never rebuilt. Even though mdadm was aware of the drive and it was working fine, it never reported the problem to me. When the drive it was supposed to be mirroring started freezing randomly, it didn't notify me of that either.

    I didn't have backups at the time due to a drive failure on the external, and I ended up with total data loss. I tested mdadm after this disaster: I got the test email fine, and the address was in the config. After that I decided to invest in a cheap fake hardware RAID controller. It's not a true one unless it has an XOR chip.

    I'm not sure what the problem is, and I'm hoping for a fix that's cheaper than an RMA or replacing hardware.

    Should I back up what data I have, break the RAID 1 array, rebuild it and format again with Ext3?

    Or is this a drive/controller issue?

  2. #2
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    You need to check for bad blocks on both discs of the array. See the man page for the badblocks command. Alternatively, use the -c option of fsck, which will also check for bad blocks on most file systems.
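
To illustrate what a read-only bad-block scan does, here is a rough Python sketch. It is not a replacement for badblocks and does not reproduce its options; `find_unreadable_blocks` is a hypothetical helper name. It reads a device or file block by block and records the offsets where the operating system reports an I/O error. One caveat that matters for this thread: a scan like this only finds blocks that fail to read, not blocks that read back successfully with wrong data.

```python
import os


def find_unreadable_blocks(path, block_size=64 * 1024):
    """Read `path` block by block; return byte offsets of blocks that raise I/O errors."""
    bad_offsets = []
    # buffering=0 gives a raw file object, so each read goes to the OS directly.
    with open(path, "rb", buffering=0) as dev:
        size = dev.seek(0, os.SEEK_END)  # total size in bytes
        offset = 0
        while offset < size:
            dev.seek(offset)
            try:
                dev.read(min(block_size, size - offset))
            except OSError:
                # The kernel could not read this block; note it and move on.
                bad_offsets.append(offset)
            offset += block_size
    return bad_offsets
```

Run against a whole-disk device (for example a path such as /dev/sdX, which needs root), an empty result means every block was readable, nothing more.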
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  3. #3
    Linux Newbie Mad Professor's Avatar
    Join Date
    May 2006
    Posts
    128
    Thanks, I was able to rule out a corrupt filesystem. I had to break the array. It points to a bad hard drive that is beginning to lose the ability to either read or write; I'm not sure which. I've got to do further testing with the troubled drive.

    Thank you.

  4. #4
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    You should be able to swap out the bad drive and the array should remirror it when the new one is powered up. If it is a Seagate drive under warranty, you can get a replacement overnighted to you (assuming you are in the US) with a pre-paid return shipping label/package for $20-30 USD. Their web site has all the forms and stuff you need to fill out online. They can also determine the warranty status of the drive from its serial number.

  5. #5
    Linux Newbie Mad Professor's Avatar
    Join Date
    May 2006
    Posts
    128
    What I thought was a bad drive is looking more like a RAID controller issue. After swapping the drives around, the bad drive became a good drive and the good drive went bad. It seems to me that port 1 on the controller is faulty: still functional, but not working correctly, corrupting data as it's being written to and read from the disk.

    It can't be the cables; I've replaced them hoping that would fix the problem.

    Not sure if this is possible.

  6. #6
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    Sounds like it's fubar to me. At least you were just mirroring the drives, so you should be able to use one of the drives on another controller. I would suggest getting an eSATA controller and a couple of external enclosures. Then you can install the drives singly and compare them: run checksums on the drives themselves (cksum or md5sum /dev/sdX, where 'sdX' is the device id). If the checksums match, you are done except to validate the data on the drives. If they don't match, you will need to run the checksum against all the files on the drives after you mount them. That will show you which files are questionable (or at least don't match). I've had good results with an Addonics eSATA RAID controller (set to JBOD): 4 ports, about $75 USD delivered from Buy.com (Addonics 4-Port External Serial ATA II RAID Controller, ADS3GX4R5-E).
    And here's a nice 2-drive eSATA/USB dock that costs about $55 USD delivered: StarTech.com eSATA/USB to SATA External HDD Dock Adapter, SATADOCK22UE (Buy.com).
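
The per-file comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions: both mirror halves are mounted read-only at two separate directories, SHA-256 stands in for the cksum/md5sum mentioned in the post, and `sha256_of`, `tree_digests` and `compare_trees` are made-up helper names.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = sha256_of(full)
    return digests


def compare_trees(root_a, root_b):
    """Return (mismatched, only_in_a, only_in_b) as sorted lists of relative paths."""
    a = tree_digests(root_a)
    b = tree_digests(root_b)
    mismatched = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    only_in_a = sorted(a.keys() - b.keys())
    only_in_b = sorted(b.keys() - a.keys())
    return mismatched, only_in_a, only_in_b
```

With the two drives mounted at hypothetical mount points such as /mnt/disk_a and /mnt/disk_b, `compare_trees("/mnt/disk_a", "/mnt/disk_b")` lists the files that differ between the mirror halves. It cannot say which copy is the good one; that still needs a known-good original to check against.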

  7. #7
    Linux Newbie Mad Professor's Avatar
    Join Date
    May 2006
    Posts
    128
    So, after some serious testing: the RAID controller works fine in another computer, but the minute I put it back in the fileserver it starts misbehaving, and only on port one. I tried a different slot in case it was a bad PCI slot, but no go.

    That leaves only two things: motherboard or power supply. Either one is expensive to replace. The motherboard is Socket A, and the power supply is a 3-year-old Seasonic 350 W whose caps might have started aging. But the other RAID controller (#1) is working fine, so why is controller #2 misbehaving, even though it works perfectly fine in a test system?

  8. #8
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    Power supply is the obvious culprit here. My guess is that the "flaky" controller is a bit out-of-spec at some voltage range and it is causing the power supply to be overloaded in that range. A 350VA supply is pretty small for a server. I won't deploy a server with less than a 750VA supply myself, and I pay special attention to the voltages my peripherals, such as disc controllers, need. When the supply cannot keep up with demand, voltage drops and current spikes. You are lucky you haven't toasted the controller and motherboard as well.

  9. #9
    Linux Newbie Mad Professor's Avatar
    Join Date
    May 2006
    Posts
    128
    So what's a bad voltage? +/-10% of nominal output?

    Another thing is that the motherboard could be the culprit: the onboard NIC was fried by a lightning strike that occurred nearby. It took out the modem and router, skipped the switch and fried the NIC.

    This was, of course, before I bought the second, "problematic" RAID controller.

  10. #10
    Linux Newbie Mad Professor's Avatar
    Join Date
    May 2006
    Posts
    128
    Bought a Corsair 750 W PSU, but the problem remains. At one point I also thought it might be a driver issue, so I updated them from source, 2.1 to 2.4; now it is showing bad sectors on both 1.5TB Seagate drives. Confirmed on a different RAID controller.

    I don't know, maybe those last two slots on the mobo are fubar or something.
