  1. #1
    Linux User
    Join Date
    May 2008
    Location
    NYC, moved from KS & MO
    Posts
    251

    Is my HD dying?

    My email server went down last night, and since I had it rebooted this morning I have been seeing the following strange things:
    1)
    I'm using maildir format, and every partition except /boot is reiserfs. When I run ls -l in one of the mailbox folders I get this:
    ls: cannot access new: Permission denied
    total 17973
    drwx------ 2 vmail vmail 3314072 2008-08-25 10:54 cur
    -rw------- 1 vmail vmail 995748 2008-08-25 10:54 dovecot.index
    -rw------- 1 vmail vmail 12780544 2008-06-25 21:24 dovecot.index.cache
    -rw------- 1 vmail vmail 353920 2008-08-25 10:54 dovecot.index.log
    -rw------- 1 vmail vmail 132220 2008-06-25 21:19 dovecot.index.log.2
    -rw------- 1 vmail vmail 795886 2008-06-25 21:23 dovecot-uidlist
    drwx------ 5 vmail vmail 232 2008-06-23 17:38 .INBOX.Drafts
    drwx------ 5 vmail vmail 232 2008-06-23 17:38 .INBOX.Sent
    drwx------ 5 vmail vmail 232 2008-06-16 10:12 .INBOX.Trash
    ?????????? ? ? ? ? ? new
    -rw------- 1 vmail vmail 36 2008-06-23 17:37 subscriptions
    drwx------ 2 vmail vmail 96 2008-08-26 07:53 tmp


    and these warnings show up right away in /var/log/messages:
    Aug 26 17:08:42 mailsrv kernel: ReiserFS: warning: is_leaf: free space seems wrong: level=1, nr_items=2, free_space=65528 rdkey
    Aug 26 17:08:42 mailsrv kernel: ReiserFS: sda10: warning: vs-5150: search_by_key: invalid format found in block 35728. Fsck?
    Aug 26 17:08:42 mailsrv kernel: ReiserFS: sda10: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2 5450 0x0 SD]
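
For anyone triaging similar output, a quick way to see which partition the kernel is complaining about is to tally the warnings per device. This is only a minimal sketch: it assumes the "ReiserFS: <device>: warning:" layout seen in the lines above, the function name count_reiserfs_warnings is made up for illustration, and lines that carry no device name (like the is_leaf one) are skipped.

```shell
#!/bin/sh
# Tally ReiserFS kernel warnings per device in a syslog-style file.
# Assumption: lines look like "... kernel: ReiserFS: sda10: warning: ...".
# Lines without a device name between "ReiserFS:" and "warning:" are ignored.
count_reiserfs_warnings() {
    # $1: path to the log file, e.g. /var/log/messages
    sed -n 's/.*ReiserFS: \([a-z0-9]*\): warning.*/\1/p' "$1" |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Called as count_reiserfs_warnings /var/log/messages (reading that file usually needs root), it prints one "device count" line per device.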


    2)
    My /tmp is also on a separate partition, /dev/sda9.
    Whenever I use man to look something up, for example man ls, I get this:
    /usr/bin/nroff: Can't create temp directory, exiting...
    Manual page ls(1) line 1/1 (END)

    SquirrelMail has also stopped working because PHP could not create its session files under /tmp. Non-root users are denied write permission in that directory as well.
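
With symptoms like these, two things are worth ruling out: that /tmp is currently mounted read-only, and that the directory has lost its usual permissions. Nothing in the post says which of these (if either) applies, so the following is only a sketch. The helper names mount_opts and is_read_only are hypothetical; the mounts file is assumed to use the standard /proc/mounts column layout.

```shell
#!/bin/sh
# Print the mount options of a mount point from a /proc/mounts-style file.
# If the mount point appears more than once, the last entry wins.
mount_opts() {
    # $1: mount point, $2: mounts file (defaults to /proc/mounts)
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/mounts}"
}

# Succeed if a comma-separated option list contains "ro" as a whole option.
is_read_only() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

For example, is_read_only "$(mount_opts /tmp)" && echo "/tmp is read-only". Separately, ls -ld /tmp should normally show drwxrwxrwt; anything else would explain non-root users being refused.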

    3)
    I happened to run yast to look up some hardware information this morning. When I exited, yast froze and I had to kill the ssh session by closing the terminal tab. After I logged back in, I found that y2base was using 100% CPU (dropping to 99% occasionally). The command
    lsof /proc/19320/fd/0 shows
    COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
    ...
    y2base 19320 root 0u CHR 136,1 3 /dev/pts/1
    y2base 19320 root 3u CHR 136,1 3 /dev/pts/1
    y2base 19320 root 4r CHR 136,1 3 /dev/pts/1
    y2base 19320 root 5w CHR 136,1 3 /dev/pts/1


    19320 is the PID of y2base

    I'm pretty sure something is going wrong with the reiserfs partition. The hard disk might be failing too. I'm going to check the file system tonight after I unmount the partition in question.

    The system is running openSUSE 10.2 with 3 GB of RAM and a 250 GB Seagate SATA hard drive.

    Any suggestions would be greatly appreciated.

  2. #2
    Linux Guru gogalthorp's Avatar
    Join Date
    Oct 2006
    Location
    West (by God) Virginia
    Posts
    3,105
    To really check the hard drive you must do a low-level scan. You can normally get scanning software from the manufacturer, but I prefer to use a commercial program called SpinRite. Google for it. If you maintain hardware there is nothing better, and it will breathe life into a bad hard drive better than anything I know. You would normally also need to rebuild the file system, but you must get rid of or repair the bad sectors first or it will just happen again.

    good luck
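
The reply suggests a vendor tool or SpinRite. A different, freely available check is to read the drive's SMART counters with smartctl from the smartmontools package; this is a swapped-in technique, not something used in this thread. A minimal sketch, assuming the usual smartctl -A attribute table where the attribute name is in column 2 and the raw value in column 10; the helper name smart_raw_value is hypothetical.

```shell
#!/bin/sh
# Extract the raw value of one SMART attribute from `smartctl -A` output on stdin.
# Assumption: the attribute name is field 2 and the raw value is field 10.
smart_raw_value() {
    # $1: attribute name, e.g. Reallocated_Sector_Ct
    awk -v name="$1" '$2 == name { print $10 }'
}
```

Usage would look like smartctl -A /dev/sda | smart_raw_value Reallocated_Sector_Ct (run as root). A non-zero and growing count of reallocated or pending sectors usually points at failing media, and smartctl -H /dev/sda prints the drive's overall self-assessment.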

  3. #3
    Linux User
    Join Date
    May 2008
    Location
    NYC, moved from KS & MO
    Posts
    251
    Thanks for the info, gogalthorp. I ran a file system check last night, and here's what I did.
    I stopped the email and dovecot services and unmounted /dev/sda10 with the -l option (a regular umount reported that the device was busy), then ran
    fsck.reiserfs /dev/sda10
    It found quite a few file system errors, along with a message saying that 4 corruptions could only be fixed with the --rebuild-tree option, so I ran it again:
    fsck.reiserfs --rebuild-tree /dev/sda10
    The whole process took quite a while (around 40 minutes) to finish.
    Right after that I rebooted the system. After it came back up I checked all the problems from my previous post and they were all gone. So far I haven't seen any warnings or errors about the file system in the system log. But I still don't know what caused the file system errors.
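
The sequence described above can be sketched as a dry run that only prints the commands. One deliberate difference from the post: umount -l merely detaches the mount point, and the file system can stay in use in the background until the last process lets go, so this sketch lists the busy processes with fuser first and then does a plain umount. The function name reiserfs_repair_plan and the mount point argument are hypothetical (the post never says where /dev/sda10 is mounted). The reiserfsck man page also recommends backing up the partition before --rebuild-tree.

```shell
#!/bin/sh
# Print (do not run) a check-and-repair sequence for a ReiserFS partition.
# Step 1 lists processes keeping the file system busy so they can be stopped;
# step 3 is the read-only check; step 4 is only needed if the check asks for it.
reiserfs_repair_plan() {
    dev=$1   # block device, e.g. /dev/sda10
    mnt=$2   # its mount point (assumption: supplied by the caller)
    cat <<EOF
fuser -vm $mnt
umount $mnt
fsck.reiserfs --check $dev
fsck.reiserfs --rebuild-tree $dev
mount $dev $mnt
EOF
}
```

Printing the plan first, e.g. reiserfs_repair_plan /dev/sda10 /path/to/mountpoint, makes it easy to review each step before typing it as root.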

  4. #4
    Linux Guru gogalthorp's Avatar
    Join Date
    Oct 2006
    Location
    West (by God) Virginia
    Posts
    3,105
    Power loss is the most likely cause. Today I had a short power loss just as I was starting XP in a VM. It messed up that partition and also messed up some sectors on /home. I did a low-level scan, got a few sectors back, then did the fsck dance. I lost some stuff in the VM, so I just restored from backup and got my project back in sync from the SVN repository we maintain. But I lost my email history and settings, and I had not backed up since May. Ouch. There are bits and pieces of the email file in lost+found, but I have not decided whether it is worth the effort to piece it back together. I guess I need a battery backup.
