- 08-26-2008 #1 Linux User
- Join Date
- May 2008
- Location
- NYC, moved from KS & MO
- Posts
- 251
is my hd dying?
Last night my email server was down and since I had it rebooted this morning I have been experiencing the following strange things:
1)
I'm using maildir format, and every partition except /boot uses reiserfs. When I run ls -l in one of the mailbox folders, I get this:
ls: cannot access new: Permission denied
total 17973
drwx------ 2 vmail vmail 3314072 2008-08-25 10:54 cur
-rw------- 1 vmail vmail 995748 2008-08-25 10:54 dovecot.index
-rw------- 1 vmail vmail 12780544 2008-06-25 21:24 dovecot.index.cache
-rw------- 1 vmail vmail 353920 2008-08-25 10:54 dovecot.index.log
-rw------- 1 vmail vmail 132220 2008-06-25 21:19 dovecot.index.log.2
-rw------- 1 vmail vmail 795886 2008-06-25 21:23 dovecot-uidlist
drwx------ 5 vmail vmail 232 2008-06-23 17:38 .INBOX.Drafts
drwx------ 5 vmail vmail 232 2008-06-23 17:38 .INBOX.Sent
drwx------ 5 vmail vmail 232 2008-06-16 10:12 .INBOX.Trash
?????????? ? ? ? ? ? new
-rw------- 1 vmail vmail 36 2008-06-23 17:37 subscriptions
drwx------ 2 vmail vmail 96 2008-08-26 07:53 tmp
and these warnings show up right away in /var/log/messages:
Aug 26 17:08:42 mailsrv kernel: ReiserFS: warning: is_leaf: free space seems wrong: level=1, nr_items=2, free_space=65528 rdkey
Aug 26 17:08:42 mailsrv kernel: ReiserFS: sda10: warning: vs-5150: search_by_key: invalid format found in block 35728. Fsck?
Aug 26 17:08:42 mailsrv kernel: ReiserFS: sda10: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2 5450 0x0 SD]
2)
My /tmp is also on a separate partition, /dev/sda9.
Whenever I try to look something up with man, for example man ls, I get this:
/usr/bin/nroff: Can't create temp directory, exiting...
Manual page ls(1) line 1/1 (END)
SquirrelMail has also stopped working because PHP session files could not be created under /tmp. Non-root users are denied write permission in that folder as well.
3)
I ran yast this morning to look up some hardware information. When I exited, yast froze and I had to kill the ssh session by closing the terminal tab. After I logged back in, I found that y2base was using 100% CPU (dropping to 99% occasionally). The command
lsof /proc/19320/fd/0 shows
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
...
y2base 19320 root 0u CHR 136,1 3 /dev/pts/1
y2base 19320 root 3u CHR 136,1 3 /dev/pts/1
y2base 19320 root 4r CHR 136,1 3 /dev/pts/1
y2base 19320 root 5w CHR 136,1 3 /dev/pts/1
19320 is the PID of y2base
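The ReiserFS warnings quoted under 1) can be tallied per device and warning code, which makes it easier to see whether the errors are confined to one partition. The following is only a minimal sketch: the regular expression is derived from the three log lines quoted above and may not match messages from other kernel versions, and the names WARNING, SAMPLE and summarize are made up for this illustration.

```python
import re
from collections import Counter

# Pattern inferred from the three quoted lines only (an assumption).
# The device name and the "vs-NNNN" code are optional because the
# first quoted line carries neither.
WARNING = re.compile(
    r"kernel: ReiserFS: (?:(?P<dev>\w+): )?warning: "
    r"(?:(?P<code>[a-z]+-\d+): )?(?P<func>\w+):"
)

# The log lines exactly as quoted in the post (the first one is cut off).
SAMPLE = [
    "Aug 26 17:08:42 mailsrv kernel: ReiserFS: warning: is_leaf: free space "
    "seems wrong: level=1, nr_items=2, free_space=65528 rdkey",
    "Aug 26 17:08:42 mailsrv kernel: ReiserFS: sda10: warning: vs-5150: "
    "search_by_key: invalid format found in block 35728. Fsck?",
    "Aug 26 17:08:42 mailsrv kernel: ReiserFS: sda10: warning: vs-13070: "
    "reiserfs_read_locked_inode: i/o failure occurred trying to find stat "
    "data of [2 5450 0x0 SD]",
]


def summarize(lines):
    """Count ReiserFS warnings by (device, code, kernel function)."""
    counts = Counter()
    for line in lines:
        match = WARNING.search(line)
        if match:
            key = (
                match.group("dev") or "unknown",
                match.group("code") or "none",
                match.group("func"),
            )
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for (dev, code, func), n in sorted(summarize(SAMPLE).items()):
        print(dev, code, func, n)
```

To run it against the real log, the lines of /var/log/messages could be passed to summarize instead of SAMPLE.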
I'm pretty sure something's going wrong with the reiserfs partition. The hard disk might be failing too. I'm going to check the file system tonight after I unmount the partition in question.
The system is running OpenSuSE 10.2 with 3 GB of RAM and a 250 GB Seagate SATA hard disk.
Any suggestions would be greatly appreciated.
- 08-27-2008 #2
To really check the hard drive you must do a low-level scan. You can normally get scanning software from the manufacturer, but I prefer to use a commercial program called SpinRite. Google for it. If you maintain hardware there is nothing better, and it will breathe life into a bad hard drive better than anything I know. You would normally need to rebuild the file system as well, but you must get rid of or repair the bad sectors first or it will just happen again.
Good luck.
- 08-27-2008 #3 Linux User
- Join Date
- May 2008
- Location
- NYC, moved from KS & MO
- Posts
- 251
Thanks gogalthorp for the info. I ran a file system check last night, and here's what I did:
I stopped the email and dovecot services, unmounted /dev/sda10 with the -l option (the regular umount command gave a device busy message), then ran
fsck.reiserfs /dev/sda10
It found quite a few file system errors, along with a message saying that 4 corruptions could only be fixed with the --rebuild-tree option, so I ran it again:
fsck.reiserfs --rebuild-tree /dev/sda10
It took quite a while (around 40 min) for the whole process to finish.
Right after that I rebooted the system. After it came back I checked all the problems from my previous post and they were all gone. So far I haven't seen any warnings or errors about the file system in the system log. I still don't know what caused the file system errors, though.
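For anyone repeating this, the sequence described above can be written down as a short dry-run plan. This is a rough sketch rather than a tested procedure: it only prints the commands named in this post and executes nothing, the function name repair_plan is made up for the example, and the step that stops the mail services is left out because the service names depend on the setup. Two cautions seem worth adding. A lazy unmount (-l) only detaches the mount point, so it may be prudent to confirm that no process still holds files open on the device before checking it. The reiserfsck documentation also recommends backing up the partition before using --rebuild-tree.

```python
import shlex


def repair_plan(device, rebuild_tree=False):
    """Return the commands from the post, in order, as argument lists.

    Nothing is executed here; the caller decides what to do with the plan.
    """
    plan = [
        ["umount", "-l", device],    # lazy unmount, as used in the post
        ["fsck.reiserfs", device],   # first pass without repair options
    ]
    if rebuild_tree:
        # Second pass, only if the first pass reports that it is needed.
        plan.append(["fsck.reiserfs", "--rebuild-tree", device])
    return plan


if __name__ == "__main__":
    # Print the plan for the partition from the post instead of running it.
    for command in repair_plan("/dev/sda10", rebuild_tree=True):
        print(shlex.join(command))
```

Keeping the rebuild step behind an explicit flag mirrors what happened here: the first pass reported that only --rebuild-tree could fix the remaining corruptions, and the second pass was run after that.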
- 08-28-2008 #4
Power loss is most likely. Today I had a short power loss just as I was starting XP in a VM. It messed up that partition and also messed up some sectors on /home. I did a low-level scan, got a few sectors back, then did the fsck dance. I lost some stuff in the VM, so I just restored from backup and got my project back in sync from the SVN repository we maintain. But I lost my email history and settings, and I had not backed up since May. Ouch. There are bits and pieces of the email file in lost+found, but I have not decided whether it is worth the effort to piece it back together. I guess I need a battery backup.

