- 10-26-2009 #1 Just Joined!
- Join Date
- Oct 2009
- Posts
- 3
Unable to access file system fedora 6
Hello all!
Some really weird things are happening to my server.
It works fine for several days and then goes down, but in a really strange way: it looks like the file system collapses or something.
The symptoms are the following:
1) reboot returns an error:
shutdown: no such file or directory
2) tomcat cannot load servlet classes
3) wget returns input/output error
4) cannot download file via SFTP, cannot upload file via SFTP, cannot create a directory
5) Despite the fact that almost all file operations are inoperative, "ls" works.
Rebooting via SysRq works fine:
echo 1 > /proc/sys/kernel/sysrq
echo b > /proc/sysrq-trigger
After this reboot the whole system and all applications (Tomcat, Postgres) work fine until the next crash.
I have no idea what is happening or what I should do to understand the situation.
Can someone help me with this issue, or at least give some advice on how to diagnose Fedora Core 6 properly? I have no idea where to start.
Thank you in advance!
- 10-26-2009 #2
I don't have any idea about your problem.
Just one thing:
FC6 is very old now.
You should upgrade to Fedora 11 or something else.
Even Fedora 12 is scheduled for release in mid-November.
Sorry, it was unintentional.
You should have told me at least once and i could have fix it.
thanks for reminding me.
- 10-26-2009 #3
Or, if you want something Red Hat based that is stable and supported for a long time, you should use CentOS.
My guess is that you are running out of disk space, or do not have enough memory/swap.
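Both guesses are quick to rule in or out over SSH. Here is a minimal sketch, assuming a POSIX shell, `df -P` for a stable column layout, and a kernel that reports SwapTotal/SwapFree in /proc/meminfo. The function names and the 90% threshold are made up for illustration.

```shell
# Print mount points whose usage is at or above a threshold (default 90%).
# Reads `df -P` style output on stdin, so it also works on saved output.
# Assumption: mount points contain no spaces.
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit) print $6, $5
    }'
}

# Print free swap as a percentage of total swap, or "no swap" if none exists.
# Takes an optional meminfo file; defaults to /proc/meminfo.
swap_free_percent() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { avail = $2 }
         END {
             if (total > 0) printf "%d\n", avail * 100 / total
             else print "no swap"
         }' "${1:-/proc/meminfo}"
}
```

Typical use would be `df -P | full_filesystems 90` followed by `swap_free_percent`. If nothing is close to full and swap is mostly free, these two causes become less likely.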
- 10-27-2009 #4 Just Joined!
Unfortunately I can't reinstall or upgrade the operating system on this server. I'm renting it, and all I have is an SSH connection to it.
The server's messages file contains some entries that correspond to the time when this error occurred:
Oct 26 15:00:43 rbi0104 kernel: audit(1256565643.414:3): avc: denied { execmod } for pid=1989 comm="jsvc" name="libjvm.so" dev=sda3 ino=2588918 scontext=system_u:system_r:initrc_t:s0 tcontext=root:object_r:usr_t:s0 tclass=file
Oct 26 15:00:51 rbi0104 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 26 15:00:51 rbi0104 kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
Oct 26 15:00:51 rbi0104 kernel: ata1: EH complete
Oct 26 15:00:52 rbi0104 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 26 15:00:52 rbi0104 kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
Oct 26 15:00:52 rbi0104 kernel: ata1: EH complete
Oct 26 15:00:52 rbi0104 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 26 15:00:52 rbi0104 kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
Oct 26 15:00:52 rbi0104 kernel: ata1: EH complete
Oct 26 15:00:52 rbi0104 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 26 15:00:52 rbi0104 kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
Oct 26 15:00:52 rbi0104 kernel: ata1: EH complete
Oct 26 15:00:52 rbi0104 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 26 15:00:52 rbi0104 kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
Oct 26 15:00:52 rbi0104 kernel: ata1: EH complete
Oct 26 15:00:52 rbi0104 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 26 15:00:52 rbi0104 kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
Oct 26 15:00:52 rbi0104 kernel: ata1: EH complete
Oct 26 15:00:52 rbi0104 kernel: SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
Oct 26 15:00:52 rbi0104 kernel: sda: Write Protect is off
Oct 26 15:00:52 rbi0104 kernel: SCSI device sda: drive cache: write back
Oct 26 15:00:52 rbi0104 kernel: SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
Oct 26 15:00:52 rbi0104 kernel: sda: Write Protect is off
What could this mean?
kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
Some kind of hardware error?
In case someone is kind enough to help and would like to look through the whole messages file, I published it here:
http://heroes.kz/manager/messages.zip
The quoted messages are from the file messages.1.
I'm trying to google it myself right now, but I would very much appreciate it if someone could give me some ideas.
Thank you in advance!
- 10-27-2009 #5
I googled this one, check it: Linux-Kernel Archive: FYI: strange libata EH lines in dmesg once after every bootup
Another link: Problems with SATA2 harddrive? | KernelTrap
- Lakshmipathi.G
-------------------
FOSS India Award winning ext3fs Undelete tool and tutorials www.giis.co.in
First they criticize you,Then they laugh at you,Then they fight with you,Then you win. - M.K.Gandhi
-------------------
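For what it is worth, the repeated "device error" line in the log above can be read field by field. In the ATA command set, opcode 0xb0 is the SMART command, status bit 0x40 is DRDY, status bit 0x01 is ERR, and error bit 0x04 is ABRT. So `cmd 0xb0 ... stat 0x51 err 0x4` reads as a SMART command that the drive aborted. That alone does not say whether the drive is failing or simply does not support the requested SMART feature. The helper below is only an illustrative sketch: the function name is invented, and it names just those few bits.

```shell
# Decode the cmd/stat/err fields of an old-style libata "device error" line.
# Only the values relevant to this thread are named; this is not a full decoder.
decode_ata_error() {
    line=$1
    cmd=$(printf '%s\n' "$line" | sed -n 's/.* cmd \(0x[0-9a-fA-F]*\).*/\1/p')
    stat=$(printf '%s\n' "$line" | sed -n 's/.* stat \(0x[0-9a-fA-F]*\).*/\1/p')
    err=$(printf '%s\n' "$line" | sed -n 's/.* err \(0x[0-9a-fA-F]*\).*/\1/p')
    # Give up if the line does not carry all three fields.
    if [ -z "$cmd" ] || [ -z "$stat" ] || [ -z "$err" ]; then
        return 1
    fi

    case $cmd in
        0xb0|0xB0) cmd_name="SMART" ;;
        *)         cmd_name="other" ;;
    esac

    flags=""
    if [ $(( $stat & 0x40 )) -ne 0 ]; then flags="$flags DRDY"; fi
    if [ $(( $stat & 0x01 )) -ne 0 ]; then flags="$flags ERR"; fi

    errs=""
    if [ $(( $err & 0x04 )) -ne 0 ]; then errs="$errs ABRT"; fi

    echo "cmd=$cmd ($cmd_name) status:$flags error:$errs"
}
```

Passing it one of the quoted log lines as a single argument prints a one-line summary. Lines without all three fields, such as the "exception" and "EH complete" lines, make it return non-zero.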
- 10-27-2009 #6 Just Joined!
Thank you very much, Lakshmipathi. These links describe my problem pretty well, and it looks like this is an issue with some SMART capabilities.
But I still have no idea how to fix it.
I ran the tests described in "Linux-Kernel Archive: FYI: strange libata EH lines in dmesg once after every bootup" and they produced very similar results.
1) # smartctl --smart=on
works fine.
2) # smartctl --saveauto=on -d ata /dev/sda
produces the following error:
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
Error SMART Enable Auto-save failed: Input/output error
Smartctl: SMART Enable Attribute Autosave Failed.
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
3) # smartctl --offlineauto=on -d ata /dev/sda
produces the following error:
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
Error SMART Enable Automatic Offline failed: Input/output error
Smartctl: SMART Enable Automatic Offline Failed.
In my opinion the problem occurs when the server tries to go into sleep mode or tries to put the HDD to sleep. Probably in this case Fedora tries to use the auto-save feature, the offlineauto feature, or both, and since they are unavailable, everything crashes and the file system becomes inoperative.
Another thing that makes me think so is the log of the 5 latest disk errors:
Error 7495 occurred at disk power-on lifetime: 14194 hours (591 days + 10 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
...
Error 7494 occurred at disk power-on lifetime: 14194 hours (591 days + 10 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
...
Error 7493 occurred at disk power-on lifetime: 14194 hours (591 days + 10 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
...
Error 7492 occurred at disk power-on lifetime: 14194 hours (591 days + 10 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
...
Error 7491 occurred at disk power-on lifetime: 14194 hours (591 days + 10 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
Now the main question is: what should I do to get my OS running well?
There is no definite answer to this question that matches my situation.
1) "Linux-Kernel Archive: FYI: strange libata EH lines in dmesg once after every bootup" advises changing the startup scripts.
But in my case the problem appears not during startup but several days after it, so that clearly will not help.
2) "Problems with SATA2 harddrive? | KernelTrap" gives several pieces of advice:
a) disable smartd (the SMART daemon) to resolve this problem
b) enable S.M.A.R.T. in the BIOS
c) add "noapic" and "nosmp" to the kernel boot parameters
Since "# smartctl --smart=on" works successfully, I don't think (b) is the case.
I'm a little afraid to disable smartd, and I don't think it's a good idea at all, so (a) is probably not the best choice either.
As for (c), the first thing is that I'm not sure exactly how I should change the boot parameters, and I'm very afraid of totally breaking the system with my unqualified actions.
What do you think about (c)? Could it really help?
In fact I did the following:
# smartctl --smart=on --offlineauto=off --
and the output was:
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
SMART Attribute Autosave Disabled.
SMART Automatic Offline Testing Disabled.
Could these settings stop the malfunctions?
I personally doubt it, but I don't understand how all of this works internally. When the HDD goes into sleep mode, will the kernel perhaps read these settings and no longer attempt the actions that lead to the crash?
Any ideas appreciated!
Thank you in advance!
- 10-27-2009 #7
I have used smartctl very rarely; the following links provide more insight into it:
Monitoring Hard Disks with SMART
Linux Harddisk Monitoring with SmartMonTools (smartctl)
I might be wrong, but I guess your hard disk may have started to fail. Just my assumption.
ACPI is the Advanced Configuration and Power Interface. Check this to learn more about kernel parameters:
http://www.kernel.org/pub/linux/kern...n_pdf/ch09.pdf
- Lakshmipathi.G
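On the smartd question from post #6, one thing that can be checked without changing anything is whether smartd itself is configured to switch on the two features the drive rejected. In smartd.conf, `-o on` requests automatic offline testing and `-S on` requests attribute autosave. This is a rough sketch, assuming the configuration lives in /etc/smartd.conf (the usual location, but verify it on your install). The function name is hypothetical.

```shell
# Print active (non-comment) smartd.conf lines that ask smartd to enable
# automatic offline testing (-o on) or attribute autosave (-S on).
# Takes an optional config path; defaults to /etc/smartd.conf.
# Limitation: a trailing "# ..." comment on an active line is not stripped.
smartd_auto_feature_lines() {
    grep -v '^[[:space:]]*#' "${1:-/etc/smartd.conf}" |
        grep -E -- '(^|[[:space:]])-(o|S)[[:space:]]+on([[:space:]]|$)'
}
```

If this prints a line, smartd is asking the drive for the same features that failed by hand with an input/output error. Setting them to `off` in that line and restarting smartd would be a smaller step than disabling the daemon altogether. Whether that stops the crashes is not something the thread establishes. Separately, `smartctl -H -d ata /dev/sda` reports the drive's overall health self-assessment and is read-only.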