- 11-29-2010 #1 Just Joined!
- Join Date
- Nov 2010
- Posts
- 4
Server keeps restarting
Hello.
The server reboots about every 10 hours on average and I don't know why.
Contents of /var/log/messages:
Nov 22 06:31:09 sweeney kernel: e1000: eth0: e1000_watchdog: NIC Link is Down
Nov 22 06:34:34 sweeney syslogd 1.4.1: restart.
Nov 22 06:34:34 sweeney kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 22 06:34:34 sweeney kernel: Linux version 2.6.22.14-72.fc6 (gcc version 4.1.2 20070626 (Red Hat 4.1.2-13)) #1 SMP Wed Nov 21 15:12:59 EST 2007
Nov 22 06:34:34 sweeney kernel: BIOS-provided physical RAM map:
Nov 22 06:34:34 sweeney kernel: BIOS-e820: 0000000000000000 - 0000000000096400 (usable)
Nov 22 06:34:34 sweeney kernel: BIOS-e820: 0000000000096400 - 00000000000a0000 (reserved)
Nov 22 06:34:34 sweeney kernel: BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
Nov 22 06:34:34 sweeney kernel: BIOS-e820: 0000000000100000 - 000000007ff60000 (usable)
Nov 22 06:34:34 sweeney kernel: BIOS-e820: 000000007ff60000 - 000000007ff69000 (ACPI data)
Nov 22 06:34:34 sweeney kernel: BIOS-e820: 000000007ff69000 - 000000007ff80000 (ACPI NVS)
Nov 22 06:34:34 sweeney kernel: BIOS-e820: 000000007ff80000 - 0000000080000000 (reserved)
Nov 22 06:34:34 sweeney kernel: BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
Nov 22 06:34:34 sweeney kernel: BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
Nov 22 06:34:34 sweeney kernel: BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
Nov 22 06:34:34 sweeney kernel: BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
Nov 22 06:34:34 sweeney kernel: 1151MB HIGHMEM available.
Nov 22 06:34:34 sweeney kernel: 896MB LOWMEM available.
And:
Nov 22 07:05:05 sweeney smartd[3593]: Device: /dev/sda, 5 Currently unreadable (pending) sectors
Nov 22 07:11:16 sweeney kernel: ata2.00: exception Emask 0x0 SAct 0x3ffff SErr 0x0 action 0x2 frozen
Nov 22 07:11:16 sweeney kernel: ata2.00: cmd 60/30:00:18:57:65/00:00:01:00:00/40 tag 0 cdb 0x0 data 24576 in
Nov 22 07:11:16 sweeney kernel: res 40/00:08:88:ea:c9/00:00:07:00:00/40 Emask 0x4 (timeout)
Nov 22 07:11:16 sweeney kernel: ata2.00: cmd 60/18:08:60:57:65/00:00:01:00:00/40 tag 1 cdb 0x0 data 12288 in
Nov 22 07:11:16 sweeney kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 22 07:11:16 sweeney kernel: ata2.00: cmd 60/08:10:48:58:65/00:00:01:00:00/40 tag 2 cdb 0x0 data 4096 in
Nov 22 07:11:16 sweeney kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 22 07:11:16 sweeney kernel: ata2.00: cmd 60/10:18:58:58:65/00:00:01:00:00/40 tag 3 cdb 0x0 data 8192 in
Nov 22 07:11:16 sweeney kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 22 07:11:16 sweeney kernel: ata2.00: cmd 60/10:20:78:58:65/00:00:01:00:00/40 tag 4 cdb 0x0 data 8192 in
Is the drive failing? Or is it something else?
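To see whether the ATA exceptions in the log above keep hitting the same port, they can be counted per device. This is only a sketch: it assumes the traditional syslog layout shown in the excerpt (no kernel timestamps in brackets), and `ata_exceptions` is a made-up helper name. The `ataN.NN` name is the libata port and device, which does not necessarily map one-to-one onto `/dev/sda`.

```shell
# ata_exceptions: read a syslog file on stdin and print how many
# "exception" lines were logged for each libata device (e.g. ata2.00).
# Assumes lines of the form:
#   Mon DD HH:MM:SS host kernel: ataN.NN: exception Emask ...
ata_exceptions() {
    awk '
        $5 == "kernel:" && $7 == "exception" {
            dev = $6
            sub(/:$/, "", dev)   # strip the trailing colon from "ata2.00:"
            count[dev]++
        }
        END { for (d in count) print d, count[d] }
    '
}

# Usage (path as in the post):
#   ata_exceptions < /var/log/messages
```

If all exceptions land on one device, that points at that drive or its cable and port; exceptions spread over several devices would make a shared cause such as the controller or the power supply more plausible.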
- 11-29-2010 #2
It does look like the drive is failing physically, although it could be the ATA interface controller (but that's far less likely than the drive). Are you in a position to disconnect the drive and see if the problem goes away?
It would be sensible to make sure you have good backups of the data on it before it degrades further. You could try the drive on a different ATA port, but if it were me I'd change the drive as soon as possible. If this is an important server, you may want to ensure SMART is enabled in the BIOS and that the smartd monitor is installed and running on the server; at least then you'll get some warning if the drive starts to fail.
Linux user #126863 - see http://linuxcounter.net/
- 11-29-2010 #3 Just Joined!
The server has RAID. Shouldn't that prevent reboots like this?
- 11-30-2010 #4
It might help, but if the hardware is failing, things can get a bit unpredictable. I suggest you test the hardware and replace anything faulty; RAID will help ensure you lose no data.
- 11-30-2010 #5 Just Joined!
SMART passed. Now I'm even more puzzled.
[root@sweeney mark08]# /usr/sbin/smartctl -H /dev/sda
smartctl version 5.37 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[root@sweeney log]# /usr/sbin/smartctl -H /dev/sdb
smartctl version 5.37 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[root@sweeney log]# /usr/sbin/smartctl -H /dev/sdc
smartctl version 5.37 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
SMART Health Status: OK
Last edited by drfragment; 11-30-2010 at 03:39 PM.
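A PASSED result from `smartctl -H` generally only means that no attribute has crossed the drive's own failure threshold, so it can coexist with the pending sectors smartd reported in the first post. The raw attribute values are more telling. A minimal sketch for pulling them out, assuming the usual ATA attribute table printed by `smartctl -A` (attribute name in the second column, raw value in the last); `sector_counts` is a hypothetical helper name, not part of smartmontools:

```shell
# sector_counts: read `smartctl -A` output on stdin and print the raw values
# of the attributes that track reallocated, pending and uncorrectable sectors.
sector_counts() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
        print $2 "=" $NF
    }'
}

# Usage (needs root; device name as in the post):
#   /usr/sbin/smartctl -A /dev/sda | sector_counts
```

Non-zero values here would be worth watching even when the overall health check passes. A SCSI/SAS device, like `/dev/sdc` above appears to be, does not print this table, so the helper would print nothing for it.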
- 12-02-2010 #6 Just Joined!
It appears there was an issue with the UPS in the rack. The UPS was replaced and the server has been OK so far.
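In hindsight, the log in the first post fits a power problem: the kernel boot banner follows an unrelated NIC message with no shutdown messages in between. A rough sketch for spotting such boots, assuming the old sysklogd behaviour of writing an `exiting on signal N` line during an orderly shutdown and the kernel banner format shown in the first post (both are assumptions about this setup, and `unclean_boots` is a name invented here):

```shell
# unclean_boots: read a syslog file on stdin and print the timestamp of every
# kernel boot banner that was not preceded by a syslogd shutdown message.
unclean_boots() {
    awk '
        # Orderly shutdown: syslogd logs this line before it exits.
        /exiting on signal [0-9]+$/ { clean = 1; next }
        # Boot: klogd replays the kernel banner at startup.
        $5 == "kernel:" && $6 == "Linux" && $7 == "version" {
            if (!clean) print $1, $2, $3
            clean = 0
        }
    '
}

# Usage (path as in the post):
#   unclean_boots < /var/log/messages
```

The first boot in a freshly rotated log may be flagged spuriously, because the matching shutdown line would be in the previous file.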

