  1. #1
    Just Joined! (joined Nov 2010, 4 posts)

    Server keeps restarting

    Hello.

    The server machine reboots every 10 hours on average, and I don't know why.
    Contents of /var/log/messages:

    Nov 22 06:31:09 sweeney kernel: e1000: eth0: e1000_watchdog: NIC Link is Down
    Nov 22 06:34:34 sweeney syslogd 1.4.1: restart.
    Nov 22 06:34:34 sweeney kernel: klogd 1.4.1, log source = /proc/kmsg started.
    Nov 22 06:34:34 sweeney kernel: Linux version 2.6.22.14-72.fc6 (gcc version 4.1.2 20070626 (Red Hat 4.1.2-13)) #1 SMP Wed Nov 21 15:12:59 EST 2007
    Nov 22 06:34:34 sweeney kernel: BIOS-provided physical RAM map:
    Nov 22 06:34:34 sweeney kernel: BIOS-e820: 0000000000000000 - 0000000000096400 (usable)
    Nov 22 06:34:34 sweeney kernel: BIOS-e820: 0000000000096400 - 00000000000a0000 (reserved)
    Nov 22 06:34:34 sweeney kernel: BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
    Nov 22 06:34:34 sweeney kernel: BIOS-e820: 0000000000100000 - 000000007ff60000 (usable)
    Nov 22 06:34:34 sweeney kernel: BIOS-e820: 000000007ff60000 - 000000007ff69000 (ACPI data)
    Nov 22 06:34:34 sweeney kernel: BIOS-e820: 000000007ff69000 - 000000007ff80000 (ACPI NVS)
    Nov 22 06:34:34 sweeney kernel: BIOS-e820: 000000007ff80000 - 0000000080000000 (reserved)
    Nov 22 06:34:34 sweeney kernel: BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
    Nov 22 06:34:34 sweeney kernel: BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
    Nov 22 06:34:34 sweeney kernel: BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
    Nov 22 06:34:34 sweeney kernel: BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
    Nov 22 06:34:34 sweeney kernel: 1151MB HIGHMEM available.
    Nov 22 06:34:34 sweeney kernel: 896MB LOWMEM available.
    And:

    Nov 22 07:05:05 sweeney smartd[3593]: Device: /dev/sda, 5 Currently unreadable (pending) sectors
    Nov 22 07:11:16 sweeney kernel: ata2.00: exception Emask 0x0 SAct 0x3ffff SErr 0x0 action 0x2 frozen
    Nov 22 07:11:16 sweeney kernel: ata2.00: cmd 60/30:00:18:57:65/00:00:01:00:00/40 tag 0 cdb 0x0 data 24576 in
    Nov 22 07:11:16 sweeney kernel: res 40/00:08:88:ea:c9/00:00:07:00:00/40 Emask 0x4 (timeout)
    Nov 22 07:11:16 sweeney kernel: ata2.00: cmd 60/18:08:60:57:65/00:00:01:00:00/40 tag 1 cdb 0x0 data 12288 in
    Nov 22 07:11:16 sweeney kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Nov 22 07:11:16 sweeney kernel: ata2.00: cmd 60/08:10:48:58:65/00:00:01:00:00/40 tag 2 cdb 0x0 data 4096 in
    Nov 22 07:11:16 sweeney kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Nov 22 07:11:16 sweeney kernel: ata2.00: cmd 60/10:18:58:58:65/00:00:01:00:00/40 tag 3 cdb 0x0 data 8192 in
    Nov 22 07:11:16 sweeney kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Nov 22 07:11:16 sweeney kernel: ata2.00: cmd 60/10:20:78:58:65/00:00:01:00:00/40 tag 4 cdb 0x0 data 8192 in
    Is the drive failing? Or is it something else?
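A first step with a log like this is to list every reboot together with the last line logged before it, and to measure how far apart the reboots are. The minimal Python sketch below assumes the classic sysklogd format shown above (`Mon DD HH:MM:SS host ...`), treats the `syslogd ... restart.` line as the boot marker, and assumes all entries fall in one calendar year (syslog omits the year; 2010 is an assumption). The function names are hypothetical helpers, not part of any tool.

```python
import re
from datetime import datetime

# Matches the sysklogd boot marker seen in the post, e.g.
# "Nov 22 06:34:34 sweeney syslogd 1.4.1: restart."
RESTART_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) \S+ syslogd [\d.]+: restart\.$"
)


def parse_stamp(stamp, year=2010):
    """Parse a syslog timestamp; the year is an assumption because syslog omits it."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def find_reboots(lines):
    """Return (boot_time, last_line_before_boot) for each syslogd restart marker."""
    reboots = []
    previous = None
    for line in lines:
        line = line.rstrip("\n")
        match = RESTART_RE.match(line)
        if match:
            reboots.append((parse_stamp(match.group(1)), previous))
        previous = line
    return reboots


def mean_hours_between(reboots):
    """Average gap between consecutive reboots in hours, or None if fewer than two."""
    times = [when for when, _ in reboots]
    if len(times) < 2:
        return None
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)
```

It accepts any iterable of lines, so an open file works, e.g. `find_reboots(open("/var/log/messages"))`. Keep in mind that a hard power cut leaves no final message, so the "last line before" may just be the last routine entry rather than the cause.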

  2. #2
    Roxoff (Trusted Penguin; joined Aug 2005; Nottingham, England; 3,393 posts)
    It does look like the drive is failing physically, although it could be the ATA interface controller (but that's far less likely than the drive). Are you in a position to disconnect the drive and see if the problem goes away?

    It would be sensible to make sure you have good backups of the data on it before it degrades further. You could always try it on a different ATA port, but if it were me, I'd replace the drive as soon as possible. If this is an important server, you may want to make sure SMART is enabled in the BIOS and that the smartd monitor is installed and running on the server; at least then you'll get some warning if the drive starts to fail.
    Linux user #126863 - see http://linuxcounter.net/
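Trying the drive on a different ATA port can be checked from the logs: count the libata `exception` lines per `ataN.MM` device before and after the move. If the exceptions reappear under the new port's name, they are following the drive; if they stop, the original port or controller is more suspect. A minimal Python sketch, assuming the kernel message format shown in the first post; `count_ata_exceptions` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches libata error-handler lines such as
# "... kernel: ata2.00: exception Emask 0x0 SAct 0x3ffff SErr 0x0 action 0x2 frozen"
EXCEPTION_RE = re.compile(r"kernel: (ata\d+(?:\.\d+)?): exception\b")


def count_ata_exceptions(lines):
    """Count exception events per ATA link/device name (e.g. 'ata2.00')."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Only the `exception` lines are counted, so the per-command `cmd`/`res` detail lines that follow each event do not inflate the totals.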

  3. #3
    Just Joined! (joined Nov 2010, 4 posts)
    The server has RAID. Shouldn't that prevent reboots like this?

  4. #4
    Roxoff (Trusted Penguin; joined Aug 2005; Nottingham, England; 3,393 posts)
    It might help, but if the hardware is failing, things can get a bit unpredictable. I suggest you test and replace any faulty hardware; RAID will help ensure you lose no data.

  5. #5
    Just Joined! (joined Nov 2010, 4 posts)
    SMART passed. Now I'm even more puzzled.

    [root@sweeney mark08]# /usr/sbin/smartctl -H /dev/sda
    smartctl version 5.37 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    [root@sweeney log]# /usr/sbin/smartctl -H /dev/sdb
    smartctl version 5.37 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    [root@sweeney log]# /usr/sbin/smartctl -H /dev/sdc
    smartctl version 5.37 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen

    SMART Health Status: OK
    Last edited by drfragment; 11-30-2010 at 03:39 PM.
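A PASSED result from `smartctl -H` is the drive's overall self-assessment, which generally flips to FAILED only when a normalized attribute crosses its vendor threshold. It can therefore still read PASSED while smartd is logging `5 Currently unreadable (pending) sectors`, as in the first post. The raw counters from `smartctl -A` are more telling. A minimal Python sketch that pulls a few of them out of that attribute table, assuming the usual ten-column layout (`ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE`); the watched attribute list and the helper name are assumptions.

```python
import re

# Attributes whose raw counts usually point at media problems (assumed selection).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def bad_sector_counts(attribute_table):
    """Parse `smartctl -A` text and return {attribute_name: raw_count} for WATCHED."""
    counts = {}
    for line in attribute_table.splitlines():
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = re.match(r"\d+", fields[9])  # raw value may carry trailing text
            if raw:
                counts[fields[1]] = int(raw.group())
    return counts
```

The text can come from `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout`. A non-zero `Current_Pending_Sector` is worth acting on even when the overall verdict says PASSED.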

  6. #6
    Just Joined! (joined Nov 2010, 4 posts)
    It appears there was an issue with the UPS in the rack. UPS got replaced and it's been OK so far.
