  1. #1
    Just Joined!
    Join Date
    Dec 2012
    Posts
    1

    Random Crashing


    Over the last month or so my CentOS server has been crashing, and I don't know why. It had been running for over a year with regular yum updates without problems. The load on the server is normal: CPU usage is around 5-6% and RAM usage is under half of the 32GB installed (the box hosts several small game servers). I am not sure this is a software issue at all.

    I have pasted the part of my /var/log/messages file covering the period up to my latest crash. I am new to CentOS and this is gibberish to me: does anything in it point to the cause of the crash? Are there other logs I could check and paste? If not, I would suspect a hardware issue or overheating.

    Here are the messages:

    Code:
    Dec 21 14:58:03 server1 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
    Dec 21 14:58:03 server1 kernel: Hardware name: X9SCL/X9SCM
    Dec 21 14:58:03 server1 kernel: NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out
    Dec 21 14:58:03 server1 kernel: Modules linked in: fuse autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 sg microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
    Dec 21 14:58:03 server1 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-279.14.1.el6.x86_64 #1
    Dec 21 14:58:03 server1 kernel: Call Trace:
    Dec 21 14:58:03 server1 kernel: <IRQ>  [<ffffffff8106b7b7>] ? warn_slowpath_common+0x87/0xc0
    Dec 21 14:58:03 server1 kernel: [<ffffffff8106b8a6>] ? warn_slowpath_fmt+0x46/0x50
    Dec 21 14:58:03 server1 kernel: [<ffffffff81459c0d>] ? dev_watchdog+0x26d/0x280
    Dec 21 14:58:03 server1 kernel: [<ffffffff8108caad>] ? insert_work+0x6d/0xb0
    Dec 21 14:58:03 server1 kernel: [<ffffffff814599a0>] ? dev_watchdog+0x0/0x280
    Dec 21 14:58:03 server1 kernel: [<ffffffff8107e937>] ? run_timer_softirq+0x197/0x340
    Dec 21 14:58:03 server1 kernel: [<ffffffff810a23c0>] ? tick_sched_timer+0x0/0xc0
    Dec 21 14:58:03 server1 kernel: [<ffffffff8102b40d>] ? lapic_next_event+0x1d/0x30
    Dec 21 14:58:03 server1 kernel: [<ffffffff81073f61>] ? __do_softirq+0xc1/0x1e0
    Dec 21 14:58:03 server1 kernel: [<ffffffff81096d60>] ? hrtimer_interrupt+0x140/0x250
    Dec 21 14:58:03 server1 kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
    Dec 21 14:58:03 server1 kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
    Dec 21 14:58:03 server1 kernel: [<ffffffff81073d45>] ? irq_exit+0x85/0x90
    Dec 21 14:58:03 server1 kernel: [<ffffffff81506450>] ? smp_apic_timer_interrupt+0x70/0x9b
    Dec 21 14:58:03 server1 kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
    Dec 21 14:58:03 server1 kernel: <EOI>  [<ffffffff812cddbe>] ? intel_idle+0xde/0x170
    Dec 21 14:58:03 server1 kernel: [<ffffffff812cdda1>] ? intel_idle+0xc1/0x170
    Dec 21 14:58:03 server1 kernel: [<ffffffff8109929d>] ? sched_clock_cpu+0xcd/0x110
    Dec 21 14:58:03 server1 kernel: [<ffffffff81407c27>] ? cpuidle_idle_call+0xa7/0x140
    Dec 21 14:58:03 server1 kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
    Dec 21 14:58:03 server1 kernel: [<ffffffff814f754f>] ? start_secondary+0x22a/0x26d
    Dec 21 14:58:03 server1 kernel: ---[ end trace c6b419e0a29214c3 ]---
    Dec 21 14:58:03 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
    Dec 21 14:58:03 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
    Dec 21 14:58:03 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Dec 21 14:58:04 server1 abrtd: Directory 'oops-2012-12-21-14:58:04-2219-0' creation detected
    Dec 21 14:58:04 server1 abrt-dump-oops: Reported 1 kernel oopses to Abrt
    Dec 21 14:58:04 server1 abrtd: Can't open file '/var/spool/abrt/oops-2012-12-21-14:58:04-2219-0/uid': No such file or directory
    Dec 21 14:58:06 server1 kernel: Bridge firewalling registered
    Dec 21 14:58:13 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
    Dec 21 14:58:13 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
    Dec 21 14:58:13 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Dec 21 14:58:14 server1 abrtd: Sending an email...
    Dec 21 14:58:14 server1 abrtd: Email was sent to: root_localhost
    Dec 21 14:58:14 server1 abrtd: New problem directory /var/spool/abrt/oops-2012-12-21-14:58:04-2219-0, processing
    Dec 21 14:58:14 server1 abrtd: Can't open file '/var/spool/abrt/oops-2012-12-21-14:58:04-2219-0/uid': No such file or directory
    Dec 21 14:58:23 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
    Dec 21 14:58:23 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
    Dec 21 14:58:23 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Dec 21 14:58:33 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
    Dec 21 14:58:33 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
    Dec 21 14:58:33 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Dec 21 14:58:43 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
    Dec 21 14:58:43 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
    Dec 21 14:58:43 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Dec 21 14:58:53 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
    Dec 21 14:58:53 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
    Dec 21 14:58:53 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Dec 21 14:59:03 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
    Dec 21 14:59:03 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
    Dec 21 14:59:03 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Dec 21 14:59:13 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
    Dec 21 14:59:13 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
    Dec 21 14:59:13 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Dec 21 15:03:03 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
    Dec 21 15:03:03 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
    Dec 21 15:03:03 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Thanks in advance.
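    On a log longer than this excerpt, a quick way to see whether anything in /var/log/messages deserves attention is to filter for the kernel's warning, oops and stack-trace markers. Below is a minimal Python sketch; the marker list (which includes the two e1000e messages seen above) is an illustrative assumption rather than an exhaustive list of what the kernel can log, and suspicious_lines is a hypothetical helper, not an existing tool:

```python
import re
import sys

# Illustrative, NOT exhaustive: markers that usually deserve a closer look
# in /var/log/messages (kernel warnings, oopses, panics, stack traces and
# the e1000e NIC errors seen in this thread).
MARKERS = re.compile(
    r"WARNING: at|Oops|Kernel panic|Call Trace|NETDEV WATCHDOG|"
    r"Error reading PHY register|Reset adapter"
)


def suspicious_lines(lines):
    """Yield (line_number, line) for every line matching one of MARKERS."""
    for number, line in enumerate(lines, start=1):
        if MARKERS.search(line):
            yield number, line.rstrip("\n")


# Only scan a file when a path is given on the command line, e.g.
#   python scan_messages.py /var/log/messages
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for number, line in suspicious_lines(log):
            print(f"{number}: {line}")
```

    The script name is arbitrary. A plain regular expression keeps the sketch dependency-free; anything it prints is a starting point for reading the surrounding lines, not a diagnosis by itself.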

  2. #2
    Linux Guru
    Join Date
    Nov 2007
    Posts
    1,752
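    The trace starts with a warning from the transmit watchdog in the network scheduler code (dev_watchdog in net/sched/sch_generic.c), which fires when a network card's transmit queue stays stalled longer than its timeout. Here it reports that transmit queue 0 on eth2 (driver e1000e) timed out: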

    Code:
    Dec 21 14:58:03 server1 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
    Dec 21 14:58:03 server1 kernel: Hardware name: X9SCL/X9SCM
    Dec 21 14:58:03 server1 kernel: NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out
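    That is followed by repeated errors from the eth2 NIC. Roughly every 10 seconds the driver resets the adapter, fails to read a PHY register, and brings the link back up: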
    Code:
    Dec 21 14:58:23 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
    Dec 21 14:58:23 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
    Dec 21 14:58:23 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Dec 21 14:58:33 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
    Dec 21 14:58:33 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
    Dec 21 14:58:33 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Dec 21 14:58:43 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
    Dec 21 14:58:43 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
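    Repeated transmit timeouts and PHY register read failures suggest a problem with the eth2 NIC itself rather than with load or memory. If you want a shotgun approach, replace the NIC.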
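    Before swapping hardware, it can help to confirm how regularly the adapter is being reset across the whole log, not just this excerpt. A minimal Python sketch, assuming the classic syslog timestamp format and the e1000e "Reset adapter" wording shown above (reset_gaps is a hypothetical helper, not an existing tool), groups those lines by interface and reports the gap between consecutive resets:

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: lines use the classic syslog prefix "Mon DD HH:MM:SS host"
# and the e1000e "Reset adapter" wording seen in the excerpt above.
RESET = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"e1000e \S+: (?P<iface>\w+): Reset adapter"
)


def reset_gaps(lines, year):
    """Hypothetical helper: map each interface to the seconds between
    consecutive adapter resets. Syslog omits the year, so the caller
    supplies it (gaps across a New Year boundary will be wrong)."""
    resets = defaultdict(list)
    for line in lines:
        match = RESET.match(line)
        if match:
            stamp = datetime.strptime(f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S")
            resets[match["iface"]].append(stamp)
    return {
        iface: [int((later - earlier).total_seconds())
                for earlier, later in zip(stamps, stamps[1:])]
        for iface, stamps in resets.items()
    }
```

    Fed the lines of the first post's log with year=2012, it returns {'eth2': [10, 10, 10, 10, 10, 10, 10, 230]}: resets ten seconds apart from 14:58:03 to 14:59:13, then one more at 15:03:03. A steady pattern like this across days of logs would strengthen the case for the NIC.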
