Debian Squeeze random lockups, need help reading kernel dmesg errors
I have a new box working as a light-duty file and MySQL server for a small office.
The mainboard is an Asus M5A97 R2.0 and the CPU is an AMD FX-8350 (4.0 GHz, 8 cores), with 4x8 GB of RAM and two Linux RAID 1 arrays.
It runs Linux server 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012 x86_64 with gdm disabled (no X).
The box works great except that it locks up hard every 1 to 3 days and needs a power reset to come back. When it locks up there is no activity at all:
no NumLock, no console message, no network, and nothing in the logs about the cause. It is just idling when this happens. I have captured top output, and it shows
no significant CPU or memory usage. I have swapped the RAM with another box, but the lockups remain, and Memtest shows no problems.
There are errors in dmesg (kern.log) on boot that I don't understand.
Code:
Jan 23 20:17:10 server kernel: [ 0.995804] kobject_add_internal failed for threshold_bank1 with -EEXIST, don't try to register things with the same name in the same directory.
Jan 23 20:17:10 server kernel: [ 0.995871] Pid: 1, comm: swapper Tainted: G W 2.6.32-5-amd64 #1
Jan 23 20:17:10 server kernel: [ 0.995872] Call Trace:
Jan 23 20:17:10 server kernel: [ 0.995873] [<ffffffff8118fd14>] ? kobject_add_internal+0x16e/0x181
Jan 23 20:17:10 server kernel: [ 0.995875] [<ffffffff8118fed3>] ? kobject_add+0x74/0x7c
Jan 23 20:17:10 server kernel: [ 0.995877] [<ffffffff81195b7e>] ? sprintf+0x51/0x59
Jan 23 20:17:10 server kernel: [ 0.995879] [<ffffffff810e84fd>] ? __kmalloc+0x12f/0x141
Jan 23 20:17:10 server kernel: [ 0.995880] [<ffffffff8118f9e8>] ? kobject_create+0x10/0x2c
Jan 23 20:17:10 server kernel: [ 0.995882] [<ffffffff8118f998>] ? kobject_init+0x42/0x82
Jan 23 20:17:10 server kernel: [ 0.995883] [<ffffffff8118f9ff>] ? kobject_create+0x27/0x2c
Jan 23 20:17:10 server kernel: [ 0.995885] [<ffffffff8118ff09>] ? kobject_create_and_add+0x2e/0x5b
Jan 23 20:17:10 server kernel: [ 0.995887] [<ffffffff812f5487>] ? threshold_create_bank+0x16f/0x287
Jan 23 20:17:10 server kernel: [ 0.995889] [<ffffffff81523d85>] ? threshold_init_device+0x40/0x86
Jan 23 20:17:10 server kernel: [ 0.995892] [<ffffffff81523d45>] ? threshold_init_device+0x0/0x86
Jan 23 20:17:10 server kernel: [ 0.995894] [<ffffffff8100a065>] ? do_one_initcall+0x64/0x174
Jan 23 20:17:10 server kernel: [ 0.995895] [<ffffffff8151c677>] ? kernel_init+0x158/0x1ae
Jan 23 20:17:10 server kernel: [ 0.995897] [<ffffffff8151c140>] ? early_idt_handler+0x0/0x71
Jan 23 20:17:10 server kernel: [ 0.995899] [<ffffffff81011baa>] ? child_rip+0xa/0x20
Jan 23 20:17:10 server kernel: [ 0.995900] [<ffffffff8151c140>] ? early_idt_handler+0x0/0x71
Jan 23 20:17:10 server kernel: [ 0.995902] [<ffffffff8151c51f>] ? kernel_init+0x0/0x1ae
Jan 23 20:17:10 server kernel: [ 0.995903] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
Jan 23 20:17:10 server kernel: [ 0.995904] kobject_create_and_add: kobject_add error: -17
And these:
Code:
Jan 23 20:17:10 server kernel: [ 0.995105] ------------[ cut here ]------------
Jan 23 20:17:10 server kernel: [ 0.995106] WARNING: at /build/buildd-linux-2.6_2.6.32-46-amd64-_ApuPc/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/sysfs/dir.c:491 sysfs_add_one+0xcc/0xe4()
Jan 23 20:17:10 server kernel: [ 0.995108] Hardware name: To be filled by O.E.M.
Jan 23 20:17:10 server kernel: [ 0.995108] sysfs: cannot create duplicate filename '/devices/system/machinecheck/machinecheck6/threshold_bank2'
Jan 23 20:17:10 server kernel: [ 0.995109] Modules linked in:
Jan 23 20:17:10 server kernel: [ 0.995111] Pid: 1, comm: swapper Tainted: G W 2.6.32-5-amd64 #1
Jan 23 20:17:10 server kernel: [ 0.995111] Call Trace:
Jan 23 20:17:10 server kernel: [ 0.995113] [<ffffffff811404ff>] ? sysfs_add_one+0xcc/0xe4
Jan 23 20:17:10 server kernel: [ 0.995114] [<ffffffff811404ff>] ? sysfs_add_one+0xcc/0xe4
Jan 23 20:17:10 server kernel: [ 0.995116] [<ffffffff8104df38>] ? warn_slowpath_common+0x77/0xa3
Jan 23 20:17:10 server kernel: [ 0.995117] [<ffffffff8104dfc0>] ? warn_slowpath_fmt+0x51/0x59
Jan 23 20:17:10 server kernel: [ 0.995119] [<ffffffff8114042b>] ? sysfs_pathname+0x35/0x3d
Jan 23 20:17:10 server kernel: [ 0.995120] [<ffffffff8114042b>] ? sysfs_pathname+0x35/0x3d
Jan 23 20:17:10 server kernel: [ 0.995122] [<ffffffff8114042b>] ? sysfs_pathname+0x35/0x3d
Jan 23 20:17:10 server kernel: [ 0.995123] [<ffffffff8114042b>] ? sysfs_pathname+0x35/0x3d
Jan 23 20:17:10 server kernel: [ 0.995125] [<ffffffff811404ff>] ? sysfs_add_one+0xcc/0xe4
Jan 23 20:17:10 server kernel: [ 0.995126] [<ffffffff81140a7b>] ? create_dir+0x4f/0x7c
Jan 23 20:17:10 server kernel: [ 0.995128] [<ffffffff81140add>] ? sysfs_create_dir+0x35/0x4a
Jan 23 20:17:10 server kernel: [ 0.995129] [<ffffffff8118fb3b>] ? kobject_get+0x12/0x17
Jan 23 20:17:10 server kernel: [ 0.995131] [<ffffffff8118fc71>] ? kobject_add_internal+0xcb/0x181
Jan 23 20:17:10 server kernel: [ 0.995132] [<ffffffff8118fed3>] ? kobject_add+0x74/0x7c
Jan 23 20:17:10 server kernel: [ 0.995134] [<ffffffff81195b7e>] ? sprintf+0x51/0x59
Jan 23 20:17:10 server kernel: [ 0.995136] [<ffffffff810e84fd>] ? __kmalloc+0x12f/0x141
Jan 23 20:17:10 server kernel: [ 0.995137] [<ffffffff8118f9e8>] ? kobject_create+0x10/0x2c
Jan 23 20:17:10 server kernel: [ 0.995139] [<ffffffff8118f998>] ? kobject_init+0x42/0x82
Jan 23 20:17:10 server kernel: [ 0.995140] [<ffffffff8118f9ff>] ? kobject_create+0x27/0x2c
Jan 23 20:17:10 server kernel: [ 0.995142] [<ffffffff8118ff09>] ? kobject_create_and_add+0x2e/0x5b
Jan 23 20:17:10 server kernel: [ 0.995144] [<ffffffff812f5487>] ? threshold_create_bank+0x16f/0x287
Jan 23 20:17:10 server kernel: [ 0.995146] [<ffffffff81523d85>] ? threshold_init_device+0x40/0x86
Jan 23 20:17:10 server kernel: [ 0.995148] [<ffffffff81523d45>] ? threshold_init_device+0x0/0x86
Jan 23 20:17:10 server kernel: [ 0.995149] [<ffffffff8100a065>] ? do_one_initcall+0x64/0x174
Jan 23 20:17:10 server kernel: [ 0.995151] [<ffffffff8151c677>] ? kernel_init+0x158/0x1ae
Jan 23 20:17:10 server kernel: [ 0.995152] [<ffffffff8151c140>] ? early_idt_handler+0x0/0x71
Jan 23 20:17:10 server kernel: [ 0.995154] [<ffffffff81011baa>] ? child_rip+0xa/0x20
Jan 23 20:17:10 server kernel: [ 0.995155] [<ffffffff8151c140>] ? early_idt_handler+0x0/0x71
Jan 23 20:17:10 server kernel: [ 0.995157] [<ffffffff8151c51f>] ? kernel_init+0x0/0x1ae
Jan 23 20:17:10 server kernel: [ 0.995158] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
Jan 23 20:17:10 server kernel: [ 0.995159] ---[ end trace 57f7151f6a5def1b ]---
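To see at a glance which entries collide on each boot (the sysfs path above points into /devices/system/machinecheck), a small script could pull the two duplicate-name messages out of kern.log. This is a minimal Python sketch, assuming the messages keep exactly the wording quoted above; the function name summarize_duplicates is my own, and the log path has to be passed explicitly.

```python
import re
import sys

# Patterns for the two boot-time messages quoted above (assumption: other
# kernel versions may word these differently).
DUP_SYSFS = re.compile(r"sysfs: cannot create duplicate filename '([^']+)'")
KOBJ_FAIL = re.compile(r"kobject_add_internal failed for (\S+) with (-\w+)")


def summarize_duplicates(lines):
    """Return (duplicate sysfs paths, [(kobject name, error)]) found in lines."""
    paths, kobjs = [], []
    for line in lines:
        m = DUP_SYSFS.search(line)
        if m:
            paths.append(m.group(1))
        m = KOBJ_FAIL.search(line)
        if m:
            kobjs.append((m.group(1), m.group(2)))
    return paths, kobjs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the log file explicitly, e.g. /var/log/kern.log (path is an assumption).
    with open(sys.argv[1], errors="replace") as log:
        paths, kobjs = summarize_duplicates(log)
    for p in paths:
        print("duplicate sysfs entry:", p)
    for name, err in kobjs:
        print("kobject_add failed:", name, err)
```

Comparing the output across a few boots would show whether the same threshold_bank entries collide every time.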
I also get the occasional one of these during the day, though not around the time of a lockup:
Code:
Jan 23 14:25:09 server kernel: [66886.228349] invalid opcode: 0000 [#1] SMP
Jan 23 14:25:09 server kernel: [66886.228962] last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/host5/target5:0:0/5:0:0:0/model
Jan 23 14:25:09 server kernel: [66886.229595] CPU 4
Jan 23 14:25:09 server kernel: [66886.230192] Modules linked in: sco bridge stp bnep rfcomm parport_pc l2cap ppdev lp bluetooth parport rfkill cpufreq_conservative cpufreq_userspace cpufreq_powersave cpufreq_stats nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs binfmt_misc fuse ext3 jbd loop snd_hda_codec_realtek nouveau ttm drm_kms_helper drm snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq i2c_piix4 i2c_algo_bit snd_timer snd_seq_device i2c_core pcspkr evdev snd soundcore snd_page_alloc wmi button processor ext4 mbcache jbd2 crc16 dm_mod raid1 md_mod sd_mod crc_t10dif ahci r8169 libata mii ohci_hcd ehci_hcd xhci scsi_mod usbcore nls_base thermal thermal_sys [last unloaded: scsi_wait_scan]
Jan 23 14:25:09 server kernel: [66886.231768] Pid: 27364, comm: ps Tainted: G W 2.6.32-5-amd64 #1 To be filled by O.E.M.
Jan 23 14:25:09 server kernel: [66886.231768] RIP: 0010:[<ffffffff81056de0>] [<ffffffff81056de0>] ptrace_may_access+0x2d/0x37
Jan 23 14:25:09 server kernel: [66886.231768] RSP: 0018:ffff8803f24ffb60 EFLAGS: 00010206
Jan 23 14:25:09 server kernel: [66886.231768] RAX: 0000000000000000 RBX: ffff88043c108710 RCX: ffff8804341bcd80
Jan 23 14:25:09 server kernel: [66886.231768] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88043c108710
Jan 23 14:25:09 server kernel: [66886.231768] RBP: ffff880438844030 R08: ffff88043d488000 R09: 00007fc78ff065b6
Jan 23 14:25:09 server kernel: [66886.231768] R10: 0000000000000000 R11: ffffffff8115380d R12: 0000000000000001
Jan 23 14:25:09 server kernel: [66886.231768] R13: ffff88043c108c4c R14: ffff88043ae14580 R15: ffffffff81494500
Jan 23 14:25:09 server kernel: [66886.231768] FS: 00007fc790327700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000
Jan 23 14:25:09 server kernel: [66886.231768] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 23 14:25:09 server kernel: [66886.231768] CR2: 00007fc78fbda772 CR3: 00000003f25c8000 CR4: 00000000000406e0
Jan 23 14:25:09 server kernel: [66886.231768] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 23 14:25:09 server kernel: [66886.231768] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 23 14:25:09 server kernel: [66886.231768] Process ps (pid: 27364, threadinfo ffff8803f24fe000, task ffff88043c108710)
Jan 23 14:25:09 server kernel: [66886.231768] Stack:
Jan 23 14:25:09 server kernel: [66886.231768] 00000000fffffffd ffff88043c108710 ffffffff81135f6f 000000007fffffff
Jan 23 14:25:09 server kernel: [66886.231768] <0> ffffffff810fe4a3 ffff8803353189c0 ffffffff81117a1c ffff8803353189c0
Jan 23 14:25:09 server kernel: [66886.231768] <0> ffffffff810fe53d ffff880438844a50 ffffffff811345c2 ffff8803353189c0
Jan 23 14:25:09 server kernel: [66886.231768] Call Trace:
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff81135f6f>] ? do_task_stat+0x7f/0x98c
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff810fe4a3>] ? __d_instantiate+0x54/0xbd
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff81117a1c>] ? inotify_d_instantiate+0x12/0x39
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff810fe53d>] ? d_instantiate+0x31/0x47
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff811345c2>] ? pid_revalidate+0x74/0x8a
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff81101065>] ? inode_init_always+0x109/0x1aa
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff81101143>] ? alloc_inode+0x3d/0x74
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff810fe4a3>] ? __d_instantiate+0x54/0xbd
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff81117a1c>] ? inotify_d_instantiate+0x12/0x39
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff811320a5>] ? task_dumpable+0x23/0x34
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff811345c2>] ? pid_revalidate+0x74/0x8a
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff811346fd>] ? proc_pident_instantiate+0x85/0x9a
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff81134844>] ? proc_pident_lookup+0x92/0xa2
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff811061b1>] ? seq_open+0xa5/0xc3
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff81133904>] ? proc_single_show+0x0/0x69
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff811062ce>] ? single_open+0x9c/0xc6
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff8113204d>] ? proc_single_open+0x0/0x35
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff8113206a>] ? proc_single_open+0x1d/0x35
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff810ed99d>] ? __dentry_open+0x1c4/0x2bf
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff810f91d7>] ? do_filp_open+0x4e4/0x94b
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff81133952>] ? proc_single_show+0x4e/0x69
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff81105f38>] ? seq_read+0x1b4/0x388
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff810efa84>] ? vfs_read+0xa6/0xff
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff810efb99>] ? sys_read+0x45/0x6e
Jan 23 14:25:09 server kernel: [66886.231768] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Jan 23 14:25:09 server kernel: [66886.231768] Code: 4c 8d af 3c 05 00 00 41 54 41 89 f4 53 48 89 fb 4c 89 ef e8 26 60 2a 00 44 89 e6 48 89 df e8 d6 fd ff ff 66 ff 83 3c 05 00 00 5b <c5> c0 41 5c 0f 94 c0 41 5d c3 55 89 f5 53 48 89 fb 48 c7 c7 00
Jan 23 14:25:09 server kernel: [66886.231768] RIP [<ffffffff81056de0>] ptrace_may_access+0x2d/0x37
Jan 23 14:25:09 server kernel: [66886.231768] RSP <ffff8803f24ffb60>
Jan 23 14:25:09 server kernel: [66886.280908] ---[ end trace 57f7151f6a5def22 ]---
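In an x86 oops, the byte wrapped in angle brackets on the "Code:" line is the one at RIP, i.e. where the CPU hit the invalid opcode. A minimal Python sketch to split that line into bytes and locate the marked one; the function name faulting_byte is my own, and it assumes the "<xx>" marker format shown above.

```python
def faulting_byte(code_line):
    """Split an oops 'Code:' line into byte values and find the <xx>-marked one.

    Returns (list of ints, index of the marked byte, or None if unmarked).
    Assumes the '<xx>' marker convention used in the oops above.
    """
    text = code_line.split("Code:", 1)[1]
    data, marked = [], None
    for tok in text.split():
        if tok.startswith("<") and tok.endswith(">"):
            marked = len(data)  # position of the byte at RIP
            tok = tok[1:-1]
        data.append(int(tok, 16))
    return data, marked
```

For the oops above this gives 64 bytes with the marked byte 0xc5 at index 43, so the dump holds 43 bytes of code before RIP.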
I also see a lot of this, and I'm not sure why:
Code:
Jan 23 11:18:45 server kernel: [55702.144593] r8169 0000:02:00.0: eth0: link down
Jan 23 11:18:47 server kernel: [55703.706772] r8169 0000:02:00.0: eth0: link up
Jan 23 11:18:50 server kernel: [55707.064259] r8169 0000:02:00.0: eth0: link down
Jan 23 11:18:52 server kernel: [55708.593728] r8169 0000:02:00.0: eth0: link up
Jan 23 11:18:54 server kernel: [55710.832990] r8169 0000:02:00.0: eth0: link down
Jan 23 11:18:55 server kernel: [55712.494937] r8169 0000:02:00.0: eth0: link up
Jan 23 11:22:18 server kernel: [55915.272050] r8169 0000:02:00.0: eth0: link down
Jan 23 11:22:20 server kernel: [55916.797826] r8169 0000:02:00.0: eth0: link up
Jan 23 11:22:21 server kernel: [55917.832494] r8169 0000:02:00.0: eth0: link down
Jan 23 11:22:22 server kernel: [55919.409107] r8169 0000:02:00.0: eth0: link up
Jan 23 11:23:43 server kernel: [55999.973313] r8169 0000:02:00.0: eth0: link down
Jan 23 11:23:45 server kernel: [56001.603652] r8169 0000:02:00.0: eth0: link up
Jan 23 11:32:28 server kernel: [56525.305703] r8169 0000:02:00.0: eth0: link down
Jan 23 11:32:30 server kernel: [56526.925350] r8169 0000:02:00.0: eth0: link up
Jan 23 11:36:19 server kernel: [56755.929709] r8169 0000:02:00.0: eth0: link down
Jan 23 11:36:21 server kernel: [56757.497332] r8169 0000:02:00.0: eth0: link up
Jan 23 11:44:59 server kernel: [57275.705926] r8169 0000:02:00.0: eth0: link down
Jan 23 11:45:00 server kernel: [57277.261963] r8169 0000:02:00.0: eth0: link up
Jan 23 11:54:25 server kernel: [57842.162126] r8169 0000:02:00.0: eth0: link down
Jan 23 11:54:27 server kernel: [57843.743382] r8169 0000:02:00.0: eth0: link up
Jan 23 11:54:30 server kernel: [57847.219713] r8169 0000:02:00.0: eth0: link down
Jan 23 11:54:32 server kernel: [57848.787665] r8169 0000:02:00.0: eth0: link up
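To check whether the link drops line up with anything else on the network or with the lockups, it may help to list each outage and its length. A minimal Python sketch, assuming the r8169 lines keep exactly the format shown above; the function name link_flaps and the eth0 default are my own choices. It uses the kernel's seconds-since-boot stamp in brackets rather than the syslog time.

```python
import re

# Matches e.g. "[55702.144593] r8169 0000:02:00.0: eth0: link down"
LINK_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+\S+\s+\S+\s+(\w+): link (up|down)")


def link_flaps(lines, iface="eth0"):
    """Return [(down_at, outage_seconds)] for each down->up pair on iface."""
    flaps = []
    down_at = None
    for line in lines:
        m = LINK_RE.search(line)
        if not m or m.group(2) != iface:
            continue
        t, state = float(m.group(1)), m.group(3)
        if state == "down":
            down_at = t
        elif down_at is not None:
            # Link came back: record when it dropped and how long it was gone.
            flaps.append((down_at, t - down_at))
            down_at = None
    return flaps
```

For the eleven down/up pairs quoted above it reports outages of roughly 1.5 to 1.7 seconds each.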
The previous box ran 32-bit Squeeze without any problems.
Could this newer board and CPU simply need a newer kernel? Any suggestions on what to try next?
I previously posted this question on the Debian User Forums (Kernel section), but I now think it may be a general kernel problem rather than a Debian-specific one.
TIA
Glenn