- 09-21-2006 #1
Machine Check Exception on new Opteron server
I set up a new amd64 Gentoo server yesterday on an Opteron, but within a few hours of it being up I got a "Machine Check Exception" and the thing froze up. I had to go to the local console to see this and then had to hard reboot the machine. It wasn't really doing much at the time other than compiling a couple of things. The server is a dual-CPU dual-core machine (4 cores, that is) with 8GB RAM and 12 SCSI disks + 2 SATA disks for the OS.
The error from the console is below:
Code:
HARDWARE ERROR
CPU 2: Machine Check Exception: 4 Bank 4: f615200133000813
TSC 5ac60e50b6a ADDR 1d251ec00
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check

I have been googling around since yesterday but haven't found anything conclusive.
I've tried running mcelog and got the following:
Code:
# mcelog --k8 /dev/mcelog
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a4d0cd72d5a8
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a56b2eba7649
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 2
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a60591585bda
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 3
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a69ff2a635e8
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 4
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a73a53f42ca9
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 5
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a7d4b6934fdf
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 6
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 4 northbridge TSC a86f17e0a6a8
ADDR 191b0b000
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = c12f
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d417c000c1080a13 MCGSTATUS 0
MCE 7
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a86f17e0c311
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
Does anybody know anything about this?
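For anyone trying to read the STATUS values by hand, here is a minimal sketch of how the flag lines in the mcelog output (bit61, bit46, bit62, the Chipkill syndrome) can be pulled out of the 64-bit status word. The bit positions are an assumption based on the AMD K8 northbridge machine-check status layout, and `decode_status` and the flag names are made up for illustration; they are not part of mcelog.

```python
# Bit positions assumed from the AMD K8 MCi_STATUS layout (northbridge, bank 4).
# The flag names are illustrative labels, not mcelog identifiers.
FLAGS = {
    63: "valid",
    62: "overflow",          # mcelog: "error overflow (multiple errors)"
    61: "uncorrected",       # mcelog: "error uncorrected"
    60: "enabled",
    59: "misc_valid",
    58: "addr_valid",
    57: "context_corrupt",
    46: "corrected_ecc",     # mcelog: "corrected ecc error"
    45: "uncorrected_ecc",
}


def decode_status(status: int) -> dict:
    """Split a 64-bit machine-check STATUS value into flags and code fields."""
    out = {name: bool((status >> bit) & 1) for bit, name in FLAGS.items()}
    out["ext_error_code"] = (status >> 16) & 0xF   # bits 19:16
    out["error_code"] = status & 0xFFFF            # bits 15:0
    # Chipkill syndrome: high byte in bits 31:24, low byte in bits 54:47.
    high = (status >> 24) & 0xFF
    low = (status >> 47) & 0xFF
    out["syndrome"] = (high << 8) | low
    return out


if __name__ == "__main__":
    for value in (0xA40000000005001B, 0xD417C000C1080A13, 0xF615200133000813):
        decoded = decode_status(value)
        set_flags = [name for name in FLAGS.values() if decoded[name]]
        print(f"{value:016x}: {', '.join(set_flags)}")
```

Under that assumed layout, the third value (the one from the kernel panic line) comes out with both the uncorrected and uncorrected ECC flags set, while the MCE 6 value shows a corrected ECC error with overflow, matching what mcelog printed.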
- 11-05-2006 #2
After replacing the processors, then the mobo, the problem was still occurring. After exhausting all other options, including eliminating drive cards (since it happened under drive load), I eventually replaced the memory and the problem went away.
This was annoying because I ran extensive memory tests at the beginning and they showed no errors. But replacing the memory was the only thing that stopped the problem.
- 11-05-2006 #3
Thanks for posting that back. I've never seen any of those errors. I'm just waiting for the Quads next year, so I'm glad it wasn't an issue with multiple cores (didn't think it would be but I'm still glad).


