  1. #1
    Just Joined! | Join Date: Oct 2012 | Posts: 9

    Sudden Kernel Panics


    Hi, I've been having some problems with my computer's hardware lately. I don't think it's specifically a Linux problem but rather failing hardware, so I'm coming here for advice on what to do and to confirm whether it really is a hardware problem (I may be wrong).

    Lately, I've been getting kernel panics like this one:

    [13684.008253] [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b20000001040080f
    [13684.012010] [Hardware Error]: RIP !INEXACT! 00:<00000000b7232d91>
    [13684.012010] [Hardware Error]: TSC 22e92cc81d9b
    [13684.012010] [Hardware Error]: PROCESSOR 0:f47 TIME 1353772330 SOCKET 0 APIC 0 microcode 3
    [13684.012010] [Hardware Error]: Run the above through 'mcelog --ascii'
    [13684.012010] [Hardware Error]: Machine check: Processor context corrupt
    [13684.012010] Kernel panic - not syncing: Fatal Machine check
    [13684.012010] Pid: 26457, comm: sshd Tainted: P M O 3.2.0-4-686-pae #1 Debian 3.2.32-1
    [13684.012010] Call Trace:
    [13684.012010] [<c12bcb87>] ? panic+0x4d/0x144
    [13684.012010] [<c101900f>] ? mce_panic+0x132/0x15c
    [13684.012010] [<c1019716>] ? do_machine_check+0x474/0x5d9
    [13684.012010] [<c10cc601>] ? sys_read+0x4c/0x61
    [13684.012010] [<c10192a2>] ? mce_log+0xb6/0xb6
    [13684.012010] [<c12c229f>] ? error_code+0x67/0x6c
    [13684.012010] Rebooting in 30 seconds..

    In this panic the running process was sshd, but I've seen the same thing with Audacious, Chromium, Xorg and many other programs.

    It happens both when the CPU is under load and when it's completely idle (it just happened while I was away, with no programs left running in the background).

    Random info:

    I'm running Debian on a 7-year-old Pentium D with a long history of abuse.

    The last hardware change before the symptoms appeared was adding a disk. The first panic came two months after that (only ONCE, though); 1 to 2 months after that first panic, many more started appearing, and now it happens roughly ONCE a day.

    All three HDDs are healthy, and so is the RAM; I've checked them with S.M.A.R.T. and memtest respectively.

    I ran some "home-made" stress tests: 130 instances of cpuburn (burnP6) and even a Windows CPU stress tester under Wine. No crashes happened; each test was 10 minutes long.


    I believe it may be the motherboard or the CPU failing. mcelog gives this output:

    CPU 0: Machine Check Exception: 4 Bank 0: b20000001040080f
    TSC 22e92cc81d9b
    Hardware event. This is not a software error.
    CPU 0 BANK 0 TSC 22e92cc81d9b
    RIP !INEXACT! 00:b7232d91
    TIME 1353772330 Sat Nov 24 16:52:10 2012
    STATUS b20000001040080f MCGSTATUS 4
    CPUID Vendor Intel Family 15 Model 4
    PROCESSOR 0:f47 TIME 1353772330 SOCKET 0 APIC 0 microcode 3
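    A side note on reading that output: STATUS and MCGSTATUS are packed machine-check registers, and mcelog is essentially picking them apart bit by bit. The minimal Python sketch below does the same for the two values in this thread, assuming the IA32_MCi_STATUS / IA32_MCG_STATUS bit layout described in the machine-check chapter of Intel's SDM Vol. 3B. The function and table names are made up for illustration, only the "bus and interconnect" class of error code is decoded, and `mcelog --ascii` remains the authoritative decoder.

```python
# Minimal sketch: decode an x86 machine-check bank STATUS value and MCG_STATUS.
# Bit layout assumed from Intel SDM Vol. 3B (machine-check architecture chapter);
# names here are illustrative, not part of mcelog or the kernel.

# Architectural flag bits in IA32_MCi_STATUS.
STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled for this error
    59: "MISCV",  # IA32_MCi_MISC holds extra info
    58: "ADDRV",  # IA32_MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupt
}

# Flag bits in IA32_MCG_STATUS.
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}

# Field tables for the bus/interconnect compound error code: 000F 1PPT RRRR IILL.
PARTICIPATION = {0: "local CPU originated request", 1: "local CPU responded to request",
                 2: "local CPU observed error as third party", 3: "generic"}
REQUEST = {0: "generic error", 1: "generic read", 2: "generic write", 3: "data read",
           4: "data write", 5: "instruction fetch", 6: "prefetch", 7: "eviction", 8: "snoop"}
MEMORY_IO = {0: "memory access", 1: "reserved", 2: "I/O", 3: "other transaction"}
LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic level"}


def set_flags(value: int, table: dict) -> list:
    """Return the names of the flag bits that are set, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if (value >> bit) & 1]


def classify_mca_code(code: int) -> str:
    """Roughly classify the 16-bit architectural MCA error code (STATUS bits 15:0)."""
    if (code & 0xE800) == 0x0800:  # bits 15:13 clear and bit 11 set -> bus/interconnect
        pp = (code >> 9) & 0b11      # participation
        timeout = (code >> 8) & 1    # T bit
        rrrr = (code >> 4) & 0xF     # request type
        ii = (code >> 2) & 0b11      # memory or I/O
        ll = code & 0b11             # level
        return (f"bus/interconnect error: {PARTICIPATION[pp]}, "
                f"{'timed out' if timeout else 'no timeout'}, "
                f"{REQUEST.get(rrrr, 'reserved request type')}, {MEMORY_IO[ii]}, {LEVEL[ll]}")
    return "not a bus/interconnect code (other classes are not decoded in this sketch)"


def decode_status(status: int) -> dict:
    """Split a 64-bit IA32_MCi_STATUS value into flags and error-code fields."""
    return {
        "flags": set_flags(status, STATUS_FLAGS),
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": status & 0xFFFF,
        "mca_class": classify_mca_code(status & 0xFFFF),
    }


if __name__ == "__main__":
    # STATUS and MCGSTATUS values taken from the mcelog output above.
    info = decode_status(0xB20000001040080F)
    print("STATUS flags:", info["flags"])
    print(f"MCA error code: {info['mca_code']:#06x} ({info['mca_class']})")
    print(f"model-specific code: {info['model_specific_code']:#06x}")
    print("MCGSTATUS flags:", set_flags(0x4, MCG_FLAGS))
```

    Under this reading, VAL, UC, EN and PCC are set (an uncorrected error with the processor context corrupt, matching the kernel's "Processor context corrupt" line), MCGSTATUS has only MCIP set (EIPV is clear, which is why the RIP line is marked !INEXACT!), and the error code 0x080f falls in the bus/interconnect class. That is consistent with a problem on the path between the CPU and the board, but on its own it does not prove which component is at fault.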


    And that's all I know.

    Now I'm going to shut this PC down until I can afford a new one. I can keep going with one of my other three machines and my server, so that's not a problem.

    What I'm asking is: what can I do to be sure whether it's the CPU or the motherboard failing, or whether it's even a hardware issue at all (as opposed to software messing up the CPU's registers or something like that)?

    Thanks a lot for your attention.

  2. #2
    Linux Engineer | Join Date: Apr 2012 | Location: Virginia, USA | Posts: 910
    "Run the above through 'mcelog --ascii'"

    I think that will give you more info.

    Anyway, my first guess is a PSU problem. Do you have another PSU lying around that you could try?

  3. #3
    Just Joined! | Join Date: Oct 2012 | Posts: 9
    Quote (originally posted by mizzle):
    "Run the above through 'mcelog --ascii'"

    I think that will give you more info.

    Anyway, my first guess is a PSU problem. Do you have another PSU lying around that you could try?
    Quoting myself:

    Quote (originally posted by Imanol):
    I believe it may be the motherboard or the CPU failing. mcelog gives this output:

    CPU 0: Machine Check Exception: 4 Bank 0: b20000001040080f
    TSC 22e92cc81d9b
    Hardware event. This is not a software error.
    CPU 0 BANK 0 TSC 22e92cc81d9b
    RIP !INEXACT! 00:b7232d91
    TIME 1353772330 Sat Nov 24 16:52:10 2012
    STATUS b20000001040080f MCGSTATUS 4
    CPUID Vendor Intel Family 15 Model 4
    PROCESSOR 0:f47 TIME 1353772330 SOCKET 0 APIC 0 microcode 3
    I "don't" have a spare PSU. Well, I actually do, but it doesn't have enough power to run the computer properly, since my graphics card needs quite a few watts (and even if it did, it doesn't have the connector the card needs).

    My current PSU is a Nox 600 W unit that is fairly new, maybe 3 years old, but I can't see how this could be a PSU-related issue...

    What can I do to test my PSU?

  5. #4
    Just Joined! | Join Date: Oct 2012 | Posts: 9
    And unfortunately, I don't have a spare graphics card, and my motherboard has no integrated graphics.

  6. #5
    Administrator MikeTbob | Join Date: Apr 2006 | Location: Texas | Posts: 7,864
    Quote (originally posted by Imanol):
    Quoting myself:

    I "don't" have a spare PSU. Well, I actually do, but it doesn't have enough power to run the computer properly, since my graphics card needs quite a few watts (and even if it did, it doesn't have the connector the card needs).

    My current PSU is a Nox 600 W unit that is fairly new, maybe 3 years old, but I can't see how this could be a PSU-related issue...

    What can I do to test my PSU?
    Open the PSU box and look for signs of "capacitor plague", and check the MOBO for the same... it has happened to me and it's fairly common.
    (See the Wikipedia article "Capacitor plague".)
    I have had PSUs go out after less than 2 years; again, not that uncommon.
    If this were my machine, I would disconnect everything it does not need to boot, including the drive you added last, and see whether you still get the error messages. SMART has been known to report a disk as "Good" only to have the gosh darn thing stop working 3 days later.

  7. #6
    Just Joined! | Join Date: Oct 2012 | Posts: 9
    I haven't opened the PSU yet, but I can clearly see that some of my motherboard's capacitors are busted.

    The drives are OK; I've checked every single parameter reported by SMART and run its self-tests. It seems that I'm going to need a new MB, but since I plan to upgrade to a new computer anyway, I guess it's bye-bye.

    What can I take into consideration to avoid this kind of damage in the future? This computer was almost always on, since it served as both a file storage server and my main computer, and the components weren't very good. I suppose all that uptime shortened the components' life, but I can't really tell.

    Thanks a lot for your replies. I'll check the PSU, report back, and mark this as (sadly) closed.

  8. #7
    Just Joined! | Join Date: Oct 2012 | Posts: 9
    As far as I can see, the PSU is okay.

    The MB has 1 clearly busted cap, and 3-4 that look almost ready to pop or show a tiny amount of brownish leakage on their bulging tops.

    Is that enough to make the CPU go nuts and cause panics??

  9. #8
    Linux Engineer | Join Date: Apr 2012 | Location: Virginia, USA | Posts: 910
    Quote (originally posted by Imanol):
    As far as I can see, the PSU is okay.

    The MB has 1 clearly busted cap, and 3-4 that look almost ready to pop or show a tiny amount of brownish leakage on their bulging tops.

    Is that enough to make the CPU go nuts and cause panics??
    Yes. You might be able to just buy replacement capacitors and swap them in yourself, if you or a friend have a soldering iron. Or you might be able to grab a used motherboard for cheap; there are tons of Pentium 4 computers on Craigslist, etc.

  10. #9
    Just Joined! | Join Date: Oct 2012 | Posts: 9
    Quote (originally posted by mizzle):
    Yes. You might be able to just buy replacement capacitors and swap them in yourself, if you or a friend have a soldering iron. Or you might be able to grab a used motherboard for cheap; there are tons of Pentium 4 computers on Craigslist, etc.
    I think I may be able to replace those capacitors, since I have some soldering skills (I'm studying telecom engineering and I've earned a few soldering burns :D).

    I'll see what I can do.

    Thanks a lot for your help! I'll mark this as solved.
