  1. #1
    Just Joined!
    Join Date
    Nov 2006
    Location
    UK
    Posts
    4

    2007.0 hard crashes after 4 hours


    2007.0 is reliably hard crashing after about 4 hours of uptime on my Acer Ferrari 3200. The syslog does not suggest any obvious cause. Quite often there was recent activity on the wireless network card (bcm4306, using bcm43xx driver), but not always, and usually not exactly at the moment that the machine crashed. For instance, the following was recorded in the syslog at the time of the last crash (items from 15:55:32 are start of reboot after hard reset):

    Nov 13 15:52:26 localhost last message repeated 4 times
    Nov 13 15:52:38 localhost dhclient: DHCPREQUEST on eth0 to 192.168.111.2 port 67
    Nov 13 15:52:47 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out
    Nov 13 15:52:47 localhost kernel: bcm43xx: Controller RESET (TX timeout) ...
    Nov 13 15:52:47 localhost kernel: ACPI: PCI interrupt for device 0000:00:09.0 disabled
    Nov 13 15:52:47 localhost kernel: ACPI: PCI Interrupt 0000:00:09.0[A] -> GSI 19 (level, low) -> IRQ 20
    Nov 13 15:52:47 localhost kernel: bcm43xx: Chip ID 0x4306, rev 0x3
    Nov 13 15:52:47 localhost kernel: bcm43xx: Number of cores: 5
    Nov 13 15:52:47 localhost kernel: bcm43xx: Core 0: ID 0x800, rev 0x4, vendor 0x4243, enabled
    Nov 13 15:52:47 localhost kernel: bcm43xx: Core 1: ID 0x812, rev 0x5, vendor 0x4243, disabled
    Nov 13 15:52:47 localhost kernel: bcm43xx: Core 2: ID 0x80d, rev 0x2, vendor 0x4243, enabled
    Nov 13 15:52:47 localhost kernel: bcm43xx: Core 3: ID 0x807, rev 0x2, vendor 0x4243, disabled
    Nov 13 15:52:47 localhost kernel: bcm43xx: Core 4: ID 0x804, rev 0x9, vendor 0x4243, enabled
    Nov 13 15:52:47 localhost kernel: bcm43xx: PHY connected
    Nov 13 15:52:47 localhost kernel: bcm43xx: Detected PHY: Version: 2, Type 2, Revision 2
    Nov 13 15:52:48 localhost kernel: bcm43xx: Detected Radio: ID: 2205017f (Manuf: 17f Ver: 2050 Rev: 2)
    Nov 13 15:52:48 localhost kernel: bcm43xx: Radio turned off
    Nov 13 15:52:48 localhost kernel: bcm43xx: Radio turned off
    Nov 13 15:52:48 localhost kernel: bcm43xx: Controller restarted
    Nov 13 15:52:50 localhost dhclient: DHCPREQUEST on eth0 to 192.168.111.2 port 67
    Nov 13 15:53:30 localhost last message repeated 2 times
    Nov 13 15:55:32 localhost syslogd 1.4.1: restart.
    Nov 13 15:55:32 localhost kernel: klogd 1.4.1, log source = /proc/kmsg started.
    Nov 13 15:55:32 localhost kernel: Inspecting /boot/System.map-2.6.17-5mdv
    Nov 13 15:55:32 localhost kernel: Loaded 21427 symbols from /boot/System.map-2.6.17-5mdv.

    Despite these messages, wireless networking appears to be working, as is the wired NIC.
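
    To check whether the bcm43xx reset really precedes every crash, one option is to pull the last few lines before each "syslogd ... restart" entry out of the whole log. The following is only a rough sketch: the function names are made up for illustration, the year is an assumption (classic syslog lines do not record one), and a syslogd restart is also logged on a clean reboot, so not every hit is a crash.

```python
import re
from datetime import datetime

# Matches the "Nov 13 15:52:47 " prefix of a classic syslog line.
STAMP = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) ")


def parse_stamp(line, year):
    """Return the line's timestamp, or None if it has no syslog prefix."""
    match = STAMP.match(line)
    if not match:
        return None
    # The log carries no year, so the caller has to supply one.
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def find_restarts(lines, year=2006, context=3):
    """Yield (gap_seconds, preceding_lines) for each syslogd restart.

    gap_seconds is the silence between the last line logged before the
    restart and the restart itself; preceding_lines are the final
    `context` lines written before the machine went down.
    """
    history = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if "syslogd" in line and "restart" in line and history:
            before = parse_stamp(history[-1], year)
            after = parse_stamp(line, year)
            gap = (after - before).total_seconds() if before and after else None
            yield gap, history[-context:]
            history = []  # start collecting lines for the next boot
        history.append(line)
```

    Feeding it the lines of /var/log/syslog (for example `find_restarts(open("/var/log/syslog"))`) gives one entry per restart, which makes it easier to see whether the same driver messages turn up before each one.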

    The laptop is AMD64-based (2800+), but I am using the 32-bit version of 2007.0 because of previous issues with plugins for Mozilla, compatibility of OpenOffice and Java, etc. Installation went smoothly. 512M of RAM. NetXtreme BCM5788 Gigabit Ethernet (tg3 driver).

    Graphics card is "ATI Technologies Inc RV350 [Mobility Radeon 9600 M10]" according to lspci. I have tried with both fglrx and the xorg radeon driver, with the same result on both. Currently using the xorg driver, because Mandriva's fglrx driver has occasional artefacts on screen (small horizontal line following the pointer around).

    I've tried with and without the new 3D desktop features, it doesn't make any difference. Same behaviour in KDE and Gnome.

    I assume this isn't a repeat of the problem with kat in 2006?

    I am not using any particular application when it crashes. I usually have Firefox, Thunderbird and Konsole running, but it will hang regardless of which application I am using, or if I am not using it at all at the time.

    Any ideas?

  2. #2
    Linux Guru fingal's Avatar
    Join Date
    Jul 2003
    Location
    Birmingham - UK
    Posts
    1,539
    Looks like a tough problem. I suggest probing more deeply into your system logs; anything that provides even a slight clue would be helpful. You might want to type dmesg and look at the output. That prints the kernel's message buffer, which covers boot and anything the kernel has logged since.

    This won't necessarily point to the problem, but it's a starting point and it might help.

    One more idea .... Try Googling on this:

    "Controller RESET (TX timeout)"

    and see what you find. This might give you some more clues. Include the quotation marks in the search to force an exact match.
    I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso

  3. #3
    Linux Guru bigtomrodney's Avatar
    Join Date
    Nov 2004
    Location
    Ireland
    Posts
    6,133
    Is your ACPI working correctly?

  5. #4
    Just Joined!
    Join Date
    Nov 2006
    Location
    UK
    Posts
    4

    dmesg, acpi and kernel panic

    fingal: yeah, I know dmesg thanks, but there's no problem on boot that stands out. If you see my new information below, I guess the problem isn't with the wireless NIC, so I won't chase that error from the log, as the NIC works.

    bigtomrodney: good point. I'll try the acpi=off, noapic and nolapic options that have been necessary with past incarnations of Mandr[iva|ake]. I don't want to lose acpi though, as this is an AMD64 laptop, and gets pretty hot without power management. But it's worth testing.

    I learnt something new just now by accident. I had left the machine shutting down (so I thought) a few hours ago. I don't know if anyone else is finding shutdown pretty erratic with 3D enabled, but it varies between shutting down correctly straight from KDE, restarting KDM from where I can shut down, and dropping to a shell. In this case, it had dropped to a shell, where it had sat until it crashed. I therefore got the system message that is not normally visible when it crashes in X, and doesn't get logged. It said:

    CPU0: Machine Check Exception: 0000000000000004
    Bank 4: b200000000070f0f
    Kernel panic - not syncing: CPU context corrupt

    That sounds like very bad news. I've got no particular reason to say this other than past experience, but I guess noapic may be my best hope.
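
    The two hex values in that message can be picked apart bit by bit. This is a minimal sketch, assuming the standard x86 machine-check register layout (MCG_STATUS for the first value, a per-bank MCi_STATUS for the "Bank 4" value); the function names are invented for illustration, and the sketch only names the flags, it does not say what caused the error.

```python
# Flag bits of a per-bank status register (assumed MCi_STATUS layout).
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # the bank's MISC register is valid
    58: "ADDRV",  # the bank's ADDR register is valid
    57: "PCC",    # processor context corrupt
}

# Flag bits of the global status register (assumed MCG_STATUS layout).
MCG_FLAGS = {
    0: "RIPV",  # restart instruction pointer valid
    1: "EIPV",  # error instruction pointer valid
    2: "MCIP",  # machine check in progress
}


def set_flags(value, names):
    """Return the names of all flag bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if (value >> bit) & 1]


def decode_global_status(hex_text):
    """Decode the value printed after 'Machine Check Exception:'."""
    return set_flags(int(hex_text, 16), MCG_FLAGS)


def decode_bank_status(hex_text):
    """Decode the value printed after 'Bank N:'."""
    value = int(hex_text, 16)
    return {
        "flags": set_flags(value, STATUS_FLAGS),
        "mca_error_code": value & 0xFFFF,               # bits 15..0
        "model_specific_code": (value >> 16) & 0xFFFF,  # bits 31..16
    }
```

    Under those assumptions, `decode_bank_status("b200000000070f0f")` reports the VAL, UC, EN and PCC flags, and the PCC bit is what the kernel turns into "CPU context corrupt". The error code fields are model-specific and would need the CPU vendor's documentation to interpret.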

    It would be nice if Mandriva worked 100% satisfactorily for a change, rather than the usual 95% brilliantly and 5% badly. Maybe it's time to try Ubuntu, but I go back to the very early days of Mandrake, and know how to get under its skin when there is a problem. Changing distros, you have to learn a whole new set of secrets.

  6. #5
    Just Joined!
    Join Date
    Nov 2006
    Location
    UK
    Posts
    4

    Kernel panic

    I found this thread online:

    http://www.linuxquestions.org/questi...d.php?t=278653

    The following advice from that thread seems relevant:

    For future reference, in case anyone is googling this...

    CPU0: Machine Check Exception: 0000000000000004
    Bank 0: be00030020200137[0118000100900140] at 00000000fee00300
    Kernel Panic - not syncing: CPU Context corrupt.

    I'm fairly sure that error is an MCE check gone bad. To get around it, type "linux nomce" at your LILO prompt. Or you can recompile your kernel to exclude Machine Check Exception. HTH..

    From other messages, it seems this may be related to overheating of the CPU, which wouldn't surprise me, because the laptop gets as warm as a hot-water bottle where the CPU is located. That is why I want acpi.

    Unfortunately, cpufreqd seemed not to be working with this version of Mandriva (it was working in 2006), so I had to remove apmd and cpufreqd to replace them with kpowersave, which does seem to be working to scale the CPU. I've got it scaled down from 1.8 GHz to 0.8 GHz at the moment, but it's still bloody hot. That's just the nature of the laptop, I suppose.

    It's possible that this error is actually a sign that 2007.0 has improved on previous versions and is catching errors that previous versions weren't catching. Or it's being overfussy or getting the check wrong, I don't know. Anyway, I might as well chance frying my CPU by using the nomce flag, as I'll have to replace the laptop anyway if I can't get round this.

    One thing makes me suspect it may be Mandriva's error rather than overheating: it lasts roughly 4 hours whether I fire it up from cold or reboot immediately after a crash. If it was overheating, surely it would crash much faster after a reboot. Could be wrong, but I think I'll put this down to another one of Mandriva's "endearing" little glitches. Time will tell if I'm right.
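
    One way to test the overheating theory rather than guess is to keep a heartbeat log that survives a hard crash, recording the temperature and CPU frequency every so often and forcing each line to disk. This is a sketch only: the function names are invented, and the sensor paths mentioned afterwards are assumptions that vary between kernels and machines.

```python
import os
import time


def read_text(path):
    """Return the file's contents on one line, or 'n/a' if unreadable."""
    try:
        with open(path) as handle:
            return " ".join(handle.read().split())
    except OSError:
        return "n/a"


def append_heartbeat(log_path, sensor_paths, now=None):
    """Append one timestamped line of sensor readings and sync it to disk.

    The fsync matters here: after a hard crash, anything still sitting
    in the page cache is lost, so each line is pushed out immediately.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    readings = [f"{path}={read_text(path)}" for path in sensor_paths]
    line = " ".join([stamp] + readings)
    with open(log_path, "a") as log:
        log.write(line + "\n")
        log.flush()
        os.fsync(log.fileno())
    return line


def watch(log_path, sensor_paths, interval=30):
    """Write a heartbeat every `interval` seconds until interrupted."""
    while True:
        append_heartbeat(log_path, sensor_paths)
        time.sleep(interval)
```

    Calling `watch()` with whatever files the machine exposes (for example something under /proc/acpi/thermal_zone/ and /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq, both assumed here rather than checked on this laptop) leaves a last reading from shortly before each crash. If the final temperatures are no higher than those an hour into the session, overheating looks less likely.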

  7. #6
    Linux Guru fingal's Avatar
    Join Date
    Jul 2003
    Location
    Birmingham - UK
    Posts
    1,539
    Quote Originally Posted by bgprior
    It would be nice if Mandriva worked 100% satisfactorily for a change, rather than the usual 95% brilliantly and 5% badly. Maybe it's time to try Ubuntu, but I go back to the very early days of Mandrake, and know how to get under its skin when there is a problem. Changing distros, you have to learn a whole new set of secrets.
    Lol - that's one of the most accurate assessments of Mandriva I've seen. I'm just the same ... I don't have the time to start from scratch again.

  8. #7
    Just Joined!
    Join Date
    Nov 2006
    Location
    UK
    Posts
    4

    Bad news

    The machine lasted longer before freezing today, since adding the nomce flag, but it crashed all the same about an hour ago. I might go back to 2006 for a while to see if I am getting the same behaviour, just in case I have developed a fault with the CPU coincidentally at the same time as installing 2007. That would be a big coincidence, but I want to eliminate the possibility before I start screaming that 2007 has a fatal flaw on my machine.
