Mandriva Linux Help: for help and discussion about Mandriva (formerly Mandrake) Linux.

11-13-2006 #1
2007.0 hard crashes after 4 hours

2007.0 reliably hard-crashes after about 4 hours of uptime on my Acer Ferrari 3200. The syslog does not suggest any obvious cause. Quite often there was recent activity on the wireless network card (bcm4306, using the bcm43xx driver), but not always, and usually not at the exact moment the machine crashed. For instance, the following was recorded in the syslog at the time of the last crash (entries from 15:55:32 onward are the start of the reboot after a hard reset):

Nov 13 15:52:26 localhost last message repeated 4 times
Nov 13 15:52:38 localhost dhclient: DHCPREQUEST on eth0 to 192.168.111.2 port 67
Nov 13 15:52:47 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out
Nov 13 15:52:47 localhost kernel: bcm43xx: Controller RESET (TX timeout) ...
Nov 13 15:52:47 localhost kernel: ACPI: PCI interrupt for device 0000:00:09.0 disabled
Nov 13 15:52:47 localhost kernel: ACPI: PCI Interrupt 0000:00:09.0[A] -> GSI 19 (level, low) -> IRQ 20
Nov 13 15:52:47 localhost kernel: bcm43xx: Chip ID 0x4306, rev 0x3
Nov 13 15:52:47 localhost kernel: bcm43xx: Number of cores: 5
Nov 13 15:52:47 localhost kernel: bcm43xx: Core 0: ID 0x800, rev 0x4, vendor 0x4243, enabled
Nov 13 15:52:47 localhost kernel: bcm43xx: Core 1: ID 0x812, rev 0x5, vendor 0x4243, disabled
Nov 13 15:52:47 localhost kernel: bcm43xx: Core 2: ID 0x80d, rev 0x2, vendor 0x4243, enabled
Nov 13 15:52:47 localhost kernel: bcm43xx: Core 3: ID 0x807, rev 0x2, vendor 0x4243, disabled
Nov 13 15:52:47 localhost kernel: bcm43xx: Core 4: ID 0x804, rev 0x9, vendor 0x4243, enabled
Nov 13 15:52:47 localhost kernel: bcm43xx: PHY connected
Nov 13 15:52:47 localhost kernel: bcm43xx: Detected PHY: Version: 2, Type 2, Revision 2
Nov 13 15:52:48 localhost kernel: bcm43xx: Detected Radio: ID: 2205017f (Manuf: 17f Ver: 2050 Rev: 2)
Nov 13 15:52:48 localhost kernel: bcm43xx: Radio turned off
Nov 13 15:52:48 localhost kernel: bcm43xx: Radio turned off
Nov 13 15:52:48 localhost kernel: bcm43xx: Controller restarted
Nov 13 15:52:50 localhost dhclient: DHCPREQUEST on eth0 to 192.168.111.2 port 67
Nov 13 15:53:30 localhost last message repeated 2 times
Nov 13 15:55:32 localhost syslogd 1.4.1: restart.
Nov 13 15:55:32 localhost kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 13 15:55:32 localhost kernel: Inspecting /boot/System.map-2.6.17-5mdv
Nov 13 15:55:32 localhost kernel: Loaded 21427 symbols from /boot/System.map-2.6.17-5mdv.

Despite these messages, wireless networking appears to be working, as does the wired NIC.
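When looking for a pattern across several crashes, the approach used above (the `syslogd ... restart.` line marks where the next boot begins) can be automated. Here is a minimal Python sketch that prints the lines logged just before each restart marker. The function name, the 15-line window and the regular expression are illustrative assumptions, and the actual log file on a given Mandriva install (for example `/var/log/syslog` or `/var/log/messages`) should be checked first.

```python
import re
import sys
from collections import deque

# syslogd writes this line when it starts, e.g.
# "Nov 13 15:55:32 localhost syslogd 1.4.1: restart."
RESTART = re.compile(r"syslogd [\d.]+: restart\.")


def lines_before_restarts(lines, window=15):
    """Yield (marker, context) for each syslogd restart line.

    context holds up to `window` lines logged just before the marker.
    A clean reboot also produces a marker, so not every hit is a crash.
    """
    recent = deque(maxlen=window)
    for line in lines:
        line = line.rstrip("\n")
        if RESTART.search(line):
            yield line, list(recent)
            recent.clear()  # start collecting afresh for the next boot
        else:
            recent.append(line)


if __name__ == "__main__":
    # Pass the log file(s) to scan on the command line; nothing runs without one.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for marker, context in lines_before_restarts(log):
                print("-" * 60)
                print("\n".join(context))
                print(">>>", marker)
```

Run it as, for example, `python3 precrash.py /var/log/syslog` (the script name is hypothetical). Matching each hit against the known crash times is still a manual step, since normal reboots produce the same marker.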

The laptop is AMD64-based (2800+), but I am using the 32-bit version of 2007.0 because of previous issues with Mozilla plugins and with OpenOffice and Java compatibility, etc. Installation went smoothly. It has 512 MB of RAM and a NetXtreme BCM5788 Gigabit Ethernet NIC (tg3 driver).

The graphics card is an "ATI Technologies Inc RV350 [Mobility Radeon 9600 M10]" according to lspci. I have tried both fglrx and the xorg radeon driver, with the same result on both. I'm currently using the xorg driver, because Mandriva's fglrx driver produces occasional artefacts on screen (a small horizontal line that follows the pointer around).

I've tried with and without the new 3D desktop features; it makes no difference. The behaviour is the same in KDE and GNOME.

I assume this isn't a problem similar to the one with kat in 2006?

I am not using any particular application when it crashes. I usually have Firefox, Thunderbird and Konsole running, but it will hang regardless of which application I am using, or if I am not using it at all at the time.

Any ideas?
bgprior

11-13-2006 #2
Looks like a tough problem. I suggest probing more deeply into your system logs; anything that provides even a slight clue would be helpful. You might want to run dmesg and look at the output. It shows the kernel's message buffer, including the messages generated at boot time.

This won't necessarily point to the problem, but it's a starting point and it might help.

One more idea: try Googling this:

"Controller RESET (TX timeout)"

and see what you find. This might give you some more clues. Include the quotation marks in the search to force an exact match.
__________________
I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
fingal

11-13-2006 #3
Is your ACPI working correctly?
bigtomrodney

11-14-2006 #4
dmesg, acpi and kernel panic

fingal: yeah, I know about dmesg, thanks, but nothing at boot stands out. Given the new information below, I guess the problem isn't the wireless NIC, so I won't chase that error from the log, since the NIC works.

bigtomrodney: good point. I'll try the acpi=off, noapic and nolapic options that have been necessary with past incarnations of Mandr[iva|ake]. I don't want to lose ACPI, though, as this is an AMD64 laptop and it gets pretty hot without power management. But it's worth testing.

I learnt something new just now by accident. I had left the machine shutting down (so I thought) a few hours ago. I don't know if anyone else is finding shutdown pretty erratic with 3D enabled, but for me it varies between shutting down correctly straight from KDE, restarting KDM (from where I can shut down), and dropping to a shell. In this case it had dropped to a shell, where it sat until it crashed. So I got the system message that is normally not visible when it crashes in X and doesn't get logged. It said:

CPU0: Machine Check Exception: 0000000000000004
Bank 4: b200000000070f0f
Kernel panic - not syncing: CPU context corrupt

That sounds like very bad news. I've got no particular reason to say this other than past experience, but I guess noapic may be my best hope.
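The two hex numbers in that message are raw machine-check registers: the value after "Machine Check Exception:" is the global MCG_STATUS register, and the value after "Bank 4:" is that bank's MCi_STATUS register. Their flag bits and the low 16-bit error code have an architecturally defined layout (described in the machine-check chapters of the Intel and AMD manuals), so a few lines of Python can decode them. This is a minimal sketch: the names are illustrative, the classification is deliberately rough, and the model-specific bits 31:16 are left uninterpreted.

```python
# Decode "Machine Check Exception: 0000000000000004" (MCG_STATUS)
# and "Bank 4: b200000000070f0f" (MCi_STATUS).

# Architectural flag bits of MCi_STATUS, highest first.
MCI_FLAGS = {
    63: "VAL",    # the register holds a valid error
    62: "OVER",   # a previous error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupt
}

# Flag bits of the global MCG_STATUS register.
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def decode_mcg_status(status):
    """Names of the MCG_STATUS flags that are set."""
    return [name for bit, name in MCG_FLAGS.items() if (status >> bit) & 1]


def classify_mca_code(code):
    """Rough class of the 16-bit architectural error code (bits 15:0)."""
    c = code & ~0x1000  # ignore bit 12, a filtering flag on some Intel compound codes
    if (c & 0xF800) == 0x0800:  # 0000 1PPT RRRR IILL
        return "bus/interconnect"
    if (c & 0xFF00) == 0x0100:  # 0000 0001 RRRR TTLL
        return "memory hierarchy"
    if (c & 0xFFF0) == 0x0010:  # 0000 0000 0001 TTLL
        return "TLB"
    return "other"


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error codes."""
    code = status & 0xFFFF
    return {
        "flags": [name for bit, name in MCI_FLAGS.items() if (status >> bit) & 1],
        "mca_code": code,
        "class": classify_mca_code(code),
        # Bits 31:16 are model-specific; their meaning depends on the CPU family.
        "model_specific": (status >> 16) & 0xFFFF,
    }


if __name__ == "__main__":
    print("MCG_STATUS:", decode_mcg_status(0x4))
    for raw in ("b200000000070f0f", "be00030020200137"):
        d = decode_mci_status(int(raw, 16))
        print(raw, d["flags"], hex(d["mca_code"]), d["class"], hex(d["model_specific"]))
```

For `b200000000070f0f` this reports VAL, UC, EN and PCC set and a bus/interconnect error code of 0x0f0f. PCC (processor context corrupt) is consistent with the kernel's "CPU context corrupt" panic, and in the bus-error layout bit 8 (T) of 0x0f0f is set, which marks a timeout. The MCG_STATUS value 4 has only MCIP (machine check in progress) set, with RIPV clear, meaning execution could not safely be resumed. On K8-family AMD processors bank 4 reports northbridge errors; interpreting the model-specific bits would need AMD's BIOS and Kernel Developer's Guide for that family.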

It would be nice if Mandriva worked 100% satisfactorily for a change, rather than the usual 95% brilliantly and 5% badly. Maybe it's time to try Ubuntu, but I go back to the very early days of Mandrake, and know how to get under its skin when there is a problem. Changing distros, you have to learn a whole new set of secrets.
bgprior

11-14-2006 #5
Kernel panic

I found this thread online:

http://www.linuxquestions.org/questi...d.php?t=278653

The following advice from that thread seems relevant:

Quote:
For future reference, in case anyone is googling this...

CPU0: Machine Check Exception: 0000000000000004
Bank 0: be00030020200137[0118000100900140] at 00000000fee00300
Kernel Panic - not syncing: CPU Context corrupt.

Im fairly sure that error is a MCE check gone bad. To get around it, type "linux nomce"
at your LILO prompt. Or you can recompile your kernel to exclude Machine
Check Exception. HTH..
From other posts, it seems this may be related to overheating of the CPU, which wouldn't surprise me, because the laptop gets as warm as a hot-water bottle where the CPU is located. That is why I want ACPI. Unfortunately, cpufreqd seemed not to work with this version of Mandriva (it was working in 2006), so I had to remove apmd and cpufreqd and replace them with kpowersave, which does seem to be scaling the CPU. I've got it scaled down from 1.8 GHz to 0.8 GHz at the moment, but it's still bloody hot. That's just the nature of the laptop, I suppose.

It's possible that this error is actually a sign that 2007.0 has improved on previous versions and is catching errors that previous versions weren't catching. Or it's being overfussy or getting the check wrong; I don't know. Anyway, I might as well chance frying my CPU by using the nomce flag, as I'll have to replace the laptop anyway if I can't get round this.
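Whether `nomce` (or the `acpi=off`, `noapic` and `nolapic` options mentioned earlier) actually reached the running kernel can be confirmed from `/proc/cmdline`, which contains the parameters the kernel was booted with. A minimal Python sketch; the helper names and the list of watched flags are illustrative assumptions.

```python
# Report whether selected kernel parameters were passed at boot.
WATCHED = ("nomce", "acpi", "noapic", "nolapic")


def boot_flags(cmdline):
    """Map each kernel parameter to its value, or None for bare flags like 'nomce'."""
    flags = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        flags[name] = value if sep else None
    return flags


def report(path="/proc/cmdline"):
    try:
        with open(path) as f:
            flags = boot_flags(f.read())
    except OSError:
        print(f"cannot read {path}; is this a Linux system?")
        return
    for name in WATCHED:
        if name not in flags:
            print(f"{name}: not set")
        elif flags[name] is None:
            print(f"{name}: set")
        else:
            print(f"{name}={flags[name]}")


if __name__ == "__main__":
    report()
```

If `nomce` shows as "not set" after typing it at the boot prompt, the flag did not reach the kernel and the test tells you nothing about machine-check handling.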

One thing makes me suspect it may be Mandriva's error rather than overheating: it lasts roughly 4 hours whether I fire it up from cold or reboot immediately after a crash. If it was overheating, surely it would crash much faster after a reboot. Could be wrong, but I think I'll put this down to another one of Mandriva's "endearing" little glitches. Time will tell if I'm right.
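One way to test the overheating theory is to log the CPU temperature and current frequency every minute to a file, flushing and syncing after each write so the last readings survive a hard crash. The sketch below assumes a modern kernel's sysfs interfaces (`/sys/class/thermal/thermal_zone*/temp` in millidegrees Celsius and `/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq` in kHz); kernels of the 2.6.17 era exposed temperatures under `/proc/acpi/thermal_zone/` instead, so the paths are assumptions to adapt. The function names and the 60-second interval are hypothetical.

```python
import glob
import os
import sys
import time

# Assumed sysfs locations on a modern kernel; adapt for older ones.
THERMAL_GLOB = "/sys/class/thermal/thermal_zone*/temp"               # millidegrees C
FREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"  # kHz


def read_int(path):
    """Integer stored in a sysfs file, or None if missing or unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def sample():
    """(hottest zone in degrees C, cpu0 frequency in MHz); None where unavailable."""
    temps = [t for t in map(read_int, glob.glob(THERMAL_GLOB)) if t is not None]
    freq_khz = read_int(FREQ_PATH)
    temp_c = max(temps) / 1000 if temps else None
    freq_mhz = freq_khz // 1000 if freq_khz is not None else None
    return temp_c, freq_mhz


def format_line(when, temp_c, freq_mhz):
    # Syslog-style timestamp so readings can be lined up with the system log.
    stamp = time.strftime("%b %d %H:%M:%S", time.localtime(when))
    temp = "n/a" if temp_c is None else f"{temp_c:.1f}C"
    freq = "n/a" if freq_mhz is None else f"{freq_mhz}MHz"
    return f"{stamp} temp={temp} freq={freq}\n"


def log_forever(path, interval=60):
    """Append one reading per interval, fsync'ing so the tail survives a hard crash."""
    with open(path, "a") as out:
        while True:
            out.write(format_line(time.time(), *sample()))
            out.flush()
            os.fsync(out.fileno())
            time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    log_forever(sys.argv[1])
```

Comparing the last lines written before a crash after a cold start with those after an immediate reboot would show whether the crash tracks temperature or elapsed uptime, which is exactly the distinction drawn above.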
bgprior

11-15-2006 #6
Quote:
Originally Posted by bgprior
It would be nice if Mandriva worked 100% satisfactorily for a change, rather than the usual 95% brilliantly and 5% badly. Maybe it's time to try Ubuntu, but I go back to the very early days of Mandrake, and know how to get under its skin when there is a problem. Changing distros, you have to learn a whole new set of secrets.
Lol - that's one of the most accurate assessments of Mandriva I've seen. I'm just the same ... I don't have the time to start from scratch again.
fingal

11-15-2006 #7
Bad news

Since I added the nomce flag, the machine lasted longer before freezing today, but it still crashed about an hour ago. I might go back to 2006 for a while to see whether I get the same behaviour, just in case my CPU has developed a fault coincidentally at the same time as I installed 2007. That would be a big coincidence, but I want to eliminate the possibility before I start screaming that 2007 has a fatal flaw on my machine.
bgprior