- 11-04-2011 #1 Just Joined!
- Join Date
- Oct 2011
- Posts
- 12
CentOS 5 Crashing regularly, nothing in logs
I am running CentOS 5.7, it is crashing almost every day and nothing is indicated in the /var/log/messages.
It stops responding to ping requests and there is no network traffic to or from it. The system does not show these problems when running in a VM, only when running directly on the hardware. I have run this system on various hardware configurations and it happens only on the physical machines.
I am out of ideas as to what the problem can be. I have googled everything I can think of related to crashing and network problems.
Here is a copy of the entries in /var/log/messages from one of the incidents:
Nov 3 12:26:59 localhost smartd[1984]: Device: /dev/hdb, opened
Nov 3 12:27:00 localhost smartd[1984]: Device: /dev/hdb, found in smartd database.
Nov 3 12:27:00 localhost smartd[1984]: Device: /dev/hdb, is SMART capable. Adding to "monitor" list.
Nov 3 12:27:00 localhost smartd[1984]: Device: /dev/hdd, opened
Nov 3 12:27:00 localhost smartd[1984]: Device: /dev/hdd, packet devices [this device CD/DVD] not SMART capable
Nov 3 12:27:00 localhost smartd[1984]: Monitoring 1 ATA and 0 SCSI devices
Nov 3 12:27:01 localhost smartd[1989]: smartd has fork()ed into background mode. New PID=1989.
Nov 3 12:32:41 localhost nmbd[1968]: [2011/11/03 12:32:41, 0] nmbd/nmbd_become_lmb.c:become_local_master_stage2(396)
Nov 3 12:32:41 localhost nmbd[1968]: *****
Nov 3 12:32:41 localhost nmbd[1968]:
Nov 3 12:32:41 localhost nmbd[1968]: Samba name server LOCALHOST is now a local master browser for workgroup MYGROUP on subnet xxx.xx.x.xxx
Nov 3 12:32:41 localhost nmbd[1968]:
Nov 3 12:32:41 localhost nmbd[1968]: *****
Nov 4 11:55:55 localhost syslogd 1.4.1: restart.
Nov 4 11:55:55 localhost kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 4 11:55:55 localhost kernel: Linux version 2.6.18-274.3.1.el5 (mockbuilder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-51)) #1 SMP Tue Sep 6 20:14:03 EDT 2011
Nov 4 11:55:55 localhost kernel: BIOS-provided physical RAM map:
Nov 4 11:55:55 localhost kernel: BIOS-e820: 0000000000010000 - 000000000009fc00 (usable)
Nov 4 11:55:55 localhost kernel: BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
Nov 4 11:55:55 localhost kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Nov 4 11:55:55 localhost kernel: BIOS-e820: 0000000000100000 - 000000001dff0000 (usable)
Nov 4 11:55:55 localhost kernel: BIOS-e820: 000000001dff0000 - 000000001dff3000 (ACPI NVS)
Nov 4 11:55:55 localhost kernel: BIOS-e820: 000000001dff3000 - 000000001e000000 (ACPI data)
Nov 4 11:55:55 localhost kernel: BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
Nov 4 11:55:55 localhost kernel: 0MB HIGHMEM available.
Nov 4 11:55:55 localhost kernel: 479MB LOWMEM available.
Nov 4 11:55:55 localhost kernel: found SMP MP-table at 000f4bc0
Nov 4 11:55:55 localhost kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
Nov 4 11:55:55 localhost kernel: disabling kdump
Nov 4 11:55:55 localhost kernel: Using x86 segment limits to approximate NX protection
Can anyone give advice?
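The excerpt shows almost a full day between the last entry (Nov 3 12:32:41) and the syslogd restart (Nov 4 11:55:55), with nothing logged in between. The sketch below lists every such silent gap before a restart in a copy of the log. It is a minimal sketch under stated assumptions: Python 3.6+ (stock CentOS 5 ships Python 2.4, so run it on another machine against a copied log), classic syslog timestamps that carry no year (the year is passed in), and the syslogd "restart." line used as the reboot marker. A manual syslog restart also produces that line, and a December-to-January rollover is not handled.

```python
import re
import sys
from datetime import datetime

# syslogd writes "syslogd 1.4.1: restart." when it starts (see the excerpt above).
# Assumed here to mark a reboot; restarting syslog by hand also produces it.
RESTART_RE = re.compile(r"syslogd [\d.]+: restart\.")


def parse_stamp(line, year):
    """Parse the leading 'Mon D HH:MM:SS' syslog stamp; return None if absent."""
    parts = line.split(None, 3)
    if len(parts) < 3:
        return None
    try:
        # Classic syslog stamps have no year, so it has to be supplied.
        return datetime.strptime(f"{year} {' '.join(parts[:3])}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_silent_gaps(lines, year=2011):
    """Yield (last line before restart, restart line, time between them)."""
    last_line, last_time = None, None
    for raw in lines:
        line = raw.rstrip("\n")
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # skip lines without a parseable timestamp
        if RESTART_RE.search(line) and last_time is not None:
            yield last_line, line, stamp - last_time
        last_line, last_time = line, stamp


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "messages"
    with open(path, errors="replace") as fh:
        for before, after, gap in find_silent_gaps(fh):
            print(f"silent for {gap}")
            print(f"  last entry: {before}")
            print(f"  restart:    {after}")
```

Each reported gap brackets the crash: the machine died somewhere after the "last entry" line.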
- 11-04-2011 #2
It starts up, runs, then at some random point within a few hours just stops. Here are the things I'd check:
Heat: you don't have a broken fan somewhere inside, do you? Have you tried running it for a day with the cover off, just to see? You could also check that all the heat sinks are seated correctly on the various chips inside.
Memory failure: this can produce all kinds of odd errors and usually gets worse the longer the machine runs, so it's possibly the culprit. Get a Linux boot disk (the one you used to install CentOS should do), boot it up and run the memtest tool; you'll have to leave it testing for a while.
Power supply: when these go wrong it can look like the CPU or mainboard is failing. I had a PC recently that would just shut off in the middle of everything; I had to go into the BIOS and run the CPU at its normal speed because the power supply could no longer feed it the power it needed. Over time they fill up with dust and are prone to heat problems like everything else inside the case. If you have a spare PSU hanging around, you could try it.
Linux user #126863 - see http://linuxcounter.net/
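Since the machine stops without logging anything, it helps to know exactly when it stopped. A common trick is a heartbeat: a process that appends a timestamp to a file at a fixed interval and forces it to disk, so after the restart the last line dates the failure to within one interval. This is a minimal sketch; the path /var/tmp/heartbeat.log and the 30-second interval are arbitrary, hypothetical choices, and it assumes a Python 3 interpreter has been installed (CentOS 5's stock python is 2.4). If the disk subsystem itself is what fails, even fsync'd data may not survive.

```python
import os
import time
from datetime import datetime

# Hypothetical location and interval; adjust to taste.
HEARTBEAT_PATH = "/var/tmp/heartbeat.log"
INTERVAL_SECONDS = 30


def write_heartbeat(path):
    """Append the current local time and force it to disk before returning."""
    with open(path, "a") as fh:
        fh.write(datetime.now().strftime("%Y-%m-%d %H:%M:%S") + "\n")
        fh.flush()             # push Python's buffer to the kernel
        os.fsync(fh.fileno())  # ask the kernel to write it out to the disk


def run(path=HEARTBEAT_PATH, interval=INTERVAL_SECONDS):
    """Write a heartbeat forever; the last line left after a crash dates the failure."""
    while True:
        write_heartbeat(path)
        time.sleep(interval)


if __name__ == "__main__":
    run()
```

If the heartbeat keeps running after the box stops answering pings, only the network side died; if it stops at the same time, the whole system hung.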
- 11-04-2011 #3 Linux Guru
- Join Date
- Apr 2009
- Location
- I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
- Posts
- 8,975
I'd agree with Roxoff that this is probably a hardware problem. One presumes that on restart, the fsck command is run by the system to restore your file systems since it shut down with files open. Overheating is the most common cause of this sort of stuff, though it could be a power supply failure as well. Most motherboards have thermal sensors that can be accessed with the lm_sensors sub-system. Install the lm_sensors package(s) and see if you can monitor the RAM, CPU, and (possibly) motherboard thermal properties. This may give you a hint as to what is happening. I was having similar problems when I had a RAM overheating problem. Fortunately, my system uses fully-buffered ECC RAM so it was able to disable the failing SIMM and keep running, posting a nasty-gram to the UI and console(s). Improving airflow over the RAM sticks solved the problem.
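On 2.6 kernels, lm_sensors gets its readings from the hardware-monitoring (hwmon) interface in sysfs, so the raw values can also be logged directly. A minimal sketch, assuming the right hwmon driver is loaded (for example after running sensors-detect) and that temperatures appear as tempN_input files in millidegrees Celsius, either under /sys/class/hwmon/hwmonN/ or under its device/ subdirectory as on older kernels. It also assumes Python 3, which stock CentOS 5 does not ship. Logging this every few minutes would show whether anything heats up before a crash.

```python
import glob
import os


def read_temps(base="/sys/class/hwmon"):
    """Return {'hwmonN/label': degrees C} for every tempN_input under base.

    Assumes the hwmon sysfs convention: values in millidegrees Celsius and an
    optional tempN_label file naming the sensor. Older kernels put the files in
    hwmonN/device/ instead of hwmonN/, so both places are checked.
    """
    readings = {}
    for pattern in ("hwmon*/temp*_input", "hwmon*/device/temp*_input"):
        for path in sorted(glob.glob(os.path.join(base, pattern))):
            chip = os.path.relpath(path, base).split(os.sep)[0]  # e.g. "hwmon0"
            stem = path[: -len("_input")]
            label = os.path.basename(stem)  # fallback name, e.g. "temp1"
            try:
                if os.path.exists(stem + "_label"):
                    with open(stem + "_label") as fh:
                        label = fh.read().strip()
                with open(path) as fh:
                    readings[f"{chip}/{label}"] = int(fh.read().strip()) / 1000.0
            except (OSError, ValueError):
                continue  # unreadable or non-numeric sensor: skip it
    return readings


if __name__ == "__main__":
    for name, celsius in sorted(read_temps().items()):
        print(f"{name}: {celsius:.1f} C")
```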
Sometimes, real fast is almost as good as real time.
Just remember, Semper Gumbi - always be flexible!
- 11-05-2011 #4
I second Rubberman in general, but it would be good to know what VM host it runs under as a guest without problems.
- 11-09-2011 #5 Just Joined!
It is not the temperature. These readings were taken right after a crash:
fan2: 2636 RPM (min = 0 RPM, div =
M/B Temp: +40°C (low = +127°C, high = +127°C) sensor = thermistor
CPU Temp: +40°C (low = +127°C, high = +127°C) sensor = diode
Temp3: +20°C (low = +127°C, high = +127°C) sensor = thermistor
As for hardware issues, that is not very likely, as I have used this system on various PCs with completely different hardware and it still happens. I used VMware-server-2.0.0-122956 on Windows XP and 7 with no issues.
I also ran memtest anyway on the current hardware and there were no errors. All the fans are running and the machine is in an air-conditioned room, so heat is also unlikely, as shown above.
- 11-09-2011 #6 Linux Guru
Are you running the CentOS in a virtual machine, or is it the host system?
- 11-09-2011 #7
It's clear to me that the problem occurs when running on the physical machine, but not in a VM. It's now clear that the problem is repeatable across multiple hardware platforms. What's not clear is what may be different about this setup than a stock 5.7 setup, which I have running on multiple machines, both 32 and 64 bit versions, with no such issues.
OP, what packages do you have installed that are not from the CentOS repos?
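One way to answer that is to ask rpm for each installed package's vendor and flag anything not from CentOS. On the box itself, `rpm -qa --qf '%{NAME}\t%{VENDOR}\n' > pkgs.txt` writes one name/vendor pair per line; the sketch below then filters that file with Python 3 on any machine. It assumes CentOS's own packages report a vendor string beginning with "CentOS"; entries such as gpg-pubkey report "(none)" and will be listed too, so the output still needs a human look.

```python
import sys


# Expects lines in the form produced by:
#     rpm -qa --qf '%{NAME}\t%{VENDOR}\n'
def non_centos_packages(lines, vendor_prefix="CentOS"):
    """Return sorted (name, vendor) pairs whose vendor does not start with vendor_prefix."""
    odd = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        name, _, vendor = line.partition("\t")
        if not vendor.startswith(vendor_prefix):
            odd.append((name, vendor or "(none)"))
    return sorted(odd)


if __name__ == "__main__":
    source = open(sys.argv[1]) if len(sys.argv) > 1 else sys.stdin
    with source:
        for name, vendor in non_centos_packages(source):
            print(f"{name}\t{vendor}")
```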
- 11-10-2011 #8 Just Joined!
- 11-10-2011 #9 Just Joined!
There are no packages installed that are not part of the distro, as far as I know. See the attachment for the complete list of installed packages.
- 11-10-2011 #10
Can you describe the symptoms more clearly?
I mean, you boot the machine up and it's working properly, right?
Then, after some time, it stops responding to network requests. How long does it run properly?
Does it /only/ stop responding to the network? Can you hook up a monitor and keyboard and operate it that way? What happens when you do that?
Originally Posted by djmc401
Code: Nov 4 11:55:55 localhost syslogd 1.4.1: restart.
Where does the machine get its idea to restart? Does it do this by itself, do you do it (perhaps logged in locally), or do you hard-reboot it using the power switch?
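To answer the "does it only stop responding to the network" question without sitting at the console, the box can be watched from another host. This sketch uses a TCP connect to sshd (port 22 is an assumption; any listening service works) instead of ping, since sending ICMP needs raw sockets and root. It prints a timestamp whenever the state flips, so the DOWN time can be compared with what the console shows and with the last lines in /var/log/messages. The hostname centos-box.example is a hypothetical placeholder, and Python 3.6+ is assumed on the watching machine.

```python
import socket
import sys
import time
from datetime import datetime


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable name, no route, ...
        return False


def watch(host, port=22, interval=30):
    """Print a timestamped line whenever reachability changes."""
    last = None
    while True:
        up = is_reachable(host, port)
        if up != last:
            state = "UP" if up else "DOWN"
            print(f"{datetime.now():%Y-%m-%d %H:%M:%S} {host}:{port} {state}", flush=True)
            last = up
        time.sleep(interval)


if __name__ == "__main__":
    watch(sys.argv[1] if len(sys.argv) > 1 else "centos-box.example")
```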
Can't tell an OS by its GUI