- 03-29-2010 #1 Just Joined!
- Join Date
- Mar 2010
- Posts
- 4
How to increase detail of messages logfile to help resolve an intermittent crash?
Hi,
My server is running Red Hat 4.1.2-44.
I am having an issue with it hanging (website doesn't respond, can't connect via SSH (it's a remote server)) after 4am on every 3rd Sunday or so.
A reboot is required to get the server responding again.
I have checked every logfile I can find and found nothing that helps me. The thing I have noticed is that some of these log files start afresh after the reboot so my initial thoughts are that it is a logrotate issue or something similar.
The last activity I can find before the reboot (at 8:32) is in the logfile /var/log/cron:
Mar 28 04:22:01 crond[9356]: (root) CMD (run-parts /etc/cron.weekly)
Mar 29 08:32:39 crond[3884]: (CRON) STARTUP (V5.0)
My extremely verbose MySQL log stops logging at 4:19
The only other clue is a hangcheck in the messages log:
Mar 28 04:08:54 kernel: [499320.949513] Hangcheck: hangcheck value past margin!
---------------- LOG ROTATE -----------------------------------
Mar 28 04:13:58 syslogd 1.4.1: restart.
--------------- Me rebooting --------------------------------
Mar 29 08:31:56 syslogd 1.4.1: restart.
Does this scenario ring any bells for anyone else and if so how did you sort it out?
Otherwise does anyone know how I can increase the level of logging to the messages file to try to generate a few more clues?
Anything anyone can suggest that would give me more clues would be great, thanks.
Cmb
- 03-29-2010 #2 Linux Guru
- Join Date
- Apr 2009
- Location
- I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
- Posts
- 8,974
Have you tried using a monitoring tool like Nagios? If someone is on-site with the server, can they access the console? Logging messages per se may not work for you, because there may be nothing to log. For example, a process may be in a "run-away" state, stuck in a tight CPU loop that sucks up all the processing power of the system.
Also, is this a single-core, single-CPU system, or a multi-core and/or multi-processor system? If the former, then a server process that has got into a run-away state can cause the server to "hang", unable to process any external events.
If you have a person on-site who can log in as root to a console/text command line before the system goes snafu and set the priority of the shell to a higher level than the server processes (un-nice the shell pid enough levels to have precedence over the server(s)), then they should be able to deal with a run-away process by killing and restarting it. Then you would at least know where to look for problems.
Another approach is to use a watchdog timer. If the system locks up, either in the kernel or a run-away process, then it can reboot the system automatically. This is, to my thinking, a last-resort scenario, but is sometimes a reasonable approach if the system has to run "lights-out" 24x7.
Sometimes, real fast is almost as good as real time.
Just remember, Semper Gumbi - always be flexible!
- 03-29-2010 #3 Just Joined!
- Join Date
- Mar 2010
- Posts
- 4
Thanks Rubberman,
It is indeed a single core server so it sounds like you could be right.
I will try to get someone on site to have a look when it goes down before I reboot it.
Failing that I'll look for some application that can log memory/processor usage.
Thanks again for replying.
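For the memory/processor logging mentioned above, a minimal sketch follows. It assumes a Linux box with /proc and procps `ps`. The function name `log_snapshot` and the log path are hypothetical, not something from this thread. It appends one timestamped snapshot per call and then syncs, so the last entry has a chance of surviving a hard reset.

```shell
#!/bin/sh
# Hypothetical helper: append one snapshot of load, memory and the
# busiest processes to a log file. The default path is an assumption.
SNAPSHOT_LOG="${SNAPSHOT_LOG:-/var/log/resource-snapshot.log}"

log_snapshot() {
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        # 1-, 5- and 15-minute load averages
        [ -r /proc/loadavg ] && cat /proc/loadavg
        # Headline memory and swap figures
        [ -r /proc/meminfo ] && grep -E '^(MemTotal|MemFree|SwapTotal|SwapFree):' /proc/meminfo
        # Header line plus the five busiest processes by CPU
        ps -eo pid,pcpu,pmem,comm --sort=-pcpu 2>/dev/null | head -n 6
    } >>"$SNAPSHOT_LOG" 2>&1
    # Flush to disk so the last snapshot is not lost in a hang
    sync
}
```

To use it, save the block as a script with a final `log_snapshot` line and run it from cron every minute around the 4am window. After the next hang, the last few snapshots should show whether load or swap usage was climbing and which process was busiest.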
- 03-29-2010 #4
4am sure smells like cron.daily. I don't see anything in there on my RHEL 4 servers that would be happening only on the 3rd Sunday, though.
logrotate is definitely a likely suspect. It does some fairly ham-handed stuff like HUPping processes based on the contents of their PID files. You can get it to cough up some extra trace information with the "-v" switch. You could modify /etc/cron.daily/logrotate to say:
/usr/sbin/logrotate -v /etc/logrotate.conf >>/tmp/logrotate.trace 2>&1
Then see whether it completed when you get your next system hang. Note that the /tmp file will keep growing until you turn this off. Next time you get a hang, you can compare the output of the last iteration of logrotate with the one prior and look for whether it's hanging or otherwise going awry.
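One way to make the iterations in the trace file easier to tell apart is to bracket each run with timestamped markers. This is only a sketch: `traced_run` and `last_run` are hypothetical helper names, and the trace path simply reuses the one suggested above. A START marker with no matching END would suggest the run never finished.

```shell
#!/bin/sh
# Hypothetical wrapper: bracket each traced command with START/END markers.
TRACE="${TRACE:-/tmp/logrotate.trace}"

traced_run() {
    echo "=== START $(date '+%Y-%m-%d %H:%M:%S') ===" >>"$TRACE"
    # Run the given command, capturing stdout and stderr in the trace
    "$@" >>"$TRACE" 2>&1
    status=$?
    echo "=== END $(date '+%Y-%m-%d %H:%M:%S') exit=$status ===" >>"$TRACE"
    # Flush so the markers reach disk before any hang
    sync
    return $status
}

# Print only the most recent run recorded in the trace file
last_run() {
    awk '/^=== START /{buf=""} {buf = buf $0 "\n"} END{printf "%s", buf}' "$TRACE"
}
```

In /etc/cron.daily/logrotate the call would then look like `traced_run /usr/sbin/logrotate -v /etc/logrotate.conf`, and `last_run` after a reboot shows how far the final iteration got.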
cron.daily also runs the slocate.cron script, which renices its parent PID for reasons I've never understood. That only happens if DAILY_UPDATE is turned on in /etc/updatedb.conf.
- 03-29-2010 #5
- 03-29-2010 #6
- 03-30-2010 #7 Just Joined!
- Join Date
- Mar 2010
- Posts
- 4
I'll try that, thanks both.
I too am no spring chicken and have hair that is whitening (and retreating).
I'm not sure if I'm a weenie though as I'm so old I don't know what one is. My interpretation of that particular word is that of a smoked German sausage made from mechanically recovered meat.
Thanks again
- 03-30-2010 #8
See definition 3:
Weenie - Definition and More from the Free Merriam-Webster Dictionary
Some of my co-workers would say definition 2 also applies.
As I age and have various procedures done to shore up the old bod, your "mechanically recovered meat" characterization becomes more and more true, although I probably have more Scots and American Indian blood than German.


