- 11-10-2010 #1 Just Joined! (Join Date: Apr 2009; Posts: 15)
Hald stuck processes
Hi,
We are running CentOS 5.4 and have noticed an issue with the HAL daemon: it spawns processes that make the load climb progressively until we kill them and restart the service.
A server where hald is running correctly usually shows about 9 hal processes; a server with this issue shows more than 20. The load is usually around 2-3 but keeps getting worse over time.
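For comparing a healthy server with an affected one, a count like the one above can be taken with a rough sketch along these lines. It assumes Python 3 is available (the stock Python on CentOS 5 is much older and would need adjustments), and the function names `parse_stat`, `hald_processes` and `count_by_state` are made up for illustration. It also records each process state, since processes in uninterruptible sleep (state D) count toward the Linux load average.

```python
import os
from collections import Counter


def parse_stat(stat_text):
    """Return (name, state) from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so take everything up to the LAST ')'.
    start = stat_text.index("(")
    end = stat_text.rindex(")")
    name = stat_text[start + 1:end]
    state = stat_text[end + 1:].split()[0]
    return name, state


def hald_processes(proc_root="/proc", prefix="hald"):
    """List (pid, name, state) for processes whose name starts with prefix.

    The "hald" prefix is an assumption based on the daemon's name; note that
    the kernel truncates the name in this file to 15 characters.
    """
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                name, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the file was unreadable
        if name.startswith(prefix):
            found.append((int(entry), name, state))
    return sorted(found)


def count_by_state(processes):
    """Count processes per state letter (S = sleeping, D = uninterruptible, ...)."""
    return Counter(state for _pid, _name, state in processes)
```

Calling `hald_processes()` with no arguments scans the live `/proc`. If most of the extra processes turn out to be in state D, that would point toward something blocking on I/O rather than burning CPU, though that is only a guess until the states are actually checked.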
There is nothing from hal in the messages log, so any help with where to look or how to debug this issue would be appreciated.
Thanks
- 11-10-2010 #2 Linux Guru (Join Date: Apr 2009; Location: I can be found either 40 miles west of Chicago, or in a galaxy far, far away.; Posts: 8,974)
What processes are incorrectly running?
Sometimes, real fast is almost as good as real time.
Just remember, Semper Gumbi - always be flexible!
- 11-10-2010 #3 Just Joined! (Join Date: Apr 2009; Posts: 15)
We have localized the issue to the hal processes. If we kill them all and restart the haldaemon service, everything works and the load drops.
- 11-10-2010 #4 Linux Guru (Join Date: Apr 2009; Location: I can be found either 40 miles west of Chicago, or in a galaxy far, far away.; Posts: 8,974)
It sounds like some of the hardware that hald is monitoring is generating events that HAL sees as something it needs to deal with, such as a CD being inserted into a reader or a USB drive being connected. So this may be a sign of malfunctioning hardware.
- 11-10-2010 #5 Just Joined! (Join Date: Apr 2009; Posts: 15)
I would buy that as the issue if it were happening on one server, but we have seen this on 6 servers. The 5 servers on which we killed the processes haven't seen the issue since (we are waiting to debug the issue on the 6th). We also monitor the servers using Dell OpenManage, and it is not generating any alerts.
- 11-10-2010 #6 Linux Guru (Join Date: Apr 2009; Location: I can be found either 40 miles west of Chicago, or in a galaxy far, far away.; Posts: 8,974)
Time to apply RCA (Root Cause Analysis) techniques to this problem. Possibility: do systems with similar or identical hardware configurations experience the same problem?
Query: do the systems with this problem stabilize over time, or do they continue to generate excess hal processes?
Possibility: if they stabilize over time, some hardware may simply take time to settle. If the systems share basically the same hardware configuration, this may be a system design and/or build flaw. This is where you check whether they were built as part of one batch, or with components that came from the same manufacturing batches.
In any case, these sorts of problems can be very hard to nail down to a single, unambiguous root cause. Unfortunately, failure to do so can result in a lot of wasted engineering time keeping the systems stabilized.
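Whether the hal process count stabilizes or keeps climbing can be checked from periodic samples of that count. A rough sketch in Python follows; the name `still_growing` and the three-sample window are illustrative choices, not anything established in this thread, and the samples themselves would have to be collected separately (for example from a cron job).

```python
def still_growing(counts, window=3):
    """Return True if the newest `window` samples never drop and end higher
    than they started; False if they are flat or falling.

    `counts` is a list of process counts, oldest first.
    """
    recent = counts[-window:]
    if len(recent) < window:
        raise ValueError("need at least %d samples" % window)
    # Every sample must be at least as large as the one before it ...
    non_decreasing = all(a <= b for a, b in zip(recent, recent[1:]))
    # ... and the window as a whole must show a net increase.
    return non_decreasing and recent[-1] > recent[0]
```

A result of True over several windows would suggest ongoing generation of new processes; a result of False after an initial rise would fit the "stabilizes over time" case.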
BTW, have you considered updating to CentOS 5.5 and current kernels? Also, are you running the Xen-enabled kernel or the standard one? Is your kernel vanilla, or have you customized it (configured additional services)?
- 11-11-2010 #7 Just Joined! (Join Date: Apr 2009; Posts: 15)
Most of the time, upgrading doesn't help unless you can first identify the issue and confirm that a newer version of something fixes it. Thanks for your help, but I'll hold on and see if anyone has information on how to debug this issue further.


