Apache Creates Runaway Processes
Our server has started behaving rather strangely, and I was hoping you guys might be able to help me out.
First, some information on our setup. We have:
2 web servers running RHEL 5.2/Apache 2.2/PHP 5.2.
1 DB server running RHEL 5.2/MySQL 5.x
1 file server running RHEL 5.2/Apache 2.2
All of them sit behind a load balancer and a firewall.
The server environment changed dramatically two weeks before this problem first appeared. Previously, there had been a single web server running a terrible custom build of CentOS (I'm not sure of the version number) and Apache 1.3. httpd.conf was rewritten for the new site, and everything ran smoothly for the first two weeks in the new environment. The problems coincided closely with the decommissioning of the old servers, though we cannot find anywhere that the old servers are referenced directly (i.e., by a name or address not managed through DNS).
Several weeks ago, we started seeing what looked like server crashes. Apache spawns more and more processes until it reaches the configured maximum number of processes and fills the server's memory, after which all further traffic is rejected. The only fix we have found is to restart Apache and wait for the cycle to start all over again. Sometimes it recurs immediately; other times four or five days pass between incidents.
Increasing the maximum number of processes only delays the inevitable slightly.
If Apache is not restarted, the server eventually seems to recover on its own, usually after several hours.
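To pin down exactly when the process count starts climbing (and therefore which window of the access log is worth examining), something like the following minimal sketch could be left running on each web server. It assumes a Linux `/proc` filesystem and that the Apache binary shows up as `httpd`, as it does on RHEL; the function names `process_snapshot` and `watch_httpd` are hypothetical, and the script is written for Python 3.

```python
import os
import time


def process_snapshot(proc_root="/proc", name="httpd"):
    """Return (process_count, total_rss_kb) for processes named `name`.

    Reads /proc/<pid>/status, which has a "Name:" line and, for user-space
    processes, a "VmRSS:" line giving resident memory in kB.
    """
    count = 0
    total_rss_kb = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # process exited (or is unreadable) between listdir() and open()
        if fields.get("Name", "").strip() != name:
            continue
        count += 1
        rss = fields.get("VmRSS", "").split()  # e.g. ["2048", "kB"]
        if rss:
            total_rss_kb += int(rss[0])
    return count, total_rss_kb


def watch_httpd(interval=60, name="httpd"):
    """Print a timestamped process count and summed RSS every `interval` seconds.

    Summed RSS counts shared pages once per child, so it overstates real
    memory use; treat it as a trend indicator, not an exact figure.
    """
    while True:
        count, rss_kb = process_snapshot(name=name)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print("%s %s=%d rss=%dMB" % (stamp, name, count, rss_kb // 1024), flush=True)
        time.sleep(interval)
```

Calling `watch_httpd()` with its output redirected to a file gives a timeline; the first minute where the count jumps is the place to start reading the access log.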
The behavior is usually confined to one of the two web servers at a time, but it has appeared on both: server 1 will be affected for a couple of weeks, then server 2.
Taken together, this suggests (to me, at least) a problem in some area of the site that a few users (probably admins) visit repeatedly. The fact that the problem stays on one server suggests that those one or few users are pinned to one server or the other by the load balancer based on their IP address; I'm not sure how else to explain why it sticks to one machine.
That leaves me with two questions:
1) Can you recommend an approach for identifying which pages or activity are causing the extra processes to spawn? I have the PIDs of all the running httpd processes, but I don't know how to tie them back to something more meaningful (ideally the access log).
2) Do you know what might cause these symptoms?
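On question 1, one possible approach is Apache's mod_log_config, which can record `%P` (the PID of the child that served the request) and `%D` (the time taken to serve it, in microseconds). The sketch below assumes a hypothetical format added to httpd.conf as `LogFormat "%h %l %u %t \"%r\" %>s %b %P %D" pidlog` and used by a `CustomLog`; the function `requests_by_pid` is likewise hypothetical. It filters that log down to the PIDs of interest, for example those listed by `ps -C httpd -o pid=`.

```python
import re
from collections import defaultdict

# Matches lines written by the hypothetical format
#   LogFormat "%h %l %u %t \"%r\" %>s %b %P %D" pidlog
# Requests containing escaped quotes (\") will not match and are skipped.
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'(?P<pid>\d+) (?P<usec>\d+)$'
)


def requests_by_pid(lines, pids):
    """Group completed requests by the PID of the child that served them.

    Only PIDs in `pids` are kept. Returns
    {pid: [(time, host, request, seconds), ...]} in log order.
    """
    wanted = set(int(p) for p in pids)
    result = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # line in a different format; ignore it
        pid = int(m.group("pid"))
        if pid in wanted:
            seconds = int(m.group("usec")) / 1_000_000  # %D is in microseconds
            result[pid].append(
                (m.group("time"), m.group("host"), m.group("request"), seconds)
            )
    return dict(result)
```

Note that Apache writes a log entry only when a request finishes, so a child that is stuck mid-request will not have logged that request yet; its last logged request is a clue, not proof. With `ExtendedStatus On`, mod_status's server-status page shows the request each child is currently handling, which covers that gap.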
We are digging through the code to try to pinpoint the problem, but right now it's a needle-in-a-haystack search, and any help would be greatly appreciated.