Troubleshooting server load/Learn more about a running process
I have two general questions about server load. I'm still relatively new and self-taught in this area. The other day one of our servers (running Solaris) ran into some problems and I had a heck of a time troubleshooting what was going on. I eventually discovered the cause of the problem, but I got lucky rather than taking logical steps to track it down and solve it.
I ran top and saw that the load average was around 3. I believe that tells me how many processes are waiting in the run queue. I've run top several times since then and the number stays about the same, so I believe this is normal for this particular server.
I also saw a Perl process in the list of processes shown by top. I took a wild stab and killed it. While the load average did not change, users reported that lag decreased and other services were running efficiently again. This is where my two questions come in:
First, I'm not sure I fully understand what the CPU % value next to a process means. I saw that the Perl process was showing 98-100% and had a time of 20:00. Does the CPU % value reflect how much of the processor is being used by this process? I can't imagine it does, because if you take the other processes into account, the totals would add up to over 100%.
Second, while top showed me there was a Perl process running, I had trouble finding more detail about what that process actually was. What script was running? Was it a simple command run at the command line? Was it one of our scripts, or had the system been hacked? Had one of our users tested a Perl script that got caught in a loop? I suppose my question here is: how does one go about tracking down a process like this, to see not only that it's a Perl process but also the actual command or script that is running? One of our users later told me he was running a "search and replace" Perl script that was changing millions of lines in a file... so I found out what was going on, but I don't know if I would have if he hadn't come forward. :)
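To make the second question concrete, here is a rough sketch of the kind of lookup I have in mind, assuming the PID is already known from top and that the system's ps accepts the POSIX `-o` and `-p` options. The function names `build_ps_command` and `describe_process` are made up for illustration, and ps may truncate the `args` column, so this is a starting point rather than a complete answer. (On Solaris, I understand `pargs` can also print a process's arguments.)

```python
import subprocess


def build_ps_command(pid):
    """Return a ps invocation that reports on a single process.

    Hypothetical helper: pid, user, pcpu, time and args are POSIX
    output field names; args is the command line with its arguments.
    """
    return ["ps", "-o", "pid,user,pcpu,time,args", "-p", str(int(pid))]


def describe_process(pid):
    """Run ps for the given PID and return its raw text output."""
    result = subprocess.run(
        build_ps_command(pid),
        capture_output=True,
        text=True,
        check=False,  # ps exits non-zero if the PID no longer exists
    )
    return result.stdout
```

Calling `print(describe_process(1234))` with the PID from top (1234 is just a placeholder) should show who owns the process and the command line it was started with, which is the detail I could not get out of top.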