mod_jk and failover issues
I successfully set up a Glassfish 3.1 b43 cluster on RHEL5 with two nodes, each node running one instance. I tested mod_jk versions 1.2.26, 1.2.28 and 1.2.31, both prebuilt and manually compiled. We plan to deploy an application that will receive small hourly updates from 300,000 clients. We deployed a sample HTTP application that shows the instance name to make sure the load balancer works. I followed this post, but made changes to worker.properties because failover was not working:
http://tiainen.sertik.net/2011/03/load-balancing-with-glassfish-31-and.html (tiainen: Load balancing with Glassfish 3.1 and Apache)
LoadModule jk_module modules/mod_jk.so
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "
JkOptions +ForwardKeySize +ForwardURICompat -ForwardDirectories
JkRequestLogFormat "%w %V %T"
# redirect traffic to loadbalancer
JkMount /* loadbalancer
# default properties for workers
# properties for worker1
# properties for worker2
# properties for loadbalancer
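The property values under the four comment headings above did not survive in this post. For readers following along, here is a rough sketch of what such a file might contain, not the poster's actual configuration and not a confirmed fix. Only worker2's host and port come from the error log in this post; the worker1 host is a placeholder, and the template name and the timeout and probe values are assumptions chosen to show the mod_jk settings that influence how a dead or rebooted backend is detected.

```properties
# Sketch only. Every value is an assumption except worker2's host and
# port, which appear in the mod_jk error log quoted in this post.
worker.list=loadbalancer

# default properties for workers
worker.template.type=ajp13
worker.template.lbfactor=1
# Enable TCP keepalive on the sockets to the backend.
worker.template.socket_keepalive=true
# Timeout in milliseconds for opening the TCP connection to the backend
# (needs mod_jk 1.2.27 or later).
worker.template.socket_connect_timeout=5000
# CPing/CPong probes on connect, before each request and at intervals
# (needs mod_jk 1.2.27 or later, so not available in 1.2.26).
worker.template.ping_mode=A
worker.template.ping_timeout=10000

# properties for worker1 (host is a placeholder)
worker.worker1.reference=worker.template
worker.worker1.host=node1.example.invalid
worker.worker1.port=28009

# properties for worker2
worker.worker2.reference=worker.template
worker.worker2.host=192.168.3.204
worker.worker2.port=28009

# properties for loadbalancer
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=worker1,worker2
worker.loadbalancer.sticky_session=true
```

The probe and timeout settings are shown because an OS reboot can leave half-open connections in the pool that a clean instance stop closes properly; whether that is the cause here is not established.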
1. The DAS and the Glassfish instances work as expected
2. Load balancing works just fine
3. Failover works ONLY if I stop or restart the instance from the DAS
If I restart the OS of a node instead, failover breaks. The instance is detected as down, and as long as it stays down things seem fine. Once the failed instance boots back up, one of the two instances (seemingly at random) stops working. Sometimes I have to restart the cluster and httpd to get things going again. mod_jk seems to treat the two kinds of failure differently, and it wrongly detects one or both instances as down. The mod_jk log shows:
[Tue Apr 26 09:07:18 2011] [29639:3085998688] [error] ajp_connection_tcp_get_message::jk_ajp_common.c (1011): (worker2) can't receive the response message from tomcat, network problems or tomcat (192.168.3.204:28009) is down (errno=104)
[Tue Apr 26 09:07:18 2011] [29639:3085998688] [error] ajp_get_reply::jk_ajp_common.c (1766): (worker2) Tomcat is down or refused connection. No response has been sent to the client (yet)
[Tue Apr 26 09:07:18 2011] [29639:3085998688] [info] ajp_service::jk_ajp_common.c (2186) (worker2) sending request to tomcat failed (recoverable), (attempt=1)
Can you please suggest where to look for the problem?