- 07-29-2009 #1
- Join Date: Jul 2009
System loses all connections after 49.7 days
I posted this message in a different forum but got no response.
Maybe you can help me figure out what the problem might be.
We have a customer application (CA) that the customer refuses to modify.
This application communicates with system A (SA), which in turn communicates with system B (SB).
CA asks SA to open 14 sockets, and SA then opens 14 corresponding sockets on SB.
Everything works fine, except that a socket that sits idle for 24.8 days dies. This is expected behavior, described in section 4.2.3 of RFC 1323.
We were asked to raise this "idle timeout" to 248 days, so I made the following change:
From: #define tcp_time_stamp ((__u32)(jiffies))
To: #define tcp_time_stamp ((__u32)(jiffies/10))
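For reference, the 24.8-day and 248-day figures follow from the width of the 32-bit TCP timestamp: comparisons only work while two timestamps are less than 2^31 ticks apart. The sketch below assumes HZ=1000 (one jiffy per millisecond), which is what the 24.8-day figure implies; the function names are made up for illustration and do not exist in the kernel.

```c
#include <stdint.h>

/* Milliseconds in one day. */
#define MS_PER_DAY 86400000.0

/* Days until a 32-bit timestamp ticking every tick_ms milliseconds has
 * advanced by half its range (2^31 ticks). Past this point a signed
 * 32-bit difference can no longer tell "newer" from "older", which is
 * the limit RFC 1323 describes for idle connections. */
double paws_window_days(double tick_ms)
{
    return (double)(UINT32_C(1) << 31) * tick_ms / MS_PER_DAY;
}

/* Days until the same 32-bit counter wraps all the way around (2^32 ticks). */
double full_wrap_days(double tick_ms)
{
    return 4294967296.0 * tick_ms / MS_PER_DAY;
}
```

With a 1 ms tick, `paws_window_days(1.0)` is about 24.86 days and `full_wrap_days(1.0)` about 49.71 days; with the divided clock (10 ms per tick), `paws_window_days(10.0)` is about 248.6 days.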
We have not tested the setup for the full 248 days, but idle sockets no longer die after 24.8 days (not even after 30+ days).
We thought we were good, but we recently received reports from customers that after 49.7 days the sockets become unresponsive and the system has to be rebooted. After a reboot everything works fine again, presumably for another 49.7 days.
I should mention that these sockets were not idle; some of them were under heavy load.
Does anyone have an idea how the change described above could have caused this problem?
I'm really desperate to understand what is going on.
Your help is much appreciated.
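One plausible mechanism, offered as a hedged sketch rather than a diagnosis: on a 32-bit kernel `jiffies` is a 32-bit `unsigned long`, and at HZ=1000 it wraps every 2^32 ms, about 49.7 days. The unmodified timestamp wraps smoothly modulo 2^32, but `jiffies/10` drops from 429496729 straight to 0 at that moment, so every timestamp sent afterwards looks older than the last one the peer saw. A peer enforcing PAWS would then discard the segments as old duplicates, which would hit busy sockets just as hard as idle ones. The code only simulates that comparison in user space; `ts_before()` mimics the signed-difference test the kernel uses in its `before()`/`after()` helpers, and all names here are hypothetical.

```c
#include <stdint.h>

/* Modular "a is older than b" test using a signed 32-bit difference,
 * the same idea as the kernel's before() helper. (The uint32 -> int32
 * conversion is two's complement on all mainstream compilers.) */
static int ts_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Original clock: the timestamp is the (assumed 32-bit) jiffies value. */
uint32_t ts_orig(uint32_t jiffies) { return jiffies; }

/* Modified clock from the post: jiffies divided by 10 before the cast. */
uint32_t ts_div10(uint32_t jiffies) { return jiffies / 10; }

/* Returns 1 if the timestamp taken on the first jiffy after the 32-bit
 * wrap looks OLDER than the one taken on the last jiffy before it,
 * i.e. a PAWS-style check on the peer would reject the new segment. */
int wrap_goes_backwards(uint32_t (*clock)(uint32_t))
{
    uint32_t before_wrap = clock(UINT32_MAX); /* last jiffy before the wrap */
    uint32_t after_wrap  = clock(0);          /* first jiffy after the wrap */
    return ts_before(after_wrap, before_wrap);
}
```

`wrap_goes_backwards(ts_orig)` returns 0 (the difference is +1 tick), while `wrap_goes_backwards(ts_div10)` returns 1 (the difference is about -429 million ticks).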
- 08-07-2009 #2
- Join Date: Apr 2009
- I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
A good example of the law of unintended consequences. You changed a header value and recompiled the affected kernel components, but you didn't check the kernel TCP/IP code itself to see what else needed to be modified. Personally, I wouldn't take your approach, since there are likely better (simpler and less intrusive) ways to accomplish your goals, such as an idle-timeout handler in your systems that pings the remote system on the socket after some period of inactivity, thus keeping the socket alive. ONLY munge the kernel code if a) you have no other recourse, and b) you REALLY know what you are doing and what the side effects will be.
Sometimes, real fast is almost as good as real time.
Just remember, Semper Gumbi - always be flexible!