Hi, All,
I need your help to figure out what may be the problem...
We have a customer application (CA), which the customer has refused to modify.
This application communicates with system A (SA), which in turn communicates with system B (SB).
CA asks SA to open 14 sockets, and SA subsequently opens 14 sockets on SB.
Everything works fine, except that a socket dies if it sits idle for 28.4 days. This is normal and is described in section 4.2.3 of RFC 1323.
We were asked to change the "idle timeout" to 284 days, so I implemented the following change:
File: /BlueCat-5.1/usr/src/linux/include/net/tcp.h
Line#: 1137
From: #define tcp_time_stamp ((__u32)(jiffies))
To: #define tcp_time_stamp ((__u32)(jiffies/10))

We did not test the setup for the full 284 days, but idle sockets did not die after 28.4 days (or even after 30+ days).
We thought we were good, but we recently received a report from customers that after 49.7 days the sockets become unresponsive and a system reboot is required. After the reboot everything works fine... I guess for another 49.7 days.
I'd like to mention that the sockets weren't idle; some of them were under heavy load.
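For reference, 49.7 days is the time a 32-bit counter takes to wrap at 1000 ticks per second. The sketch below works through that arithmetic and shows what the divided macro does at the wrap point. It assumes HZ is 1000 and that jiffies is 32 bits wide on this platform; neither has been confirmed for this kernel, and the helper names are made up for illustration. It is a possible lead to check, not a confirmed diagnosis.

```c
#include <stdint.h>

/* Assumption: 1000 jiffies per second (not confirmed for this kernel). */
#define ASSUMED_HZ 1000u

/* Stock macro behaviour: the 32-bit jiffies value itself. */
static uint32_t stock_stamp(uint32_t jiffies)
{
    return jiffies;
}

/* Modified macro behaviour: 32-bit jiffies divided by 10. */
static uint32_t scaled_stamp(uint32_t jiffies)
{
    return jiffies / 10u;
}

/* Days until a 32-bit tick counter wraps at the assumed tick rate. */
static double days_to_wrap(void)
{
    return 4294967296.0 / ASSUMED_HZ / 86400.0;
}

/* Signed distance between two stamps, as wrapping 32-bit
 * timestamps are commonly compared (modulo 2^32). */
static int32_t stamp_delta(uint32_t newer, uint32_t older)
{
    return (int32_t)(newer - older);
}
```

Under these assumptions, days_to_wrap() comes out at roughly 49.7. Across the wrap from 0xFFFFFFFF to 0, the stock stamp still advances by 1 in modulo-2^32 arithmetic, while the divided stamp drops from 429496729 to 0, so the same comparison sees time jump backwards.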
Does anyone have an idea how the change described above could have caused the problem?
I'm really desperate to understand what is going on.
Your help is really appreciated.
Best regards,
Vlad