  1. #1
    Just Joined!
    Join Date
    Jul 2009
    Posts
    2

    System loses all connections after 49.7 days


    Hi, All,
    I posted this message in a different forum with no response...
    Maybe you can help me...

    I need your help figuring out what the problem may be.
    We have a customer application (CA) that the customer refuses to modify.
    This application communicates with system A (SA), which communicates with system B (SB).
    CA asks SA to open 14 sockets, and SA in turn opens 14 sockets on SB.
    Everything works fine, except that if a socket sits idle for 24.8 days, it dies. This is normal and is described in section 4.2.3 of RFC 1323.
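The 24.8-day figure is the half-range of a 32-bit timestamp. RFC 1323 timestamps are compared in modular 32-bit arithmetic, so two values can only be ordered correctly while they are less than 2^31 ticks apart. Assuming the stamp advances once per millisecond (HZ=1000, which is consistent with the 24.8 days observed here), the arithmetic works out as below; the same formula gives about 248 days at 100 ticks per second and about 49.7 days for a full 2^32 wrap at 1000 ticks per second. The function names are hypothetical helpers for this sketch only.

```c
/* Seconds in one day. */
#define SECONDS_PER_DAY 86400.0

/* Days needed for a counter ticking at `hz` ticks per second
 * to advance by `ticks` ticks. */
double ticks_to_days(double ticks, double hz)
{
    return ticks / hz / SECONDS_PER_DAY;
}

/* 2^31 ticks: the largest gap that 32-bit modular timestamp
 * comparison can still order correctly. */
double half_range_days(double hz)
{
    return ticks_to_days(2147483648.0, hz);
}

/* 2^32 ticks: how long until a 32-bit counter wraps back to zero. */
double full_wrap_days(double hz)
{
    return ticks_to_days(4294967296.0, hz);
}
```

For example, half_range_days(1000.0) is about 24.86, half_range_days(100.0) is about 248.55, and full_wrap_days(1000.0) is about 49.71.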
    We were asked to change the "idle timeout" to 248 days, so I implemented the following change:
    File: /BlueCat-5.1/usr/src/linux/include/net/tcp.h
    Line#: 1137
    From: #define tcp_time_stamp ((__u32)(jiffies))
    To: #define tcp_time_stamp ((__u32)(jiffies/10))
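To see what the patch does at the 32-bit wrap, here is a minimal model of both macro versions. It assumes jiffies is a 32-bit unsigned counter (as on a 32-bit kernel) and passes it in as a parameter instead of reading the kernel global; ts_original, ts_modified and stamp_step are hypothetical names. Dividing by 10 does slow the stamp to 100 ticks per second, which pushes the half-range out to about 248 days, but the result can never exceed 429496729, so it no longer spans the full 32-bit space that modular comparison assumes.

```c
#include <stdint.h>

/* Model of the original macro: the stamp is jiffies itself and
 * spans the full 0 .. 2^32 - 1 range. */
uint32_t ts_original(uint32_t jiffies)
{
    return jiffies;
}

/* Model of the patched macro: jiffies / 10 tops out at 429496729,
 * far below 2^32 - 1. */
uint32_t ts_modified(uint32_t jiffies)
{
    return jiffies / 10;
}

/* Signed step between two consecutive stamps, using the
 * "subtract as unsigned, read as signed" idiom that TCP uses
 * to compare timestamps across a wrap. */
int32_t stamp_step(uint32_t before, uint32_t after)
{
    return (int32_t)(after - before);
}
```

With the original macro, going from jiffies 4294967295 to 0 is a step of +1. With the patched macro the same transition takes the stamp from 429496729 to 0, a step of -429496729, so the clock appears to jump backwards.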

    We did not test the setup for the full 248 days, but idle sockets no longer died after 24.8 days (or even after 30+ days).
    We thought we were good, but we recently received reports from customers that after 49.7 days the sockets become unresponsive and the system requires a reboot. After a reboot everything works fine... for another 49.7 days, I guess.
    I should mention that the sockets were not idle; some of them were under heavy load.
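One plausible explanation, assuming HZ=1000 and a 32-bit jiffies counter: 2^32 ms is about 49.7 days, which is exactly when jiffies wraps and the patched stamp drops from 429496729 back to 0. The peer then sees timestamps that look older than the last one it recorded, and the RFC 1323 PAWS check discards such segments as old duplicates. That would also fit the observation that load does not help: a busy connection keeps the peer's ts_recent very fresh, so every segment after the wrap fails the check. The sketch below is a simplified model of that check, not the kernel's code; real stacks add exceptions (for example, for connections idle longer than 24 days) that are left out here, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified RFC 1323 PAWS test: a segment whose timestamp is
 * "older" than the last recorded one (ts_recent), in 32-bit
 * modular order, is treated as an old duplicate and dropped. */
bool paws_rejects(uint32_t tsval, uint32_t ts_recent)
{
    return (int32_t)(tsval - ts_recent) < 0;
}

/* Model of the patched stamp macro: jiffies / 10. */
uint32_t ts_modified(uint32_t jiffies)
{
    return jiffies / 10;
}

/* Busy connection: the peer last recorded the stamp sent at
 * jiffies_before; the next segment is sent at jiffies_after
 * (32-bit unsigned values, so crossing the wrap is natural).
 * Returns true if the peer would accept the new segment. */
bool busy_socket_survives_wrap(uint32_t jiffies_before, uint32_t jiffies_after)
{
    uint32_t ts_recent = ts_modified(jiffies_before);
    uint32_t tsval = ts_modified(jiffies_after);
    return !paws_rejects(tsval, ts_recent);
}
```

For a peer that last recorded the stamp sent at jiffies 4294966295 (1000 ms before the wrap), the next segment sent at jiffies 1000 carries tsval 100 against ts_recent 429496629 and is rejected. With the unpatched macro the same two jiffies values are 2001 ticks apart in modular order, and the segment is accepted.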
    Does anyone have an idea how the change described above could have caused this problem?
    I'm really desperate to understand what is going on.
    Your help is really appreciated.
    Best regards,
    Vlad

  2. #2
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,512
    A good example of the law of unintended consequences. You changed a header value and recompiled the affected kernel components, but you didn't check the kernel TCP/IP code itself to see what else needed to be modified. Personally, I wouldn't take your approach, since there are likely better (simpler and less intrusive) ways to accomplish your goals, such as an idle-timeout handler on your systems that pings the remote system over the socket after some period of inactivity, thus keeping the socket alive. ONLY munge the kernel code if a) you have no other recourse, and b) you REALLY know what you are doing and what the side effects will be.
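A close, kernel-supported cousin of the idle-timeout ping suggested above is TCP keepalive (SO_KEEPALIVE): the stack itself sends a probe after a configurable idle period, so neither CA nor the application protocol has to change, only the socket setup on SA and SB. This is a swapped-in technique, not necessarily what the reply had in mind. The sketch assumes a Linux kernel that supports the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options, and that the original tcp_time_stamp definition has been restored; whether periodic probes keep the peer's PAWS state fresh on this particular BlueCat kernel is worth verifying before relying on it. enable_keepalive is a hypothetical helper name.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalive on a socket.
 *   idle_s:  seconds of inactivity before the first probe (TCP_KEEPIDLE)
 *   intvl_s: seconds between unanswered probes (TCP_KEEPINTVL)
 *   count:   unanswered probes before the connection is reset (TCP_KEEPCNT)
 * The three TCP_KEEP* options are Linux-specific.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle_s, int intvl_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) != 0)
        return -1;
    return 0;
}
```

For example, enable_keepalive(fd, 3600, 60, 5) would start probing after an hour of silence, probe once a minute, and reset the connection after five unanswered probes. These values are illustrative only.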
