- 07-20-2009 #1 Just Joined!
- Join Date
- Jul 2009
- Posts
- 4
TCP connections stuck in FIN_WAIT2 state
Hi,
I'm developing a simple TCP server. 99% of incoming connections terminate correctly and the sockets disappear from netstat output. However, a few connections hang around indefinitely in the FIN_WAIT2 state. Now, I know that the clients in this case are misbehaving by not sending a FIN,ACK to close the connection. However, regardless of client behaviour, the connections should only remain in this state for a maximum of 60 seconds (set globally by /proc/sys/net/ipv4/tcp_fin_timeout).
Code:
Proto Recv-Q Send-Q Local Address      Foreign Address    State      PID/Program name   Timer
tcp        0      0 10.0.0.12:2000     10.0.0.6:50990     FIN_WAIT2  9507/perl          off (0.00/0/0)
tcp        0      0 10.0.0.12:2000     10.0.0.6:57896     FIN_WAIT2  7247/perl          off (0.00/0/0)
tcp        0      0 10.0.0.12:2000     10.0.0.6:60683     FIN_WAIT2  6835/perl          off (0.00/0/0)
You will notice that the Timer column of the netstat output shows that these connections are not being timed. To me that suggests that they will hang around forever, contrary to what tcp(7) says (repeated below):
Code:
tcp_fin_timeout (integer; default: 60)
       This specifies how many seconds to wait for a final FIN packet before the socket is forcibly closed. This is strictly a violation of the TCP specification, but required to prevent denial-of-service attacks. In Linux 2.2, the default value was 180.
I have watched the packets flowing back and forth, and the server is correctly performing a half-close of the connection once it has finished sending data. The client is misbehaving by not sending its FIN,ACK, but so is the server, by not closing the connection anyway. The end result is that I have hundreds of connections and processes hanging around forever.
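To check for this condition across many sockets, one option is to filter netstat output for FIN_WAIT2 rows whose Timer column is "off". The sketch below is a hypothetical Python helper (find_untimed_fin_wait2 is not part of any tool) that parses rows in the format posted above. It assumes whitespace-separated columns as printed by `netstat -tanpo` (`-p` adds the PID/Program name column, `-o` the Timer column) and program names without spaces. It reports the owning PID, since that is the process still holding the stuck socket; rows where netstat shows no PID are skipped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StuckSocket:
    local: str
    foreign: str
    pid: int
    program: str


def find_untimed_fin_wait2(netstat_lines: List[str]) -> List[StuckSocket]:
    """Return FIN_WAIT2 TCP sockets whose netstat Timer column reads 'off'.

    Assumes whitespace-separated rows as printed by `netstat -tanpo`:
    proto recv-q send-q local foreign state pid/program timer (detail)
    """
    stuck = []
    for line in netstat_lines:
        fields = line.split()
        # Skip the header and anything that is not a full TCP row.
        if len(fields) < 8 or not fields[0].startswith("tcp"):
            continue
        state, pid_prog, timer = fields[5], fields[6], fields[7]
        if state != "FIN_WAIT2" or timer != "off":
            continue
        pid_str, _, program = pid_prog.partition("/")
        # netstat prints "-" when it cannot see the owning process.
        if not pid_str.isdigit():
            continue
        stuck.append(StuckSocket(fields[3], fields[4], int(pid_str), program))
    return stuck


SAMPLE = """\
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name Timer
tcp 0 0 10.0.0.12:2000 10.0.0.6:50990 FIN_WAIT2 9507/perl off (0.00/0/0)
tcp 0 0 10.0.0.12:2000 10.0.0.6:57896 FIN_WAIT2 7247/perl off (0.00/0/0)
tcp 0 0 10.0.0.12:2000 10.0.0.6:60683 FIN_WAIT2 6835/perl off (0.00/0/0)
"""

for s in find_untimed_fin_wait2(SAMPLE.splitlines()):
    print(s.pid, s.program, s.foreign)
# prints:
# 9507 perl 10.0.0.6:50990
# 7247 perl 10.0.0.6:57896
# 6835 perl 10.0.0.6:60683
```

The PIDs it reports can then be inspected directly (for example with strace or by reading the process's code path) to see what each process is waiting on.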
I'm running RHEL 5.3 with 2.6.18-128.1.6.el5.PAE (i386) as my kernel.
Is there anything I can do to find out why these connections are not being forcibly closed by the kernel? Why are these connections NOT being timed?
Cheers,
Georgio
- 07-21-2009 #2 Linux Guru
- Join Date
- Apr 2009
- Location
- I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
- Posts
- 8,499
If you are running RHEL 5, then I assume you have a support subscription with Red Hat? If so, shouldn't you be addressing this question to them? Personally, I haven't seen this problem on my CentOS 5.3 system (a RHEL 5.3 clone), but I'm afraid that means very little, since I don't have access to all of your system configuration information.
Sometimes, real fast is almost as good as real time.
Just remember, Semper Gumbi - always be flexible!
- 07-24-2009 #3 Just Joined!
Hi Rubberman,
Oops, I don't know why I wrote RHEL 5.3; I'm actually running CentOS 5.3. I'm about to upgrade to the latest kernel (version 2.6.18-128.2.1). I'll keep you posted on how things go with it. I have a bad feeling that the problem won't be fixed.
Cheers,
Georgio
- 07-24-2009 #4 Linux Guru
I'm running the latest kernel now. I haven't experienced any problems like this, but that doesn't necessarily mean a lot. I suspect the problem lies in your TCP/IP configuration or, more likely, in the NIC and/or its driver. I've had similar issues in the past with HP-UX running on PA-RISC systems, where a fault in the onboard NIC would cause this type of problem. After a lot of investigation, HP engineering confirmed that the problem was a bug in the NIC firmware and how it interacted with the TCP/IP stack. The only solution was to use a separate Ethernet board instead of the one built into the standard I/O board. So, if you can try another NIC from a different manufacturer, that would confirm whether or not the problem is related to your specific network adapter and/or its drivers.
FWIW, I am running an Intel S5000XVN workstation/server motherboard with dual onboard gigabit NICs (Intel chipset) and have had absolutely zero problems with them.
- 07-27-2009 #5 Just Joined!
Hi,
I think the problem was due to my own stupidity, though I'm not sure. I have changed my server code so that, once I perform a half-close, I put a timeout on my call to select(). I think the reason the kernel wasn't reaping the connections is that half-closed sockets are still active. With the timeout, my code finishes gracefully and the kernel then reaps the connections after a couple of minutes. Thanks for your helpful suggestions! If you think what I'm doing sounds incorrect, please let me know.
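This explanation fits how Linux documents the timeout. The kernel's ip-sysctl documentation describes tcp_fin_timeout as the time an orphaned connection (one no longer referenced by any application) remains in FIN_WAIT2, and notes that FIN_WAIT2 is a perfectly valid "receive only" state for a connection that is not orphaned. A socket that has only been shut down for writing, while its perl process still holds the descriptor and waits in select(), is not orphaned, which is consistent with the "off" timers in the netstat output. The following is a minimal Python sketch of the fix described here, not the poster's code (none was posted, and the server is written in Perl): half-close, wait in select() with a timeout, then close regardless. The function name half_close_and_wait and the 5-second default are illustrative assumptions.

```python
import select
import socket
import time


def half_close_and_wait(conn: socket.socket, timeout: float = 5.0) -> bool:
    """Half-close `conn`, wait up to `timeout` seconds for the peer's FIN, then close.

    Returns True if the peer finished cleanly (recv() returned b""), False if
    we gave up or the connection failed. The descriptor is always closed on
    return, so any remaining TCP state belongs to the kernel (the socket is
    orphaned) rather than to this process.
    """
    try:
        conn.shutdown(socket.SHUT_WR)  # send our FIN; reading still works
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False  # the peer never sent its FIN
            readable, _, _ = select.select([conn], [], [], remaining)
            if not readable:
                return False  # select() timed out
            if not conn.recv(4096):
                return True  # b"" means the peer's FIN arrived
            # Late data from the peer is discarded; keep waiting for EOF.
    except OSError:
        return False  # e.g. connection reset by the peer
    finally:
        conn.close()
```

In a process-per-connection server like the one in this thread (the netstat output shows a separate perl process per socket), each child would call something like this after writing its response and then exit, so nothing is left holding the descriptor. The grace period is a judgment call that depends on how the clients behave; 5 seconds is arbitrary here.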
Thanks!
Georgio
- 08-03-2009 #6 Linux Guru
I've been on vacation this past week. I'll think about whether what you are doing is an appropriate way to handle close events (a code sample would be useful). In any case, I'm glad you found a workable solution, though to me, leaving half-closed connections around for a couple of minutes on close is still way too long except in unusual circumstances. At least that's my opinion.