Connections stuck in FIN_WAIT2 for days
Not sure if this post should be here or the kernel forum. Let me know.
I'm developing a simple TCP server. 99% of incoming connections terminate correctly and the sockets disappear from netstat output. However, a few connections hang around indefinitely in the FIN_WAIT2 state. I know that the clients in this case are misbehaving by not sending a FIN,ACK to close the connection. Regardless of client behaviour, though, the connections should only remain in this state for a maximum of 60 seconds (set globally by /proc/sys/net/ipv4/tcp_fin_timeout).
In the netstat output below, the Timer column shows that these connections are not being timed. To me that suggests that these connections will hang around forever, contrary to what tcp(7) says (quoted further down):
Proto Recv-Q Send-Q Local Address    Foreign Address   State      PID/Program name  Timer
tcp        0      0 10.0.0.12:2000   10.0.0.6:50990    FIN_WAIT2  9507/perl         off (0.00/0/0)
tcp        0      0 10.0.0.12:2000   10.0.0.6:57896    FIN_WAIT2  7247/perl         off (0.00/0/0)
tcp        0      0 10.0.0.12:2000   10.0.0.6:60683    FIN_WAIT2  6835/perl         off (0.00/0/0)
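In case it helps anyone cross-check this without netstat, here is a rough Python sketch that pulls the same information straight out of /proc/net/tcp. It assumes the usual IPv4 layout of that file (state in the fourth column as a hex code, with 05 meaning FIN_WAIT2, and the timer flag as the first half of the sixth column) and a little-endian machine such as my i386 box. The function names are my own, not part of any tool.

```python
import socket
import struct

FIN_WAIT2 = 0x05  # hex state code for FIN_WAIT2 in /proc/net/tcp


def decode_addr(hex_addr):
    """Turn '0C00000A:07D0' into ('10.0.0.12', 2000).

    Assumes a little-endian host, where the kernel prints the
    address as a byte-swapped 32-bit hex number.
    """
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def fin_wait2_sockets(lines):
    """Return (local, remote, timer_active) for each FIN_WAIT2 entry.

    timer_active == 0 corresponds to netstat reporting the timer as 'off'.
    """
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or fields[0] == "sl":
            continue  # skip the header line and anything malformed
        if int(fields[3], 16) != FIN_WAIT2:
            continue
        timer_active = int(fields[5].split(":")[0], 16)
        result.append(
            (decode_addr(fields[1]), decode_addr(fields[2]), timer_active)
        )
    return result


def read_proc(path="/proc/net/tcp"):
    """Read the live table; only meaningful on Linux."""
    with open(path) as f:
        return fin_wait2_sockets(f)
```

Calling read_proc() on the server and printing the result should list the same stuck sockets as above, with a timer flag of 0 wherever netstat says "off".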
I have watched the packets flowing back and forth, and the server is correctly performing a half close of the connection once it has finished sending data. The client is misbehaving by not sending its FIN,ACK, but so is the server by not closing the connection anyway. The end result is that I have hundreds of connections and processes hanging around forever.
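To make clear what I mean by a half close, here is a minimal sketch in Python (my real server is Perl, so this is an illustration, not my actual code). It also adds an application-level time limit on waiting for the client's FIN, which is the workaround I am considering. The function name and the wait_seconds value are hypothetical, and the limit is enforced by the process, not by the kernel.

```python
import socket
import time


def half_close_and_wait(conn, wait_seconds=5.0):
    """Send our FIN, then wait a bounded time for the peer's FIN.

    Returns True if the peer closed its side in time, False on
    timeout or socket error. The socket is always closed on return.
    """
    deadline = time.monotonic() + wait_seconds
    try:
        conn.shutdown(socket.SHUT_WR)  # half close: no more data from us
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False           # peer never finished closing
            conn.settimeout(remaining)
            data = conn.recv(4096)     # discard anything still arriving
            if not data:
                return True            # empty read: peer sent its FIN
    except OSError:
        # Covers socket.timeout as well as resets and similar errors.
        return False
    finally:
        conn.close()                   # release the descriptor either way
```

This does not explain the missing kernel timer. It only bounds how long each server process keeps the socket open after it has finished sending.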
From tcp(7):
tcp_fin_timeout (integer; default: 60)
This specifies how many seconds to wait for a final FIN packet before the socket is forcibly closed. This is strictly a violation of the TCP specification, but required to prevent denial-of-service attacks. In Linux 2.2, the default value was 180.
I'm running RHEL 5.3 with 2.6.18-128.1.6.el5.PAE (i386) as my kernel.
Is there anything I can do to find out why these connections are not being forcibly closed by the kernel? Why are these connections NOT being timed?