Hello

I have a client/server app (Java) that I'm migrating from Solaris to RH Linux. since I started running it in RH, I noticed some issues related to latency. I managed to isolate the problem that looks like this:

* client sends 5 messages (32 bytes each) in a row (same application timestamp) to the server.
* server echos messages.
* client receives replies and prints round trip time for each msg.

in Solaris, all is well: I get ALL 5 replies at the same time, roughly 80ms after having sent original messages (client & server are several thousands miles away from each other: my ping RTT is 80ms, all normal).

in RH, first 3 messages are echoed normally (they arrive 80ms after they've been sent), however the following 2 arrive 80ms later (so total 160ms RTT).

the pattern is always the same. clearly looked like a TCP problem.

on my solaris box, I had previously configured the tcp stack with 2 specific options:

1. disable nagle algorithm globally
2. set tcp_deferred_acks_max to 0

on RH, it's not possible to disable nagle globally, but I disabled it on all of my apps' sockets (TCP_NODELAY).

so I started playing with tcpdump (on the server machine), and compared both outputs:

SOLARIS:

22 2.085645 client server TCP 56150 > 6006 [PSH, ACK] Seq=111 Ack=106 Win=66672 Len=22 "MSG_1 RCV"
23 2.085680 server client TCP 6006 > 56150 [ACK] Seq=106 Ack=133 Win=50400 Len=0
24 2.085908 client server TCP 56150 > 6006 [PSH, ACK] Seq=133 Ack=106 Win=66672 Len=22 "MSG_2 RCV"
25 2.085925 server client TCP 6006 > 56150 [ACK] Seq=106 Ack=155 Win=50400 Len=0
26 2.086175 client server TCP 56150 > 6006 [PSH, ACK] Seq=155 Ack=106 Win=66672 Len=22 "MSG_3 RCV"
27 2.086192 server client TCP 6006 > 56150 [ACK] Seq=106 Ack=177 Win=50400 Len=0
28 2.086243 server client TCP 6006 > 56150 [PSH, ACK] Seq=106 Ack=177 Win=50400 Len=21 "MSG_1 ECHO"
29 2.086440 client server TCP 56150 > 6006 [PSH, ACK] Seq=177 Ack=106 Win=66672 Len=22 "MSG_4 RCV"
30 2.086454 server client TCP 6006 > 56150 [ACK] Seq=127 Ack=199 Win=50400 Len=0
31 2.086659 server client TCP 6006 > 56150 [PSH, ACK] Seq=127 Ack=199 Win=50400 Len=21 "MSG_2 ECHO"
32 2.086708 client server TCP 56150 > 6006 [PSH, ACK] Seq=199 Ack=106 Win=66672 Len=22 "MSG_5 RCV"
33 2.086721 server client TCP 6006 > 56150 [ACK] Seq=148 Ack=221 Win=50400 Len=0
34 2.086947 server client TCP 6006 > 56150 [PSH, ACK] Seq=148 Ack=221 Win=50400 Len=21 "MSG_3 ECHO"
35 2.087196 server client TCP 6006 > 56150 [PSH, ACK] Seq=169 Ack=221 Win=50400 Len=21 "MSG_4 ECHO"
36 2.087500 server client TCP 6006 > 56150 [PSH, ACK] Seq=190 Ack=221 Win=50400 Len=21 "MSG_5 ECHO"
37 2.165390 client server TCP 56150 > 6006 [ACK] Seq=221 Ack=148 Win=66632 Len=0
38 2.166314 client server TCP 56150 > 6006 [ACK] Seq=221 Ack=190 Win=66588 Len=0
39 2.364135 client server TCP 56150 > 6006 [ACK] Seq=221 Ack=211 Win=66568 Len=0

REDHAT:

17 2.081163 client server TCP 55879 > 6006 [PSH, ACK] Seq=111 Ack=106 Win=66672 Len=22 "MSG_1 RCV"
18 2.081178 server client TCP 6006 > 55879 [ACK] Seq=106 Ack=133 Win=5888 Len=0
19 2.081297 server client TCP 6006 > 55879 [PSH, ACK] Seq=106 Ack=133 Win=5888 Len=21 "MSG_1 ECHO"
20 2.081711 client server TCP 55879 > 6006 [PSH, ACK] Seq=133 Ack=106 Win=66672 Len=22 "MSG_2 RCV"
21 2.081761 client server TCP 55879 > 6006 [PSH, ACK] Seq=155 Ack=106 Win=66672 Len=22 "MSG_3 RCV"
22 2.081846 server client TCP 6006 > 55879 [PSH, ACK] Seq=127 Ack=177 Win=5888 Len=21 "MSG_2 ECHO"
23 2.081995 server client TCP 6006 > 55879 [PSH, ACK] Seq=148 Ack=177 Win=5888 Len=21 "MSG_3 ECHO"
24 2.082011 client server TCP 55879 > 6006 [PSH, ACK] Seq=177 Ack=106 Win=66672 Len=22 "MSG_4 RCV"
25 2.082362 client server TCP 55879 > 6006 [PSH, ACK] Seq=199 Ack=106 Win=66672 Len=22 "MSG_5 RCV"
26 2.082377 server client TCP 6006 > 55879 [ACK] Seq=169 Ack=221 Win=5888 Len=0
27 2.171003 client server TCP 55879 > 6006 [ACK] Seq=221 Ack=148 Win=66632 Len=0
28 2.171019 server client TCP 6006 > 55879 [PSH, ACK] Seq=169 Ack=221 Win=5888 Len=42 "MSG_4 ECHO + MSG_5 ECHO"
29 2.257498 client server TCP 55879 > 6006 [ACK] Seq=221 Ack=211 Win=66568 Len=0

so, I got confirmation things are not working correctly for RH: packet 28 is sent TOO LATE, it looks like the server is waiting for packet 27's ACK before doing anything.

seems to me it's the most likely reason...

then I realized that the "Win" parameters are different on Solaris & RH dumps: 50400 on Solaris, only 5888 on RH. that's another hint...

I read the doc about the slide window & buffer window, and played around with the rcvBuffer & sendBuffer in java on my sockets, but never managed to change this 5888 value to anything else (I checked each time directly with tcpdump).

does anybody know how to do this ? I'm having a hard time getting definitive information, as in some cases there's "auto-negotiation" that I might need to bypass, etc...

I eventually managed to get only partially rid of my initial problem by setting the "tcp_slow_start_after_idle" parameter to 0 on RH, but it did not change the "win" parameter at all. the same problem was there for the first 4 groups of 5 messages, with TCP retransmission & TCP Dup ACK in tcpdump, then the problem disappeared altogether for all following groups of 5 messages.

It doesn't seem like a very clean and/or generic solution to me. I'd really like to reproduce the exact same conditions under both OSes.

I'll keep researching, but any help from TCP gurus would be greatly appreciated !

thanks !