TCP fast retransmission failure problem...?
While working on an embedded system (a security camera) that's locked to an old version of Linux (2.6.29, *sigh*), I've hit a tricky networking problem that I would really like some help with. Here's what happens:
When the camera is sending a JPEG over HTTP and one of its TCP packets gets lost, the TCP code tries to retransmit it using its fast retransmission mechanism. This works OK about 80%-90% of the time, but about 10%-20% of the time the packet cannot be resent, and so the whole frame gets lost.
Thanks to a few helpful printk()s, I now know that the place this is failing in is right at the start of tcp_retransmit_skb() in net/ipv4/tcp_output.c, at the point where it says:
    /* Do not sent more than we queued. 1/4 is reserved for possible
     * copying overhead: fragmentation, tunneling, mangling etc.
     */
    if (atomic_read(&sk->sk_wmem_alloc) >
        min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
        return -EAGAIN;
Hence once the socket gets itself into a state where this if-condition triggers, tcp_retransmit_skb() always returns -EAGAIN, and so the packet never gets retransmitted, even though it is still in memory and otherwise able to be sent (i.e. the external network isn't congested, there's no low memory panic going on, etc).
This test is still in the current (3.5) Linux kernel exactly as listed above, so it's still basically live code (as far as I can tell).
All the documentation says is that:
sk_wmem_alloc == "transmit queue, bytes committed"
sk_wmem_queued == "persistent queue size"
sk_sndbuf == "size of send buffer in bytes"
So, it would seem that the routine is bailing out early with -EAGAIN if the number of bytes [able to be?] committed to the transmit queue is greater than the smaller of (the persistent queue size * 1.25, i.e. the approximate size of the current persistent queue when packetized) and (the current send buffer size).
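To make my reading of the condition concrete, here is a minimal userspace sketch of it. This is not kernel code: `struct sock_model`, `min_int()`, `retransmit_blocked()` and `demo()` are names I made up for illustration, the three fields simply stand in for sk_wmem_alloc, sk_wmem_queued and sk_sndbuf, and the byte counts are invented rather than measured on the camera.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the three struct sock fields used by the check. */
struct sock_model {
    int wmem_alloc;   /* models sk_wmem_alloc: "transmit queue, bytes committed" */
    int wmem_queued;  /* models sk_wmem_queued: "persistent queue size" */
    int sndbuf;       /* models sk_sndbuf: "size of send buffer in bytes" */
};

int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Returns 1 where the modeled check would return -EAGAIN, otherwise 0. */
int retransmit_blocked(const struct sock_model *sk)
{
    /* queued + queued/4, capped at the send buffer size */
    int limit = min_int(sk->wmem_queued + (sk->wmem_queued >> 2), sk->sndbuf);

    return sk->wmem_alloc > limit;
}

/* Illustrative values only: 8000 bytes queued gives a limit of 10000. */
void demo(void)
{
    struct sock_model under  = {  9000, 8000, 65536 }; /* 9000 <= 10000 */
    struct sock_model over   = { 12000, 8000, 65536 }; /* 12000 > 10000 */
    struct sock_model capped = {  5000, 8000,  4096 }; /* limit is sndbuf: 5000 > 4096 */

    assert(retransmit_blocked(&under) == 0);
    assert(retransmit_blocked(&over) == 1);
    assert(retransmit_blocked(&capped) == 1);

    printf("under=%d over=%d capped=%d\n",
           retransmit_blocked(&under),
           retransmit_blocked(&over),
           retransmit_blocked(&capped));
}
```

Calling demo() from a main() exercises the three cases. The second and third are the ones that correspond to what I'm seeing: the committed byte count is above the limit, so the retransmit is refused however healthy the network is.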
Unfortunately, this logic makes absolutely no sense to me, even though I've been raking over it for several days. :-(
Can anybody please shed any light on this particular line? Or if not, can you please point me towards someone who can? All comments and suggestions very welcome!
Thanks, ....Nick Pelling....