  1. #1
    Just Joined!
    Join Date
    Sep 2012
    Posts
    2

    TCP fast retransmission failure problem...?


    Hi everyone,

    While working on an embedded system (a security camera) that's locked to an old version of Linux (2.6.29, *sigh*), I've hit a tricky networking problem that I would really like some help with. Here's what happens:

    When the camera is sending a JPEG over HTTP and one of its TCP packets gets lost, the TCP code tries to retransmit it using its fast retransmission mechanism. This works OK about 80%-90% of the time, but about 10%-20% of the time the packet never gets resent and so the whole frame is lost.

    Thanks to a few helpful printk()s, I now know that the failure happens right at the start of tcp_retransmit_skb() in net/ipv4/tcp_output.c, at the point where it says:

    Code:
    /* Do not sent more than we queued. 1/4 is reserved for possible
     * copying overhead: fragmentation, tunneling, mangling etc.
     */
    if (atomic_read(&sk->sk_wmem_alloc) >
        min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
        return -EAGAIN;
    Hence once the socket gets itself into a state where this if-condition triggers, tcp_retransmit_skb() always returns -EAGAIN, and so the packet never gets retransmitted, even though it is still in memory and otherwise able to be sent (i.e. the external network isn't congested, there's no low memory panic going on, etc).

    This test is still in the current (3.5) Linux kernel exactly as listed above, so it's still basically live code (as far as I can tell).

    All the documentation says is that:
    sk_wmem_alloc == "transmit queue, bytes committed"
    sk_wmem_queued == "persistent queue size"
    sk_sndbuf == "size of send buffer in bytes"

    So, it would seem that the routine is bailing out early with -EAGAIN if the number of bytes [able to be?] committed to the transmit queue is greater than either (the persistent queue size * 1.25, i.e. the approximate size of the current persistent queue when packetized) or (the current send buffer size).
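
    To make the arithmetic concrete, here is a minimal user-space sketch of the same comparison. The function name retransmit_check() and the plain int parameters are made up for illustration (the kernel reads sk_wmem_alloc atomically from the socket); only the expression itself comes from the snippet quoted above.

```c
#include <assert.h>
#include <errno.h>

/* Smaller of two ints; stands in for the kernel's min() macro. */
static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical user-space copy of the test at the top of
 * tcp_retransmit_skb(). Returns -EAGAIN when the retransmit would be
 * refused, 0 when it would be allowed to proceed.
 */
int retransmit_check(int wmem_alloc, int wmem_queued, int sndbuf)
{
    assert(wmem_alloc >= 0 && wmem_queued >= 0 && sndbuf >= 0);

    /* Queued bytes plus a quarter (x >> 2), capped at the send buffer. */
    int limit = min_int(wmem_queued + (wmem_queued >> 2), sndbuf);

    if (wmem_alloc > limit)
        return -EAGAIN;
    return 0;
}
```

    For example, with wmem_queued = 8000 and sndbuf = 16384 the limit is min(10000, 16384) = 10000, so a wmem_alloc of 9000 passes while 12000 returns -EAGAIN.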

    Unfortunately, this logic makes absolutely no sense to me, even though I've been raking over it for several days.

    Can anybody please shed any light on this particular line? Or if not, can you please point me towards someone who can? All comments and suggestions very welcome!

    Thanks, ....Nick Pelling....

  2. #2
    Just Joined!
    Join Date
    Sep 2012
    Posts
    2
    Problem now solved (sort of) - it turned out to be a memory leak in the out-of-tree Ethernet driver that was becoming visible in the TCP layer (via sk_wmem_alloc). Thanks to Eric D for his help in diagnosing this!
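
    A toy model of how a leak below the TCP layer could trip that check. It is only a sketch under assumptions that go beyond what is stated in this thread: that each transmitted buffer is charged to sk_wmem_alloc until the driver frees it, and that ACKs reduce sk_wmem_queued independently. The struct and the model_*() functions are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical user-space stand-in for the three socket counters. */
struct sock_counters {
    int wmem_alloc;   /* bytes in transmit buffers not yet freed */
    int wmem_queued;  /* bytes still on the socket's write queue */
    int sndbuf;       /* send buffer size in bytes */
};

/* Assumed: sending a segment charges both counters. */
void model_send(struct sock_counters *sk, int bytes)
{
    assert(bytes > 0);
    sk->wmem_queued += bytes;
    sk->wmem_alloc += bytes;
}

/*
 * Assumed: a healthy driver frees the buffer after transmitting it,
 * which uncharges wmem_alloc. driver_leaks != 0 models a driver that
 * never frees it, so the charge stays forever.
 */
void model_tx_done(struct sock_counters *sk, int bytes, int driver_leaks)
{
    if (!driver_leaks)
        sk->wmem_alloc -= bytes;
}

/* Assumed: an ACK releases the segment from the write queue. */
void model_ack(struct sock_counters *sk, int bytes)
{
    sk->wmem_queued -= bytes;
}

/* Same comparison as the test in tcp_retransmit_skb(). */
int model_retransmit_check(const struct sock_counters *sk)
{
    int limit = sk->wmem_queued + (sk->wmem_queued >> 2);

    if (sk->sndbuf < limit)
        limit = sk->sndbuf;
    return sk->wmem_alloc > limit ? -EAGAIN : 0;
}
```

    In this model, leaked bytes accumulate in wmem_alloc while wmem_queued keeps draining as ACKs arrive, so after enough leaked segments the comparison fails for every later retransmit on that socket.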

  3. #3
    Just Joined!
    Join Date
    Nov 2012
    Posts
    1
    Hi Nick,

    I am running into the same issue on a 2.6.31 kernel, also on an embedded platform. Can you share what the resolution was in your case?

    Thanks,
    jonathan1
