TCP: lost frames when using the eepro100 driver
Linux kernel 2.6.7; same behaviour on 2.4.21:
I have two programs on two separate Pentium 3 nodes, interconnected on a 100 Mbit LAN. Both nodes have the same setup. All drivers are compiled into the kernel, i.e. no kernel modules.
program1 does two write(2) calls to a connected TCP socket. The first chunk is 4 bytes, the second is about 100 bytes. From time to time (anywhere from 1 min to 11 hrs apart) the second chunk is lost in node2. I can see this using tcpdump(1) on both machines. Eventually (120 ms later) TCP retransmits and the chunk arrives at node2, but this is too late for the application, which in this case is an embedded system with soft realtime requirements. Both programs set the socket option TCP_NODELAY because of the required realtime properties.
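For reference, here is a minimal sketch of the sending pattern described above. It is written in Python and runs over loopback; it is not the actual program. The function names are hypothetical, and treating the 4-byte chunk as a length header for the second chunk is an assumption made only for illustration.

```python
import socket
import struct
import threading

HEADER_LEN = 4      # first chunk: 4 bytes
PAYLOAD_LEN = 100   # second chunk: about 100 bytes


def recv_exact(conn, n):
    """Read exactly n bytes from conn, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before all bytes arrived")
        buf.extend(chunk)
    return bytes(buf)


def send_two_chunks(sock, payload):
    """Send a 4-byte header and the payload as two separate writes."""
    # TCP_NODELAY disables Nagle's algorithm, so each write goes out at once.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.sendall(struct.pack("!I", len(payload)))  # chunk 1: 4 bytes
    sock.sendall(payload)                          # chunk 2: ~100 bytes


def demo():
    """Run sender and receiver over loopback; return the received payload."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}

    def receiver():
        conn, _ = server.accept()
        with conn:
            # Assumption: the 4-byte chunk carries the payload length.
            (length,) = struct.unpack("!I", recv_exact(conn, HEADER_LEN))
            result["payload"] = recv_exact(conn, length)

    t = threading.Thread(target=receiver)
    t.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        send_two_chunks(client, b"x" * PAYLOAD_LEN)
    t.join(timeout=5)
    server.close()
    return result["payload"]
```

With TCP_NODELAY set, the two writes normally leave as two separate segments, which matches what tcpdump shows: the small first segment arrives and the second one is the one that goes missing.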
A line analyzer showed me that both chunks really do hit the LAN. So they are not lost in node1; instead there seems to be a problem on the receiving side (node2).
Looking in /proc/net/dev (on both nodes), no errors and no collisions are reported.
These problems all occurred while using the eepro100 driver.
With the Intel driver (e100.c) the problem has not occurred so far (it has been running for 3 days now).
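A rough sketch of how the /proc/net/dev counters can be checked programmatically, including the drop and fifo columns as well as errors and collisions. The function names are hypothetical, and the 16-column layout in FIELDS is an assumption based on the header the kernel prints; verify it against the header lines on your own system.

```python
# Column order assumed from the /proc/net/dev header (8 receive, 8 transmit).
FIELDS = (
    "rx_bytes", "rx_packets", "rx_errs", "rx_drop", "rx_fifo", "rx_frame",
    "rx_compressed", "rx_multicast",
    "tx_bytes", "tx_packets", "tx_errs", "tx_drop", "tx_fifo", "tx_colls",
    "tx_carrier", "tx_compressed",
)

# Counters that hint at frames being lost or damaged.
WATCH = ("rx_errs", "rx_drop", "rx_fifo", "rx_frame",
         "tx_errs", "tx_drop", "tx_fifo", "tx_colls", "tx_carrier")


def parse_net_dev(text):
    """Map interface name -> {counter name: value} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # first two lines are column headers
        if ":" not in line:
            continue
        # Split on the colon: the name may be glued to the first number.
        name, _, rest = line.partition(":")
        values = [int(v) for v in rest.split()]
        stats[name.strip()] = dict(zip(FIELDS, values))
    return stats


def suspicious(stats, iface):
    """Return the non-zero error, drop, fifo, frame and collision counters."""
    return {k: v for k, v in stats[iface].items() if k in WATCH and v}


def read_net_dev(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_net_dev(f.read())
```

Calling suspicious(read_net_dev(), "eth0") before and after a lost chunk, and comparing the two results, would show whether any of these counters moved; the interface name "eth0" is only an example.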
- Have I solved the problem by switching driver?
- Are there any Linux tools that will show me, more exactly, where the packet is lost?
- Is there an official [recommended] driver?
- If this is a bug in the eepro100 driver, who do I submit a bug report to?
- Both nodes use DHCP to get their IP configuration. With the eepro100 driver the hostname is set to what I stated in the dhcpd.conf file on the server. With the e100 driver the specified hostname is not set; instead the hostname is set to the supplied IP address. How do I remedy this?