Hello everyone, I have a question about why I'm getting mixed results when receiving packets over my network.
I have built a setup that uses TCP sockets to send computational values back and forth as a way of parallelizing one of my procedures. I will try not to bore everyone with details, but essentially my client machine opens a socket to each of my render nodes and sends some trivial data over. Each render node then sends a series of floating point (double) values back to the client to complete the computation. All was working well for a while, but I recently switched my node environment over to Ubuntu for compatibility with some external APIs. Now my client machine receives garbage float values from my sockets most of the time.
The unusual part of this problem is that when I started debugging my application, I found I could get the correct values when I inserted print statements to show the contents of each floating point value after it was received. It seems as though slowing down the next "recv" call by writing to standard out greatly increases the chances of the value being correct. Has anyone ever run into a situation like this before? I started to contemplate raising my TCP buffer sizes, but I doubt that will make any difference, as I am receiving the correct number of packets, just with bad values.
Any help would be greatly appreciated.
Without looking at your code I can only guess that you have a race condition. The select() function is probably telling you that there is data to read, but you aren't checking the return from recv() to verify that you got the correct amount of data. As I said, without looking at your code, the best I can do is to guess based upon my experience debugging similar problems in the past.
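To illustrate what I mean about checking the return from recv(): on a TCP stream, recv() may hand you fewer bytes than you asked for, so you normally loop until the full message has arrived. Here's a minimal sketch (the helper name `recv_all` is mine, not from your code):

```c
/* Sketch of a helper that reads exactly len bytes from a TCP socket.
 * recv() on a stream socket may return a partial read, so we loop.
 * recv_all is a hypothetical name, not part of any standard API. */
#include <sys/socket.h>
#include <stddef.h>
#include <errno.h>

/* Returns 0 on success, -1 on error or premature EOF. */
static int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0)
            return -1;            /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal, retry */
            return -1;            /* real error */
        }
        p   += n;
        len -= (size_t)n;
    }
    return 0;
}
```

With something like this in place, you would call `recv_all(fd, &value, sizeof(double))` for each double instead of a bare recv(), and the "print statement fixes it" symptom should disappear, since the delay was just letting more bytes arrive before the next read.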
To me it also looks like a buffer that is read from (recv) while it has only been half flushed to the socket stream.
Hey guys, I appreciate the input from all of you. I know the size of each of the packets I'm receiving is 8 bytes, which is the standard size of a double. Now the question becomes: since placing print statements in my code tends to make the application work, did printing the size of the packets itself lead to the correct values being displayed? I don't think anyone can answer that.
My understanding with TCP is that there is no way to manually flush the buffer, since the protocol controls many aspects of packet delivery itself. I did disable Nagle's algorithm in one case on my server-side applications to try to get them to send packets immediately. Unfortunately, that didn't seem to have much of an effect on anything. Are you guys aware of any mechanisms that can be used to flush a TCP send buffer?
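For reference, this is how I disabled Nagle's algorithm, in case I did it wrong (a minimal sketch; the `disable_nagle` wrapper name is mine). Note that TCP_NODELAY only affects when the sender puts small writes on the wire; it does not change how the receiver's recv() groups the bytes:

```c
/* Sketch: disabling Nagle's algorithm on a TCP socket via TCP_NODELAY.
 * This makes small sends go out immediately instead of being coalesced,
 * but it does not preserve message boundaries on the receiving side. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Returns 0 on success, -1 on error (hypothetical helper name). */
static int disable_nagle(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}
```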
Without a small peek at the interesting lines of code there is no way to give you any real clues. There are simply too many things one can do wrong.