Hi all,

We are developing a network application that hangs after running for a few days. We couldn't find the exact reason for the hang, but when we ran it under dbx we saw one of the threads stuck in the select call.

Here is the stack trace:

#0 0xffffe410 in __kernel_vsyscall ()
#1 0x00690a41 in ___newselect_nocancel () from /lib/tls/libc.so.6
#2 0xf7c8927f in MySocket::checkSocketRead () from /opt/SharedComponents/lib/libmysocket.so

The code that makes this call is:

const int timeout = 1000;
m_sock->checkSocketRead( timeout );

Inside checkSocketRead():

tm.tv_sec = timeout / 1000;
tm.tv_usec = (timeout % 1000) * 1000;
tptr = &tm;

// Directly call the BSD socket select
int ret = call_select((int)m_socket + 1, &read_set, NULL, &except_set, tptr);

So we set a timeout of 1 second, yet the select call still seems to hang.

I ran the thread under strace and saw the following:
// The select call times out, as expected
select(27, [26], NULL, [26], {412316860416000, 67108864}) = 0 (Timeout) <0.096305>
// Then a call to time()
time(NULL) = 1250833595 <0.000007>
// Next, a read() is called. We don't call read() here ourselves, so we assumed it is internal to select. Why is this read happening?
read(26, 0xf53688ad, 1117) = -1 EAGAIN (Resource temporarily unavailable) <0.000007>

Can someone explain what is happening here? Why does select hang even though we set the timeout to 1 second?