    Linux Forums > GNU Linux Zone > Linux Networking > occasional socket select() hang when using zero timeout

3 Weeks Ago   #1
Just Joined!
 
Join Date: May 2006
Posts: 1
occasional socket select() hang when using zero timeout

We've seen a recent post about a socket select() hang of 10 minutes with a non-zero timeout. Our case is different: we are experiencing an occasional socket select() hang of approximately 71 minutes when using a zero timeout on a non-blocking UDP socket.

We're using multi-CPU, multi-core servers from Aberdeen, which are basically repackaged Supermicro servers. We're running CentOS v5.2.

uname -a

2.6.18-92.el5 #1 SMP Tue Jun 10 18:49:47 EDT 2008 i686 i686 i386 GNU/Linux

rpm -qa kernel\* | sort

kernel-2.6.18-92.el5
kernel-headers-2.6.18-92.el5

Occasionally, when a non-blocking UDP socket is polled using the select() system call with a zeroed timeval structure, the select() call stalls for approximately 71 minutes instead of returning immediately. We wish to respond quickly when packets appear spontaneously on this socket, but the peer very rarely transmits a packet spontaneously. It is common for no packet to arrive on this socket for many hours.
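
For readers unfamiliar with the pattern, a zero-timeout poll of a non-blocking UDP socket can be sketched as below. This is an illustration in Python, not the poster's code (which is not shown in the post); the function names, the loopback address, and the buffer size are assumptions made for the example. On Linux, Python's select.select() with a timeout of 0 calls the same select() system call with a zeroed timeout.

```python
import select
import socket


def make_udp_socket(host="127.0.0.1", port=0):
    """Create a non-blocking UDP socket (hypothetical helper; port 0 = any free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    sock.bind((host, port))
    return sock


def poll_once(sock, bufsize=2048):
    """Poll the socket with a zero timeout; return one datagram or None."""
    # A timeout of 0 asks select() to report readiness and return at once.
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return None
    try:
        data, _addr = sock.recvfrom(bufsize)
    except BlockingIOError:
        # Readiness can be spurious; a non-blocking socket just has no data.
        return None
    return data
```

In normal operation poll_once() returns None almost every time it is called, and the call is expected to take microseconds. The problem described in this post is that the underlying select() occasionally does not return for about 71 minutes.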

We find it quite coincidental that 0xFFFFFFFF microseconds is approximately 71 minutes, 35 seconds. We hypothesized that the usec component of the zeroed timeval structure provided to select() is occasionally being decremented to 0xFFFFFFFF (or the equivalent in "jiffies") before the OS tests whether it is equal to zero. That would produce a 71 minute, 35 second timeout. An examination of the kernel source doesn't appear to support our hypothesis, but it is nonetheless quite a coincidence.
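
The arithmetic behind that coincidence is easy to check. The sketch below only illustrates the hypothesis (an unsigned 32-bit microsecond count of zero being decremented and wrapping around); it does not describe what the kernel actually does, and the helper name is made up for the example.

```python
UINT32_MAX = 0xFFFFFFFF  # 4294967295


def usec_to_min_sec(usec):
    """Split a microsecond count into whole minutes and fractional seconds."""
    minutes, rem_usec = divmod(usec, 60 * 1_000_000)
    return minutes, rem_usec / 1_000_000


# Decrementing an unsigned 32-bit zero wraps around to the maximum value.
wrapped = (0 - 1) & UINT32_MAX

minutes, seconds = usec_to_min_sec(wrapped)
print(f"{wrapped:#x} usec = {minutes} min {seconds:.2f} s")
# prints: 0xffffffff usec = 71 min 34.97 s
```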

We call select() on this socket at quite a high rate (e.g. 500 Hz) and this problem might occur once or twice over 12 hours. It is apparently quite sensitive to precisely when the select() function is called in relation to other kernel activities.
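
To put those numbers in perspective, and to show one way such a stall could be caught, here is a small sketch. It assumes a steady 500 Hz call rate, and the function names and the 0.1 second stall threshold are arbitrary choices for the example, not values from our system.

```python
import select
import socket
import time


def calls_in_window(rate_hz, hours):
    """Number of select() calls made at rate_hz over the given number of hours."""
    return int(rate_hz * hours * 3600)


def timed_select(sock, stall_threshold_s=0.1):
    """Run one zero-timeout select(); return (readable, elapsed_s, stalled)."""
    start = time.monotonic()
    readable, _, _ = select.select([sock], [], [], 0)
    elapsed = time.monotonic() - start
    return bool(readable), elapsed, elapsed > stall_threshold_s


if __name__ == "__main__":
    print(calls_in_window(500, 12))  # prints: 21600000
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    sock.bind(("127.0.0.1", 0))
    stalls = sum(1 for _ in range(1000) if timed_select(sock)[2])
    print("stalled calls:", stalls)
    sock.close()
```

At 500 Hz, 12 hours is 21.6 million calls, so one or two stalls corresponds to roughly one failure per ten million calls.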

We have searched the Red Hat bug list, the CentOS forum, the linuxquestions forum, and this site, and have not found any similar complaints about select() with a zeroed timeout. Has anyone else observed this behavior? Is there a remedy other than avoiding zero timeouts or putting a watchdog on threads that might perform zero-timeout select() calls? Our product also employs a library that may perform zero-timeout select() calls, so we'd prefer an OS-level solution. We didn't notice anything in the CentOS v5.3 release notes to indicate that such a problem has been recognized and addressed. Despite that, we intend to update one of our test resources to v5.3 and give that a try. It looks as if the select() implementation has been substantially rewritten since the 2.6.18 version.
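
For reference, the watchdog workaround mentioned above could look something like the following. This is a minimal sketch under stated assumptions, not something we have deployed: the Heartbeat class, the watchdog() function, and the on_stall callback are all hypothetical names, and what to do on a stall (log it, restart the process) is left to the caller. The polling thread would call beat() after every select() call.

```python
import threading
import time


class Heartbeat:
    """Timestamp that a polling thread refreshes after every select() call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = time.monotonic()

    def beat(self):
        with self._lock:
            self._last = time.monotonic()

    def age(self):
        """Seconds since the last beat()."""
        with self._lock:
            return time.monotonic() - self._last


def watchdog(heartbeat, max_age_s, on_stall, stop_event, check_interval_s=0.05):
    """Call on_stall(age) once if the heartbeat goes stale, then exit."""
    # stop_event.wait() returns True once the event is set, ending the loop.
    while not stop_event.wait(check_interval_s):
        age = heartbeat.age()
        if age > max_age_s:
            on_stall(age)
            return
```

A watchdog like this only detects the stall; it cannot make a select() call that is stuck inside the kernel return. It also does nothing for select() calls made inside a third-party library unless those calls run on a monitored thread, which is one reason an OS-level fix would be preferable.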

We don't have a good feel for whether this problem is due to a unique interaction between CentOS v5.2 and our particular Aberdeen server hardware. If it isn't peculiar to our hardware, we'd have expected plenty of posts about this issue online already. On the other hand, despite the vast number of Linux installations, we suppose a problem like this could go unnoticed for a long time. It manifests very infrequently relative to the number of opportunities, and one would only notice it if the socket being polled with a zero-timeout select() very rarely receives packet traffic. Otherwise, the select() would return when that traffic arrives.

Thanks
vulcanusa








© 2000 - 2009 - All Rights Reserved - Property of  MAS Media
