Hi,

I have the following question; maybe you know the answer.
I have an application that manages network modules and a lot of connections. All the sockets are non-blocking, so in order to know when to receive (without busy-waiting), I added an epoll mechanism to my app.

All the sockets in the system are added (epoll_ctl()) to a central epoll fd, and whenever there's something to read (EPOLLIN) I look up the ready socket in a lookup table and notify the appropriate network module that this socket is ready to read.
The epoll_wait() call runs in a separate thread, and that is the thread's only job: block in epoll_wait(), then scan the lookup table, find the ready socket's entry, and call the callback (CB) function of the network module that opened this socket.
The module then calls recv() without busy-waiting, without blocking, and without getting EAGAIN. This, of course, speeds up the entire Linux process.
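
To make the setup concrete, here is a minimal sketch of one iteration of the epoll thread as described above. The names (watch_fd, dispatch_once, table, MAX_FDS) and the fd-indexed array are hypothetical; the real lookup table may look different, and the locking between the epoll thread and the network modules is left out.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_FDS    1024   /* hypothetical table size */
#define MAX_EVENTS 64

typedef void (*ready_cb)(int fd, void *ctx);

struct entry { ready_cb cb; void *ctx; };

/* Lookup table indexed by fd; cb == NULL means "no entry". */
static struct entry table[MAX_FDS];

/* Add fd to the epoll fd, then record which module owns it. */
static int watch_fd(int epfd, int fd, ready_cb cb, void *ctx)
{
    if (fd < 0 || fd >= MAX_FDS)
        return -1;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
        return -1;
    table[fd].cb = cb;
    table[fd].ctx = ctx;
    return 0;
}

/* One pass of the epoll thread: wait, look up, call back.
 * Returns the number of callbacks invoked, or -1 on error. */
static int dispatch_once(int epfd, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];
    int n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    if (n == -1)
        return -1;
    int called = 0;
    for (int i = 0; i < n; i++) {
        int fd = events[i].data.fd;
        if (fd >= 0 && fd < MAX_FDS && table[fd].cb != NULL) {
            table[fd].cb(fd, table[fd].ctx);
            called++;
        }
        /* else: an event for an fd with no table entry */
    }
    return called;
}

/* Demo callback: counts how many times it was called. */
static void count_cb(int fd, void *ctx) { (void)fd; ++*(int *)ctx; }

/* Demo: one readable socket should produce one callback. */
static int demo_dispatch(void)
{
    int sv[2], hits = 0;
    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        close(epfd);
        return -1;
    }
    if (watch_fd(epfd, sv[0], count_cb, &hits) == 0 &&
        write(sv[1], "x", 1) == 1)
        dispatch_once(epfd, 1000);
    close(sv[0]);
    close(sv[1]);
    close(epfd);
    return hits;
}
```

Note that dispatch_once() works on a batch of events that epoll_wait() has already copied into user space, so anything another thread does to a socket after that call returns is not reflected in the batch.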


Anyway, I encountered this weird problem.
When one of the network modules opens a TCP socket, tries to connect, and then closes it quickly, I sometimes receive an EPOLLIN event on the closed socket.


1. [Network Module] <Open Socket X>
2. [Network Module] <add it to the epoll fd>
3. [Network Module] <add an entry in the lookup table, saying that socket X belongs to this network module>
4. [Network Module] connect to host
5. [Network Module] close socket X (and <remove it from the epoll fd>)
6. [Network Module] <remove the lookup table entry of socket X>
7. [epoll_thread] received event EPOLLIN on socket X. No lookup table entry for this socket!
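
Steps 1 to 6 on the network module side can be sketched roughly as follows. The function name connect_and_close and its parameters are hypothetical, and the lookup table steps are shown only as comments. The sketch assumes epoll_ctl(DEL) is issued before close(), since after close() the fd number is no longer valid for epoll_ctl().

```c
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a non-blocking TCP socket, register it, start a connect,
 * then tear everything down right away.
 * Returns 0 if removal and close both succeeded, -1 otherwise. */
static int connect_and_close(int epfd, const char *ip, unsigned short port)
{
    /* Step 1: open socket X (non-blocking). */
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd == -1)
        return -1;

    /* Step 2: add it to the epoll fd. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(fd);
        return -1;
    }

    /* Step 3: the lookup table entry would be added here. */

    /* Step 4: connect to host. On a non-blocking socket this usually
     * fails with EINPROGRESS; the result is ignored on purpose because
     * the socket is closed immediately afterwards. */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) == 1)
        (void)connect(fd, (struct sockaddr *)&addr, sizeof addr);

    /* Step 5: remove from the epoll fd, then close socket X. */
    int del_rc = epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
    int close_rc = close(fd);

    /* Step 6: the lookup table entry would be removed here. */

    return (del_rc == 0 && close_rc == 0) ? 0 : -1;
}
```

In the real application, the epoll thread may be inside or returning from epoll_wait() at any point between step 2 and step 6.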

This does not happen every time, but it happens quite often.

The epoll man page says that closing a socket also removes it (the socket fd) from the epoll fd.
I perform both actions: epoll_ctl (DEL), as well as close().
Still, I sometimes get an event on the closed socket. I suspect that removing the socket fd from the epoll fd, whether with epoll_ctl(DEL) or close(), does not actually scan the ready list and clean out an existing event for this socket.

I think that on close() or epoll_ctl(DEL) the kernel just removes the socket from the epoll fd, which stops the epoll fd from monitoring the socket for new events. If, however, there are events already waiting, they stay there.
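
One way to check this suspicion in isolation is a single-threaded test: make a socket readable, confirm the event is pending, remove the socket with epoll_ctl(DEL), and poll again with a zero timeout. The sketch below is a hypothetical test (events_after_del is not part of the application) and uses a socketpair instead of a TCP connection to keep it self-contained.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the number of events epoll_wait() reports after
 * EPOLL_CTL_DEL, or -1 if the setup itself failed. */
static int events_after_del(void)
{
    int sv[2];
    struct epoll_event ev = { .events = EPOLLIN }, out[4];
    int result = -1;

    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        close(epfd);
        return -1;
    }
    ev.data.fd = sv[0];

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
        write(sv[1], "x", 1) == 1 &&                 /* make sv[0] readable */
        epoll_wait(epfd, out, 4, 1000) == 1 &&       /* event is pending */
        epoll_ctl(epfd, EPOLL_CTL_DEL, sv[0], NULL) == 0) {
        result = epoll_wait(epfd, out, 4, 0);        /* poll, do not block */
    }

    close(sv[0]);
    close(sv[1]);
    close(epfd);
    return result;
}
```

A positive return value would support the suspicion that pending events survive removal. A return value of 0 would mean the pending event was dropped on removal, in which case the stale event more likely comes from a batch that epoll_wait() had already handed to the epoll thread before the module removed the socket, or from the fd number being reused by a new socket.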

Can anyone confirm that, or maybe offer me some solution?

Thanks!