Hi guys,

I'm running a couple of game servers for Warcraft 3 based on GHost++ 17.0. It works like a charm on most of the servers, but on a few of them, mainly those from server4you or from VPS providers, I have an odd problem that I have failed to solve even though I have been searching for a solution for a couple of weeks now.

Basically, there are multiple processes of my modified version of GHost++ running; each has 10 connections, one to each of the 10 players in the game.
The connections are in an fd_set and are handled using select() in a main loop that updates whenever an fd is ready or 50 ms have passed.
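
To make the setup concrete, here is a minimal sketch of that kind of wait step. It is not the actual GHost++ code; the helper name wait_for_readable and its signature are made up for illustration, and it assumes every descriptor is below FD_SETSIZE.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <algorithm>
#include <vector>

// Hypothetical helper (not from GHost++): blocks until at least one of the
// given descriptors is readable or timeout_ms has elapsed, and returns the
// descriptors that are readable. Returns an empty vector on timeout or error.
std::vector<int> wait_for_readable(const std::vector<int>& fds, int timeout_ms) {
    fd_set read_set;
    FD_ZERO(&read_set);
    int max_fd = -1;
    for (int fd : fds) {
        FD_SET(fd, &read_set);          // fd must be < FD_SETSIZE
        max_fd = std::max(max_fd, fd);
    }

    // select() may modify the timeout, so it is rebuilt on every call.
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    std::vector<int> ready;
    int n = select(max_fd + 1, &read_set, nullptr, nullptr, &tv);
    if (n <= 0) {
        return ready;                   // 0 = timeout, -1 = error (see errno)
    }
    for (int fd : fds) {
        if (FD_ISSET(fd, &read_set)) {
            ready.push_back(fd);
        }
    }
    return ready;
}
```

The main loop would call this with a 50 ms timeout, read from whatever comes back, and then run the game update either way.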

So far there are no problems. But on some of the servers, at random times, anywhere from once a week to multiple times a day, a big percentage of the players get dropped at once (usually with a connection reset), not only in one process, but in many processes at the same time. Usually about 20-60% are affected, which translates pretty much 1:1 into 20-60 players/connections being dropped simultaneously.
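
To be clear about what "connection reset" means here, this is a rough sketch of how the result of a recv() call can be classified for logging. The helper name describe_recv_result and the returned strings are illustrative assumptions, not what GHost++ actually prints.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/types.h>

// Hypothetical helper (not from GHost++): turns the return value of recv()
// and the errno captured right after it into a short label for the log.
std::string describe_recv_result(ssize_t n, int err) {
    if (n > 0) {
        return "data";                  // n bytes were received
    }
    if (n == 0) {
        return "closed by peer";        // orderly shutdown (FIN)
    }
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR) {
        return "retry";                 // nothing to read yet, not a drop
    }
    if (err == ECONNRESET) {
        return "connection reset";      // RST received
    }
    return std::string("error: ") + std::strerror(err);
}
```

Logging this label together with a timestamp and the process ID in every process would make it possible to line up the drops across processes.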

The servers in question are running different systems, but all of them are Debian. The worst server runs kernel 2.6.32-bpo.5-amd64, gcc-4.4 and is hosted by server4you in Köln, Germany. The IT support from server4you told us there is nothing wrong on their end, there are no messages in dmesg or other logfiles that I know of, there are no errors on eth0 and the CPU is at ~95% idle all the time. There is nothing else running on those servers, except for a spawning process that listens on a port for incoming requests to spawn new GHost++ processes. (Of course the usual Debian stuff is running, but it's not used. The server was newly installed and its only purpose is to serve games.)

I'm really desperate to find out what is causing this. Can someone point me in the right direction or give me some tips on how to debug this?

I hope I didn't forget any vital information. Thanks so much for your attention.

tl;dr:
Lots of people get disconnected at the same time, seemingly at random. Sockets are handled using select().

NarOzlington

PS:
I was unable to find any parallels between the drops.

PPS:
Most people are able to reconnect seconds later.