  1. #1
    Just Joined! bendib's Avatar
    Join Date
    Jul 2009
    Location
    &Imeuta >> 4.7777777777777777
    Posts
    21

    Zombies created even after successful waitpid()


    Hi, I'm writing an init system (think Upstart or systemd) called Epoch, but I'm running into an infuriating bug. I can't find answers on search engines, so please don't say "google it", and the elitist trolls have annexed ##posix on freenode, so I am coming here in hopes I can find answers.

    NOTE: I know the C language extensively, but I did not learn POSIX programming, and that is now biting me in the butt.

    My problem is that in the init system's method for launching services, after I call vfork() and successfully retrieve the exit status of the child via waitpid(), I still end up with virtually everything spawned by Epoch becoming a zombie. I wait by the PID of the child, not just catching any of them, because some of them are not supposed to have their statuses harvested and some are, so I need to catch exactly the right one every time.

    Here is a snippet of code that may prove relevant:

    Code:
    pid_t LaunchPID;
    ...
    int RawExitStatus;
    ...
    LaunchPID = vfork();
    
    if (LaunchPID < 0)
    {
        SpitError("Failed to call vfork(). This is a critical error.");
        EmergencyShell();
    }
    if (LaunchPID == 0) /**Child process code.**/
    { /*Child does all this.*/
        char TmpBuf[1024];
    
        /*Change our session id.*/
        setsid();
    
        /*The terminating argument of a variadic exec call must be a null pointer of type char*.*/
        execlp(ShellPath, "sh", "-c", CurCmd, (char*)NULL); /*I bet you think that this is going to return the PID of sh. No.*/
        /*We still around to talk about it? We were supposed to be imaged with the new command!*/
    
        snprintf(TmpBuf, sizeof TmpBuf, "Failed to execute %s: execlp() failure.", InObj->ObjectID);
        SpitError(TmpBuf);
        EmergencyShell();
    }
    ...
    /**Parent code resumes.**/
    waitpid(LaunchPID, &RawExitStatus, 0); /*Wait for the process to exit.*/
    This code is C89 compliant, if you are wondering about the location of declarations.

    I greatly appreciate any help you can offer.

  2. #2
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,380
    waitpid() will suspend until the specified child process terminates. It will ONLY wait for that one. In the meantime, other child processes can die and become zombies. Use the wait() function instead. It returns as soon as any child process terminates. I have used this for years without problems. waitpid() is intended for very specific scenarios, not for a general case like yours seems to be.
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!
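To make the distinction in the reply above concrete, here is a minimal sketch, assuming two short-lived children that simply exit. The helper names `spawn_exiting_child` and `count_leftover_children` are made up for illustration and are not part of Epoch. It waits on one specific PID with waitpid(), then uses wait() to count how many children were left uncollected.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits at once with `code`.
 * Returns the child's PID in the parent, or -1 if fork() failed. */
static pid_t spawn_exiting_child(int code)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(code);
    return pid;
}

/* Hypothetical demo: returns how many children were still uncollected
 * after waiting for one specific PID, or -1 on error. */
int count_leftover_children(void)
{
    int status;
    int leftovers = 0;
    pid_t first = spawn_exiting_child(1);
    pid_t second = spawn_exiting_child(2);

    if (first < 0 || second < 0)
        return -1;

    /* Collects `first` only. Once `second` exits it stays a zombie
     * until somebody waits for it as well. */
    if (waitpid(first, &status, 0) != first)
        return -1;

    /* wait() takes whichever child is left; it fails with ECHILD
     * once there are no children remaining. */
    while (wait(&status) > 0)
        leftovers++;

    return leftovers;
}
```

Note that this only covers direct children of the calling process. It says nothing about processes that get handed to init after their own parent has exited.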

  3. #3
    Just Joined! bendib's Avatar
    Quote Originally Posted by Rubberman
    waitpid() will suspend until the specified child process terminates. It will ONLY wait for that one. In the meantime, other child processes can die and become zombies. Use the wait() function instead. It returns as soon as any child process terminates. I have used this for years without problems. waitpid() is intended for very specific scenarios, not for a general case like yours seems to be.
    I appreciate your thoughts, but, irritatingly enough, I have verified that this is not the issue. If I build a test binary that uses these routines and a test configuration to go with it, everything executes perfectly and no zombies are created. The zombies only appear when Epoch is running as init.
    All children are handled by Epoch. They just need to be handled in a certain order. It's ugly code, but I cannot think of another solution even nearly as elegant.

    The irritating thing is, everyone thinks it's waitpid() or a child elsewhere or that I'm just dumb. It's an unfortunate happenstance that I would run into such a crippling bug.

  4. #4
    Linux Guru Rubberman's Avatar
    In this case, I would suspect that the bug is in your own code. waitpid() is pretty thoroughly vetted system code; if it were broken, the problem would show up in many other scenarios.

  5. #5
    Just Joined! bendib's Avatar
    I figured out that this only happens to applications that daemonize themselves (I can run /bin/login without trouble, for example). Is there some reason why it would only affect services that daemonize? Is my fork/exec method interfering with the fork/exec of a daemon? Can someone shed some light on this?
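A hedged sketch of what a daemonizing service typically does may help here. The function name `adoptive_parent_of_daemon` and the pipe-based reporting are made up for illustration. The launched process forks once more and exits, so waitpid() on the launched PID succeeds, while the surviving grandchild is orphaned and handed to a new parent (PID 1, or a subreaper on Linux). When the launcher is itself running as init, that new parent is the launcher, which then has an extra child it never forked.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo: launch a stand-in "service" that daemonizes by
 * forking once more. Returns the PID of the process that adopted the
 * daemon, or -1 on error. */
pid_t adoptive_parent_of_daemon(void)
{
    int fds[2];
    int status;
    pid_t launched;
    pid_t adopter = -1;

    if (pipe(fds) < 0)
        return -1;

    launched = fork();
    if (launched < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (launched == 0) {                 /* the service we launched */
        pid_t self = getpid();
        pid_t daemon_pid = fork();

        if (daemon_pid == 0) {           /* the daemon itself */
            struct timespec pause = { 0, 1000000L };   /* 1 ms */
            pid_t now;

            close(fds[0]);
            /* Poll until the original parent is gone and we have
             * been handed to somebody else. */
            while ((now = getppid()) == self)
                nanosleep(&pause, NULL);
            if (write(fds[1], &now, sizeof now) < 0)
                _exit(1);
            _exit(0);
        }
        _exit(daemon_pid < 0 ? 1 : 0);   /* exits, orphaning the daemon */
    }

    close(fds[1]);

    /* This succeeds: the process we launched really did exit. */
    if (waitpid(launched, &status, 0) == launched &&
        read(fds[0], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
        adopter = -1;

    close(fds[0]);
    return adopter;
}
```

In an ordinary test binary the returned PID belongs to some other process, usually the real init, which quietly reaps the daemon when it exits. That would be consistent with the zombies only showing up once Epoch itself runs as PID 1.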

  6. #6
    Just Joined! bendib's Avatar
    UPDATE: This problem is still present, but I have created an issue on github.

    I really need help with this, so if you can help, please don't hesitate.

    https://github.com/Subsentient/epoch/issues/3

  7. #7
    Just Joined! bendib's Avatar
    I fixed it. It turns out that init is required to reap children other than the ones it created itself. I implemented this and the problems stopped. Putting waitpid(-1, NULL, WNOHANG) in the main loop fixed it, but I had to implement a method to prevent this from harvesting children whose status I still needed to retrieve. This was not difficult.
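The post does not say how the status-preserving part was done, so the following is only one possible sketch, not Epoch's actual code. The names `track_child`, `reap_all_exited`, `take_status` and the fixed-size table are all assumptions. The idea is that the main loop reaps everything that has exited with waitpid(-1, ..., WNOHANG), and the status of any PID that was registered beforehand is saved instead of being thrown away.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_TRACKED 16          /* hypothetical limit, illustration only */

struct tracked_child {
    pid_t pid;                  /* 0 means the slot is free */
    int status;                 /* raw waitpid() status, valid once done != 0 */
    int done;
};

static struct tracked_child tracked[MAX_TRACKED];

/* Register a child whose exit status must be kept.
 * Returns 0 on success, -1 if the table is full. */
int track_child(pid_t pid)
{
    int i;

    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].pid == 0) {
            tracked[i].pid = pid;
            tracked[i].done = 0;
            return 0;
        }
    }
    return -1;
}

/* Call from the main loop. Reaps every child that has already exited,
 * adopted orphans included, without blocking. Returns how many were reaped. */
int reap_all_exited(void)
{
    int status;
    int i;
    int reaped = 0;
    pid_t pid;

    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        reaped++;
        for (i = 0; i < MAX_TRACKED; i++) {
            if (tracked[i].pid == pid) {
                tracked[i].status = status;   /* keep it for later */
                tracked[i].done = 1;
                break;
            }
        }
    }
    return reaped;
}

/* Returns 1 and fills *status_out if the child has been reaped,
 * 0 if it is still running, -1 if the PID was never tracked. */
int take_status(pid_t pid, int *status_out)
{
    int i;

    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].pid == pid) {
            if (!tracked[i].done)
                return 0;
            *status_out = tracked[i].status;
            tracked[i].pid = 0;               /* free the slot */
            return 1;
        }
    }
    return -1;
}
```

In a single-threaded main loop, calling track_child() right after fork() returns, and before the next reap_all_exited() pass, keeps a tracked child from being reaped anonymously. The saved status can then be decoded with WIFEXITED() and WEXITSTATUS() as usual.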
