Evented Process Monitor #11529
ping @ibuildthecloud
Multiple child processes dying quickly could generate a single SIGCHLD, so you would have to call syscall.Wait4 in a loop until you get -1, unless golang is changing signal semantics somehow.
@mrunalp no, you are correct.
I do declare, that's awesome.
+1
@crosbymichael I was thinking of a design where this would be a drop-in replacement for exec.Cmd. Obviously this doesn't buy any benefits in terms of reducing goroutines.
I'd just like to raise the canary of caution 🐦 with respect to introducing a non-idiomatic approach to such a common task. Callbacks seem quite a strange suggestion in Go. But then, in a way, avoiding a few thousand goroutines also seems a bit strange. It's usually just said, "start a goroutine, they're cheap!" Have large numbers of goroutines proven themselves to be a problem?
The approach I described still allows for a Go-idiomatic Wait(), but it will have to use a custom package name and not exec. The number of goroutines isn't a huge issue; the root issue has to do with the calling style. With the current approach it's really hard to cleanly manage zombies. It's a really common pattern in supervisor-style code to have an event-driven model for handling processes. Because of some things in the guts of golang, we can't combine an event-driven model with other goroutines calling Command.Wait().
This is a big deal. It's one of the things that are hurting people in production. I don't think it should be a requirement for a large-scale Docker deployment to have to care about zombie reaping. A lot of people currently don't write custom init processes for their containers.
Meaning if someone is running a Docker deployment with high container churn (which is common), e.g. an application with a master-worker model where the workers get killed if they take too long. If those workers spawn children of their own, those children are orphaned and pile up as zombies. I don't think it should be required for Docker users, even in very large deployments, to have this much of an understanding of the PID namespace (it's documented on the man page) and their container layouts. Docker should definitely be reaping children to avoid this problem.

**Concerns**

What happens when you have multiple processes trying to wait on the same children? Another question I'd have is whether the kernel assumes anything else about the reaping process.
I can open a separate issue on this matter since it's somewhat orthogonal to this discussion.
The missing piece from my description is to set the child subreaper option (PR_SET_CHILD_SUBREAPER) on the docker daemon to support this. The full docs can be found here, but here is a quick description: with the subreaper attribute set, orphaned descendant processes are reparented to the daemon instead of to init, so the daemon receives their SIGCHLD and can reap them.
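A minimal sketch of what setting that option could look like from Go, assuming the golang.org/x/sys/unix package (Linux-only; the constant and call come from prctl(2), not from docker's code):

```go
// Sketch: mark the current process as a child subreaper via prctl(2),
// using golang.org/x/sys/unix (Linux-only). Once set, orphaned
// descendants are reparented to this process instead of to init,
// so its SIGCHLD handler can see and reap them.
package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	// arg2 = 1 enables subreaper behavior for this process.
	if err := unix.Prctl(unix.PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0); err != nil {
		log.Fatalf("setting child subreaper: %v", err)
	}
	// ... spawn children and run the reaping loop ...
}
```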
Had a good discussion with @crosbymichael on this on IRC; agree with setting the subreaper option.
Yeah, won't be much of a reaper without it unless running as PID 1.
@ibuildthecloud I haven't thought about it much, but maybe we could model this after http.Handler, with a router/muxer in the middle for handling exits (a rough sketch is below). Don't stress over the first implementation; we can get something working and then iterate on the API.
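For illustration only, a hypothetical sketch of what an http.Handler-style API for exit events could look like; the ExitHandler/ExitMux names here are made up, not a proposed final API:

```go
// Hypothetical exit-event API modeled on net/http's Handler/ServeMux.
package monitor

// Exit describes one child process exit event.
type Exit struct {
	Pid    int
	Status int
}

// ExitHandler mirrors http.Handler: one method, called per event.
type ExitHandler interface {
	HandleExit(e Exit)
}

// ExitHandlerFunc mirrors http.HandlerFunc for plain functions.
type ExitHandlerFunc func(e Exit)

func (f ExitHandlerFunc) HandleExit(e Exit) { f(e) }

// ExitMux routes each exit event to the handler registered for its
// pid, the way http.ServeMux routes requests by path.
type ExitMux struct {
	handlers map[int]ExitHandler
}

func NewExitMux() *ExitMux {
	return &ExitMux{handlers: make(map[int]ExitHandler)}
}

// Handle registers h to receive the exit event for pid.
func (m *ExitMux) Handle(pid int, h ExitHandler) {
	m.handlers[pid] = h
}

// HandleExit dispatches the event; a pid exits at most once, so the
// registration is removed after delivery.
func (m *ExitMux) HandleExit(e Exit) {
	if h, ok := m.handlers[e.Pid]; ok {
		delete(m.handlers, e.Pid)
		h.HandleExit(e)
	}
}
```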
From what @mrunalp said about the loop for reaping fast-exiting children, I experimented and you have to handle ECHILD and keep looping until there is nothing left to reap:

```go
// exit records one reaped child.
type exit struct {
	pid    int
	status int
}

// reapChildren collects every child that has already exited, looping
// because a single SIGCHLD can stand for multiple exits.
func reapChildren() (exits []exit, err error) {
	for {
		var (
			ws  syscall.WaitStatus
			rus syscall.Rusage
		)
		pid, err := syscall.Wait4(-1, &ws, syscall.WNOHANG, &rus)
		if err != nil {
			if err == syscall.ECHILD {
				// no children left at all
				return exits, nil
			}
			return nil, err
		}
		if pid <= 0 {
			// children exist, but none have exited yet
			return exits, nil
		}
		exits = append(exits, exit{
			pid:    pid,
			status: ws.ExitStatus(),
		})
	}
}
```
@crosbymichael Are you working on an implementation of this? If you're not I can start on something.
No, I'm not working on it.
WANT!!!!
Spammy +1 :P
No fair, why does @crosbymichael get all the fun PRs :) j/k looking forward to the implementation.
@ibuildthecloud I think he meant he's not working on it :)
@sirupsen lol, man I suck at reading. I'll start working on this.
Now that it exists, would it be possible to just use
@tiborvass still looking for volunteers?
@mynameismevin yes please :)
I'll work on coming up with something based off the discussion that's been had thus far.
Has progress stalled on this issue? I noticed that @mynameismevin's PR was not merged. Just to be clear, this would solve the current problem of processes having PID 1 in a container and having to reap grandchildren, right?
@edevil possibly this will now be implemented through containerd: https://github.com/docker/containerd. @crosbymichael am I right?
@crosbymichael, is @thaJeztah right? :)
@edevil my PR was really just a summation of this conversation; I did nothing outside of providing some code in a package for others to use in solving this problem. I didn't fix the issue, just helped move it closer towards resolution. Also, I didn't test it when it came to reaping grandchildren, but I did test abandoned child processes, so theoretically it could clean up abandoned grandchildren so long as the child cleaned up the grandchild. No promises on that actually working, as it wasn't tested.
This is directly linked to #20361, and is on our 1.11 roadmap.
AFAIK containerd doesn't solve the zombie problem. This issue was opened for evented process monitoring, which I assume has been fixed with containerd, but the sub-discussions about zombie killing have not been addressed. @sirupsen I would probably open a new issue. Can somebody correct me if I'm wrong, because I'm not seeing zombie killing working in 1.11.0-rc3. It's my understanding that process subreapers don't work across a PID namespace boundary. So if your child creates a new PID namespace and you are set as the subreaper of the child, processes will still be reparented to the PID 1 of the child's PID namespace and not to you (the subreaper).
@ibuildthecloud You are correct, sadly :(
So using an init system (like tini) is still the way to go to solve the zombie process problem, right?
TLDR: Replace the per-container blocking goroutine in the daemon with a single SIGCHLD-driven event loop for monitoring child processes.

Pros:

- 1 goroutine for N containers instead of N goroutines
- fewer locks needed to handle the same level of concurrency
- the daemon can reap zombies that are not its direct children

Cons:

- a callback/event-driven API is not idiomatic Go
- it cannot be a drop-in replacement for `exec.Cmd` and `cmd.Wait()`

Currently docker uses `exec.Cmd` and `cmd.Wait()` inside a goroutine for blocking on a container's process until it finishes. After a container dies, two things can happen. One, the container's process is restarted, depending on whether the user requested that the container always restart. Two, we tear down the container, update its status with the exit code, and release any resources it required.

Writing a process monitor like this is not very efficient, and docker is unable to handle `SIGCHLD` events to reap zombies that were not direct children of the daemon's process. It also means we have one goroutine per container: if we have 1000 containers, we have 1000 blocking goroutines in the daemon. Booooo.

We can do better. The proper way is to move to an evented system for monitoring when child processes change state. This can be handled via `SIGCHLD`. A process can set up a signal handler for `SIGCHLD` events, and when the status of a child process changes this signal is sent to the handler. We can use it to extract the pid and exit status and make decisions on how to handle the event. Using an evented system like this, we can reduce the number of goroutines to 1 for N containers and also reduce the number of locks required to handle the previous level of concurrency. Running one container requires 1 goroutine; running 10k containers requires 1 goroutine. Win. This model also allows us to reap zombies (because zombies are bad, m'kay) in the daemon process even when they are not direct children, i.e. non-container PID 1's.

A sample of what the process monitor could look like follows:
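A minimal sketch, assuming the reapChildren helper and exit struct from the comment above:

```go
// Sketch of a SIGCHLD-driven monitor: a single goroutine blocks on the
// signal channel and reaps all exited children on each wakeup.
// Assumes reapChildren() and the exit struct defined earlier.
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Buffer the channel: signals delivered while we are busy reaping
	// would otherwise be dropped by the runtime.
	sigc := make(chan os.Signal, 32)
	signal.Notify(sigc, syscall.SIGCHLD)

	for range sigc {
		// One SIGCHLD can stand for several exits, so reap in a loop.
		exits, err := reapChildren()
		if err != nil {
			log.Printf("reap: %v", err)
			continue
		}
		for _, e := range exits {
			// The daemon would decide here whether to restart the
			// container or tear it down with this exit status.
			log.Printf("pid %d exited with status %d", e.pid, e.status)
		}
	}
}
```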
We should build this in a generic way because we want this monitor with restart capabilities to be available to any type of process the daemon can spawn.