fix process termination handling for runc exec#3722
Conversation
|
|
||
| var eg errgroup.Group | ||
| egCtx, cancel := context.WithCancel(ctx) | ||
| egCtx, cancel := context.WithCancel(context.Background()) |
There was a problem hiding this comment.
should this be:
eg, egCtx := errgroup.WithContext(context.Background)
or maybe at least
egCtx, cancel := context.WithCancel(context.Background())
eg, egCtx := errgroup.WithContext(egCtx)
| ended chan struct{} | ||
| } | ||
|
|
||
| // newStartingProcess will create a startingProcess that will be monitored, where |
There was a problem hiding this comment.
This can be follow-up but I wonder what a better name would be for this. Is it like runcProcessHandle?
There was a problem hiding this comment.
Also wonder if the signature could be
h, ctx := newRuncProcessHandle(ctx, ...)
The returned context would be the one that is based on Background() and doesn't immediately get canceled when the input context gets canceled.
| runcProcess := &startingProcess{ | ||
| ready: make(chan struct{}), | ||
| } | ||
| runcProcess, runcCtx, cancel := runcProcessHandle(ctx, id) |
There was a problem hiding this comment.
Do we need both cancel and Release()? If the timing is different then cancel can still just be a method on runcProcess (.Close() or .Stop()).
There was a problem hiding this comment.
Yeah, the cancel was used to stop the signal/resize loops, allowing the waitgroup to exit cleanly. I have stored the cancel in the procHandle and it will be called with a runcProcess.Shutdown() call now.
|
|
||
| var eg errgroup.Group | ||
| egCtx, cancel := context.WithCancel(ctx) | ||
| eg, egCtx := errgroup.WithContext(runcCtx) |
There was a problem hiding this comment.
on line 64 (can't leave comment there), is ctx correct?
I think we should be able to pretty much just use ctx var now, without the need for runcCtx and egCtx variables.
There was a problem hiding this comment.
Good catch on line 64. I have updated to use ctx consistently now. I also had to copy over the logger from the request context to preserve that for logging in the errgroups.
This patch makes the process handling consistent between runc.Run and runc.Exec usage. Previously runc.Run would use context.Background for the runc.Run process and would monitor the request context for shutdown requests, sending a SIGKILL to the container pid1 process. This allowed runc.Run to gracefully shutdown and reap child processes. This logic was not used for runc.Exec where instead we were passing in the request context to runc.Exec, and if that request context was cancelled the runc process would immediately terminate preventing runc from reaping the child process. In this scenario the extra pid will remain forever and then when the pid1 process will get wedged in zap_pid_ns_processes syscall upon shutdown waiting fo the zombie pid to exit. With this fix both runc.Run and runc.Exec will use context.Background for runc processes and monitor the request context for shutdown request triggering a SIGKILL to the pid being monitored by runc. Signed-off-by: coryb <cbennett@netflix.com>
This patch makes the process handling consistent between runc.Run and runc.Exec usage. Previously runc.Run would use context.Background for the runc.Run process and would monitor the request context for shutdown requests, sending a SIGKILL to the container pid1 process. This allowed runc.Run to gracefully shutdown and reap child processes. This logic was not used for runc.Exec where instead we were passing in the request context to runc.Exec, and if that request context was cancelled the runc process would immediately terminate preventing runc from reaping the child process. In this scenario the extra pid will remain forever and then when the pid1 process will get wedged in zap_pid_ns_processes syscall upon shutdown waiting fo the zombie pid to exit.
With this fix both runc.Run and runc.Exec will use context.Background for runc processes and monitor the request context for shutdown request triggering a SIGKILL to the pid being monitored by runc.
This patch was split off from #3658
The tests from that PR verifies this all works as expected.