
fix process termination handling for runc exec #3722

Merged: coryb merged 1 commit into moby:master from coryb:runc-exec-zombie on Mar 17, 2023


coryb (Collaborator) commented Mar 16, 2023

This patch makes process handling consistent between runc.Run and runc.Exec. Previously, runc.Run used context.Background for the runc process and monitored the request context separately, sending a SIGKILL to the container's pid1 process when a shutdown was requested. This allowed runc to shut down gracefully and reap child processes. runc.Exec did not use this logic: the request context was passed directly to runc.Exec, so if that context was cancelled, the runc process terminated immediately, before it could reap its child process. In this scenario the extra pid remains forever as a zombie, and on shutdown the pid1 process gets wedged in the zap_pid_ns_processes syscall, waiting for the zombie pid to exit.

With this fix, both runc.Run and runc.Exec use context.Background for their runc processes and monitor the request context separately; a shutdown request triggers a SIGKILL to the pid that runc is monitoring.

This patch was split off from #3658.
The tests from that PR verify that this all works as expected.

Review thread on executor/runcexecutor/executor_linux.go (outdated):

var eg errgroup.Group
egCtx, cancel := context.WithCancel(ctx)
egCtx, cancel := context.WithCancel(context.Background())
Member comment:

Should this be:

eg, egCtx := errgroup.WithContext(context.Background())

or maybe at least

egCtx, cancel := context.WithCancel(context.Background())
eg, egCtx := errgroup.WithContext(egCtx)

Review thread on executor/runcexecutor/executor.go (outdated):
ended chan struct{}
}

// newStartingProcess will create a startingProcess that will be monitored, where
Member comment:

This can be a follow-up, but I wonder what a better name for this would be. Is it something like runcProcessHandle?

Member comment:

I also wonder if the signature could be:

h, ctx := newRuncProcessHandle(ctx, ...)

The returned context would be based on Background() and would not be canceled immediately when the input context is canceled.

coryb force-pushed the runc-exec-zombie branch from e8c83d2 to 7fdec34 on March 17, 2023 18:32
Review thread on executor/runcexecutor/executor_linux.go (outdated):
runcProcess := &startingProcess{
ready: make(chan struct{}),
}
runcProcess, runcCtx, cancel := runcProcessHandle(ctx, id)
Member comment:

Do we need both cancel and Release()? If the timing is different, cancel could still just be a method on runcProcess (.Close() or .Stop()).

Author (collaborator) reply:

Yeah, the cancel was used to stop the signal/resize loops, allowing the waitgroup to exit cleanly. I have stored the cancel in the procHandle, and it is now called via runcProcess.Shutdown().

Review thread on executor/runcexecutor/executor_linux.go (outdated):

var eg errgroup.Group
egCtx, cancel := context.WithCancel(ctx)
eg, egCtx := errgroup.WithContext(runcCtx)
Member comment:

On line 64 (I can't leave a comment there): is ctx correct?

I think we should now be able to just use the ctx variable, without needing the runcCtx and egCtx variables.

Author (collaborator) reply:

Good catch on line 64. I have updated the code to use ctx consistently. I also had to copy the logger over from the request context to preserve it for logging in the errgroups.

coryb force-pushed the runc-exec-zombie branch from 7fdec34 to c703ea1 on March 17, 2023 19:42

Signed-off-by: coryb <cbennett@netflix.com>
coryb force-pushed the runc-exec-zombie branch from c703ea1 to b76f8c0 on March 17, 2023 19:46