Change kill and stop implementation to enhance compatibility#8163
lcastellano merged 1 commit into vmware:master
| pid = session.Cmd.Process.Pid | ||
| log.Infof("sending signal %s (%d) to process %d for %s", sig.Signal, num, pid, session.ID) | ||
| } else { | ||
| pid = -session.Cmd.Process.Pid |
Why the negative sign here, but not above? (It seems like there's some significance to the sign of the pid, but I'm not familiar enough with this code to know what it means.)
Sending a signal to a positive pid sends the signal to one process only; the one identified by the pid. Sending a signal to a negative pid sends the signal to all the processes belonging to the process group identified by the positive value of the pid.
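To make the sign convention concrete, here is a minimal sketch, assuming a Unix-like system. `signalTarget` is a hypothetical helper that is not part of the PR; it only shows how the same pid is turned into either a single-process target or a process-group target for kill(2).

```go
package main

import (
	"fmt"
	"syscall"
)

// signalTarget returns the value to pass as the pid argument of kill(2).
// A positive value addresses exactly one process; a negative value
// addresses every process in the group whose ID is the absolute value.
// Hypothetical helper for illustration only.
func signalTarget(pid int, group bool) int {
	if group {
		return -pid
	}
	return pid
}

func main() {
	pid := syscall.Getpid()

	// Signal 0 performs only the existence and permission checks,
	// so nothing is actually delivered to this process.
	err := syscall.Kill(signalTarget(pid, false), syscall.Signal(0))
	fmt.Println("single process target:", signalTarget(pid, false), "err:", err)

	// The group form is only computed here, not sent.
	fmt.Println("process group target:", signalTarget(syscall.Getpgrp(), true))
}
```

Negating the value again, as the log statement in the diff does, recovers the positive process group ID for printing.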
lib/portlayer/exec/base.go
Outdated
| stop := []string{cs.StopSignal, string(ssh.SIGKILL)} | ||
| if stop[0] == "" { | ||
| stop[0] = string(ssh.SIGTERM) | ||
| stopActions := []string{"kill", "groupKill"} |
Should these be defined as constants somewhere? I see they're used in a switch statement below.
There really should be. There is the lib/tether/shared package specifically for constants that should be shared between the tether and other subsystems.
I had to read this twice, and that's knowing what the intent is. For clarity, can we define the action/signal grouping together explicitly? Either as a map[string][ssh.Action] or as:
stopSignal := cs.StopSignal
if stopSignal == "" {
    stopSignal = string(ssh.SIGTERM)
}
// action/signal pairs
stopActions := []string{"kill", "groupKill"}
stopSignals := []string{stopSignal, string(ssh.SIGKILL)}
var killed bool
...
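One way to combine both suggestions (shared constants and an explicit action/signal grouping) might look like the sketch below. The constant names and the `stopSequence` helper are hypothetical, and the plain strings "TERM" and "KILL" stand in for string(ssh.SIGTERM) and string(ssh.SIGKILL) so that the example has no external dependencies.

```go
package main

import "fmt"

// Hypothetical constant names; the review only asks that the action
// strings be shared constants instead of repeated literals.
const (
	ActionKill      = "kill"
	ActionGroupKill = "groupKill"
)

// actionSignal pairs an action with the signal it delivers.
type actionSignal struct {
	action string
	signal string
}

// stopSequence returns the ordered action/signal pairs for a stop request.
// An empty stop signal falls back to TERM, mirroring the snippet above.
func stopSequence(stopSignal string) []actionSignal {
	if stopSignal == "" {
		stopSignal = "TERM" // stands in for string(ssh.SIGTERM)
	}
	return []actionSignal{
		{ActionKill, stopSignal},
		{ActionGroupKill, "KILL"}, // stands in for string(ssh.SIGKILL)
	}
}

func main() {
	for _, step := range stopSequence("") {
		fmt.Printf("%s with %s\n", step.action, step.signal)
	}
}
```

A slice keeps the pairs in order, which matters here because the group kill has to come after the stop signal; a map would not preserve that ordering.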
lib/tether/tether_linux.go
Outdated
| session.Unlock() | ||
| | ||
| // Don't hold the lock while waiting for the | ||
| // file descriptors to close |
// Don't hold the lock while waiting for the file descriptors
// to close as these can be held open by child processes
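The locking pattern behind that comment can be sketched as follows. The `session` type here is a hypothetical stand-in, not the tether's real type, and a `sync.WaitGroup` stands in for "waiting for the file descriptors to close".

```go
package main

import (
	"fmt"
	"sync"
)

// session is a hypothetical stand-in for the tether session.
type session struct {
	sync.Mutex
	exited bool
	wait   sync.WaitGroup // tracks goroutines still draining process output
}

// finish records the exit under the lock, then releases the lock before
// blocking, so that anything needing the lock in the meantime can proceed.
func (s *session) finish() {
	s.Lock()
	s.exited = true
	s.Unlock()

	// Don't hold the lock while waiting: the wait may last as long as a
	// child process keeps the descriptors open.
	s.wait.Wait()
}

func main() {
	s := &session{}
	s.wait.Add(1)
	go func() {
		defer s.wait.Done()
		// This goroutine needs the session lock. If finish still held
		// the lock while waiting, the two would deadlock.
		s.Lock()
		defer s.Unlock()
	}()
	s.finish()
	fmt.Println("exited:", s.exited)
}
```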
lib/tether/toolbox.go
Outdated
| log.Infof("sending signal %s (%d) to process %d for %s", sig.Signal, num, pid, session.ID) | ||
| } else { | ||
| pid = -session.Cmd.Process.Pid | ||
| log.Infof("sending signal %s (%d) to process group %d for %s", sig.Signal, num, -pid, session.ID) |
Is the -pid intended here? It's a double negative, but maybe that's what you were after explicitly, in order to print the pgid?
| ${rc} ${output}= Run And Return Rc And Output docker %{VCH-PARAMS} kill -s HUP ${id} | ||
| Should Be Equal As Integers ${rc} 0 | ||
| Wait Until Keyword Succeeds 20x 200 milliseconds Assert Container Output ${id} KillSignalHUP | ||
| Wait Until Keyword Succeeds 20x 200 milliseconds Assert Not In Container Output ${id} KillSignalHUP |
There's no point in Wait Until ... here; it was present previously to wait for expected output. With the inversion of the test, this will either immediately succeed or always fail.
That said, I'm not sure this test should have it inverted. I thought we still wanted to deliver the signal to the main process, which should be the shell running the trap command in this case. Was this supposed to be changed in the test Confirm signal is not delivered to entire process group?
The purpose of this test is to make sure that the child did not receive the signal. We could assert that the parent gets the signal, but there is an existing test (that does not use the nested shell) just above this one that does exactly that. I am not sure why we shouldn't wait for the output to drain; that is what makes sure that the child process did not receive the signal and process it by writing the message to the log. This test is meant to catch the error that caused issue #8152.
1219bad to d556822
lib/portlayer/exec/base.go
Outdated
| stopSignal = string(ssh.SIGTERM) | ||
| } | ||
| | ||
| type actionSignal struct { |
I'm not sure the type declaration is needed if using an anonymous declaration as well.
Alternatively use the type name when declaring the array.
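For reference, the two alternatives look roughly like this. Only the `actionSignal` type name comes from the diff; the field names and values are hypothetical.

```go
package main

import "fmt"

// Option 1: keep the named type and use it when declaring the slice.
type actionSignal struct {
	action, signal string
}

var named = []actionSignal{
	{"kill", "TERM"},
	{"groupKill", "KILL"},
}

// Option 2: drop the type declaration and declare the struct inline.
var anonymous = []struct {
	action, signal string
}{
	{"kill", "TERM"},
	{"groupKill", "KILL"},
}

func main() {
	// Both forms hold the same data; only the declaration style differs.
	for i := range named {
		fmt.Println(named[i].action, anonymous[i].action)
	}
}
```

Declaring the type and then also spelling out an anonymous struct for the slice is redundant; either option alone is enough.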
| @@ -96,9 +102,9 @@ Confirm signal delivered to entire process group | |||
| Should Be Equal As Integers ${rc} 1 | |||
| ${rc} ${output}= Run And Return Rc And Output docker %{VCH-PARAMS} kill -s HUP ${id} | |||
We may want to wait for the HUP signal to be confirmed before we call stop:
Wait Until Keyword Succeeds 20x 200 milliseconds Assert In Container Output ${id} KillSignalHUP
THEN we stop it and assert that the child didn't print output.
e68c0ff to dc95575
I don't think I entirely understand this change, but the tests seem to show that it works as expected. Thank you for the in-person explanation! This is to make the way we handle docker stop and docker kill consistent with the way the Docker daemon handles those commands.
This PR implements a set of changes that make VIC behave like the Docker server when dealing with "kill" and "stop" commands. When a "kill" command is sent to a container, only the top process receives the signal. When a "stop" command is sent to a container, the Stop Signal is sent to the top process; after 10 seconds, a SIGKILL signal is sent to all the members of the process group.
This PR contains the following set of changes:
[specific ci=1-14-Docker-Kill]