2 changes: 1 addition & 1 deletion integration/main_test.go
@@ -364,7 +364,7 @@ func Randomize(str string) string {
func KillProcess(name string) error {
	var command []string
	if goruntime.GOOS == "windows" {
-		command = []string{"tskill", strings.TrimSuffix(name, ".exe")}
+		command = []string{"taskkill", "/IM", name, "/F"}
	} else {
		command = []string{"pkill", "-x", fmt.Sprintf("^%s$", name)}
	}
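
The hunk above shows only the command selection inside KillProcess. As a minimal sketch of that selection in isolation, the snippet below takes the OS name as a parameter so both branches can be exercised on any host. `killCommand` is a hypothetical helper introduced for illustration; it is not part of the containerd test suite.

```go
package main

import (
	"fmt"
	"runtime"
)

// killCommand is a hypothetical helper that returns the argv used to kill a
// process by name on the given OS. It mirrors the selection in the diff.
func killCommand(goos, name string) []string {
	if goos == "windows" {
		// taskkill matches on the image name passed via /IM (including the
		// ".exe" suffix); /F forces termination.
		return []string{"taskkill", "/IM", name, "/F"}
	}
	// Same arguments as the non-Windows branch in the diff.
	return []string{"pkill", "-x", fmt.Sprintf("^%s$", name)}
}

func main() {
	fmt.Println(killCommand(runtime.GOOS, "containerd"))
	fmt.Println(killCommand("windows", "containerd.exe"))
}
```

Passing the OS as an argument, rather than reading `runtime.GOOS` inside the helper, is only a convenience for testing the Windows branch from a Linux machine.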
27 changes: 21 additions & 6 deletions integration/restart_test.go
@@ -24,6 +24,7 @@ import (
"time"

"github.com/containerd/containerd"
+"github.com/containerd/containerd/errdefs"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"golang.org/x/net/context"
@@ -139,12 +140,26 @@ func TestContainerdRestart(t *testing.T) {
waitCh, err := task.Wait(ctx)
require.NoError(t, err)

-// NOTE: CRI-plugin setups watcher for each container and
-// cleanups container when the watcher returns exit event.
-// We just need to kill that sandbox and wait for exit
-// event from waitCh. If the sandbox container exits,
-// the state of sandbox must be NOT_READY.
-require.NoError(t, task.Kill(ctx, syscall.SIGKILL, containerd.WithKillAll))
+err = task.Kill(ctx, syscall.SIGKILL, containerd.WithKillAll)
+if goruntime.GOOS != "windows" {
+	// NOTE: CRI-plugin setups watcher for each container and
+	// cleanups container when the watcher returns exit event.
+	// We just need to kill that sandbox and wait for exit
+	// event from waitCh. If the sandbox container exits,
+	// the state of sandbox must be NOT_READY.
+	require.NoError(t, err)
+} else {
+	// NOTE(gabriel-samfira): On Windows, the "notready-sandbox" array
+	// only has a container in the ContainerState_CONTAINER_CREATED
+	// state and a container in the ContainerState_CONTAINER_EXITED state.
+	// Sending a Kill() to a task that has already exited, or to a task that
+	// was never started (which is the case here), will always return an
+	// ErrorNotFound (at least on Windows). Given that in this sandbox, there
+	// will never be a running task, after we recover from a containerd restart
+	// we can expect an ErrorNotFound here every time.
+	// The waitCh channel should already be closed at this point.
+	assert.True(t, errdefs.IsNotFound(err), err)
Member

I don't understand the comment. The NOT_READY state means that we should stop the sandbox container, and we can then observe the NOT_READY state after restarting containerd. Even if there are no running containers, we are still handling the sandbox container here. It is confusing.

Contributor Author

You are right. This comment was confusing. Sorry about that.

Sending a Kill() to a task that is already stopped, or that was never started, will return an ErrorNotFound on Windows. This sandbox only has a container in stopped state and one in created state. So sending a kill here will always return an ErrorNotFound on Windows. By the time we send this Kill(), the wait channel for the task should already be closed and an ErrorNotFound can be safely ignored.

On Linux, there is also a container in started state, added here: https://github.com/containerd/containerd/blob/main/integration/restart_test.go#L83-L91

so on Linux, the Kill() function should always return a nil error.

Sorry for the confusion. I rewrote the comment. Hopefully it's clear now.
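
To make the behaviour described in this comment concrete, here is a minimal, self-contained sketch of treating a not-found result from a kill as benign on one platform and as a failure on another. Everything in it is a stand-in: `errNotFound`, `killFunc`, `killTolerant`, `killRunning` and `killExited` are hypothetical names, and the real test checks the error with `errdefs.IsNotFound` on the result of `task.Kill`, as shown in the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for containerd's not-found error,
// which the real test detects with errdefs.IsNotFound.
var errNotFound = errors.New("not found")

// killFunc is a hypothetical stand-in for a call such as task.Kill.
type killFunc func() error

// killTolerant runs the kill and decides whether its result is acceptable.
// When notFoundOK is true (the Windows case described above), a not-found
// error is read as "the task is already gone" and is not reported.
func killTolerant(kill killFunc, notFoundOK bool) error {
	err := kill()
	if err == nil {
		return nil
	}
	if notFoundOK && errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

// killRunning simulates killing a task that is still running.
func killRunning() error { return nil }

// killExited simulates killing a task that has exited or was never started.
func killExited() error { return fmt.Errorf("kill task: %w", errNotFound) }

func main() {
	fmt.Println(killTolerant(killRunning, false)) // running task, strict mode
	fmt.Println(killTolerant(killExited, true))   // exited task, not-found tolerated
	fmt.Println(killTolerant(killExited, false))  // exited task, strict mode
}
```

Note that the diff itself goes one step further on Windows and asserts that the error is a not-found error, rather than merely tolerating it.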

Member

@fuweid fuweid Dec 21, 2021

On Windows, is the sandbox container running after RunPodSandbox, even if there are no running containers?

L103 runs the pod sandbox container, so the sandbox container with ID sid is running after L103. Since the test case is meant to put the sandbox into the not-ready state, L134 kills it. So if it is in the running state, the not-found error doesn't make sense. That is why I am confused 😂. If I understand the case correctly...

Contributor Author

@gabriel-samfira gabriel-samfira Dec 22, 2021

After containerd restart, we end up with one sandbox in SANDBOX_READY and another one in SANDBOX_NOTREADY. This is checked here:

https://github.com/containerd/containerd/blob/main/integration/restart_test.go#L175-L181

so the test case should be fulfilled. Unless I'm reading it wrong 😀.

Another option is to simply not try and send a kill there (on Windows), as there is no running container, if you prefer.

Member

> After containerd restart, we end up with one sandbox in SANDBOX_READY and another one in SANDBOX_NOTREADY.

Before the restart, how do we move the running sandbox container into the exited state? We should send a signal, right? If so, why would we receive a not-found error?

Member

> Listing sandboxes after the kill and printing the status shows that both are ready.

Did you wait for a while? The state is updated asynchronously.

func handleSandboxExit(ctx context.Context, e *eventtypes.TaskExit, sb sandboxstore.Sandbox) error {

Contributor Author

I added logging to the test that showed the status was unchanged by the time containerd restarted. Should we change the test and poll until the sandbox transitions to NOTREADY before restarting containerd?
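
For reference, the polling idea proposed here could look roughly like the sketch below. It is only an illustration of the pattern: `pollUntil`, `fakeSandbox` and `waitForFake` are hypothetical names, and the fake status check stands in for whatever call the test would use to read the sandbox state.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// pollUntil calls check every interval until it reports true, returns an
// error, or the timeout elapses.
func pollUntil(check func() (bool, error), interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		ok, err := check()
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for condition")
		}
		time.Sleep(interval)
	}
}

// fakeSandbox is a hypothetical stand-in for a sandbox whose state is updated
// asynchronously: it reports NOTREADY only after a number of status checks.
type fakeSandbox struct{ checksUntilNotReady int }

// notReady reports whether the fake sandbox has reached NOTREADY.
func (s *fakeSandbox) notReady() (bool, error) {
	if s.checksUntilNotReady > 0 {
		s.checksUntilNotReady--
		return false, nil
	}
	return true, nil
}

// waitForFake polls a fake sandbox that flips to NOTREADY after the given
// number of checks, using fixed illustrative durations.
func waitForFake(checks int) error {
	sb := &fakeSandbox{checksUntilNotReady: checks}
	return pollUntil(sb.notReady, time.Millisecond, 200*time.Millisecond)
}

func main() {
	fmt.Println("sandbox reached NOTREADY, err =", waitForFake(3))
}
```

The interval and timeout are arbitrary values chosen for the example, not values taken from the test suite.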

Member

No need to do that. The return from task.Wait guarantees that. :)

Contributor Author

Member

Yeah. It is about timing. 😂

+}

select {
case <-waitCh: