
Support proper removal of the container when the shim gets killed #274

@sboeuf


Here is an interesting case I have been debugging. I was trying to understand why kubeadm reset was not working in the Kata case. There, the only pod started with Kata is the kube-dns pod. For some reason, when this pod is stopped and removed, its containers first receive signals: two of them receive SIGTERM, which seems to be the proper way to stop them, but the third container receives a SIGCONT. Obviously nothing happens in that case, but CRI-O apparently considers this should be the end of the container, and after a few seconds it kills the container process (the shim, in the Kata case). Because it uses SIGKILL, the signal never gets transferred to the agent, as the shim itself is killed right away.
After this happens, CRI-O calls into kata-runtime state; we detect the shim is not running anymore and we try to stop the container. The code eventually calls into agent.RemoveContainer(), but this fails and returns an error because, inside the agent, the container is still running!
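
For context on why the chain breaks: the shim can only relay the signals it is able to catch, and SIGKILL (like SIGSTOP) can never be caught by a process, so the kernel terminates the shim before anything gets forwarded. Here is a minimal sketch of that forwarding pattern; forwardSignal() is a hypothetical stand-in, not the actual kata-shim code:

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// forwardSignal is a hypothetical stand-in for the shim relaying a signal
// to the agent inside the VM, which then delivers it to the container.
func forwardSignal(sig syscall.Signal) {
	fmt.Printf("forwarding %v to the agent\n", sig)
}

func main() {
	sigCh := make(chan os.Signal, 8)
	// Subscribe to every catchable signal. SIGKILL and SIGSTOP are never
	// delivered to this handler: the kernel kills (or stops) the shim
	// directly, so the container inside the VM never sees the signal.
	signal.Notify(sigCh)
	for s := range sigCh {
		if sysSig, ok := s.(syscall.Signal); ok {
			forwardSignal(sysSig)
		}
	}
}
```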
Now, here are the different approaches we could consider:

  • Enable a watcher in the shim, as a child process of the shim itself. This discussion has been
    opened before, but the problem is that starting a child almost duplicates the footprint of the
    process, which is a real issue given that our shim is written in Go.
    The solution would be to rewrite the shim in C, making the duplication negligible. But we're
    talking mid term here, since the shim is unlikely to be replaced within a week or two.
  • From the agent code, force a container to be killed if it is still running when
    RemoveContainer() gets called. The issue with this solution is that it handles the case only
    in the Kata agent, whereas it should probably be handled in virtcontainers so that the fix is
    reused across agent implementations.
  • From virtcontainers, in (c *Container) stop() error in container.go, after the shim has been
    waited on, always throw a SIGKILL without checking the returned error. In most cases this call
    will return an error, since the shim being gone usually means the container inside the VM has
    already terminated. But when the shim has been killed without the chance to forward the
    signal, the SIGKILL will succeed and allow the following call to agent.stopContainer() to
    proceed with removing the container inside the agent. This is the easiest solution for now
    (see the first sketch after this list).
  • An evolution of the previous solution would be to call killContainer(SIGKILL) only if the
    container state is still running. This would require adding a new function to the agent
    protocol, something like ContainerStatus(), called from container.go to check whether the shim
    being stopped matches the status of the container inside the VM. If it does, nothing needs to
    be done; if it does not, we should force a call into killContainer(SIGKILL). This avoids
    blindly calling killContainer(SIGKILL) every time we remove a container, and instead does it
    only when needed (see the second sketch below).
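
Here is a rough sketch of the third option, the one I can implement right away. Everything here (the agent struct, waitForShim(), the method names) is a stand-in for the real virtcontainers code in container.go, not the actual API:

```go
package main

import (
	"fmt"
	"syscall"
)

// Minimal stand-ins for the virtcontainers pieces involved; the real types
// live in container.go and the agent implementations.
type agent struct{}

func (a *agent) killContainer(sig syscall.Signal) error {
	fmt.Printf("forwarding %v to the agent\n", sig)
	return nil
}

func (a *agent) stopContainer() error {
	fmt.Println("removing the container inside the agent")
	return nil
}

type Container struct {
	shimPid int
	agent   *agent
}

// waitForShim stands in for the runtime reaping the shim process.
func waitForShim(pid int) error { return nil }

// stop sketches the proposed change: once the shim has been waited on,
// always throw a SIGKILL and deliberately ignore its error.
func (c *Container) stop() error {
	if err := waitForShim(c.shimPid); err != nil {
		return err
	}

	// Most of the time the container has already terminated and this call
	// fails harmlessly. But when the shim was SIGKILLed before it could
	// forward the signal, this kill terminates the workload so that the
	// following stopContainer() can remove it inside the agent.
	_ = c.agent.killContainer(syscall.SIGKILL)

	return c.agent.stopContainer()
}

func main() {
	c := &Container{shimPid: 1234, agent: &agent{}}
	if err := c.stop(); err != nil {
		fmt.Println("stop failed:", err)
	}
}
```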

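And a sketch of the last option, reusing the stand-ins from the previous sketch. ContainerStatus() does not exist in the agent protocol yet, so containerStatus() and its return value below are pure assumptions:

```go
// containerStatus stands in for the proposed ContainerStatus() agent API;
// it does not exist yet, so its name and return value are assumptions.
func (a *agent) containerStatus() (string, error) {
	return "running", nil
}

// stopChecked only forces the SIGKILL when the agent still reports the
// container as running, instead of blindly killing on every removal.
func (c *Container) stopChecked() error {
	if err := waitForShim(c.shimPid); err != nil {
		return err
	}

	status, err := c.agent.containerStatus()
	if err != nil {
		return err
	}

	// The shim is gone; if the agent disagrees and still sees the
	// container running, the signal was never forwarded: force it now.
	if status == "running" {
		_ = c.agent.killContainer(syscall.SIGKILL)
	}

	return c.agent.stopContainer()
}
```
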
@egernst @bergwolf @jodh-intel @amshinde @grahamwhaley please let me know what you think makes more sense here. I can implement the easiest solution in virtcontainers in one line of code, and I don't see any drawbacks to it.
