
Support proper removal of the container when the shim gets killed #274

@sboeuf


Here is an interesting case I have been debugging. I was trying to understand why kubeadm reset was not working in the Kata case. There, the only pod started with Kata is the kube-dns pod. For some reason, when this pod is stopped and removed, its containers first receive signals: two of them receive SIGTERM, which seems to be the proper way to stop them, but the third container receives a SIGCONT. Obviously nothing happens in that case, but CRI-O apparently considers this should be the end of the container, and after a few seconds it kills the container process (the shim, in the Kata case). Because it uses SIGKILL, the signal never gets transferred to the agent, as the shim itself is killed right away.
After this happens, CRI-O calls into kata-runtime state; we detect the shim is not running anymore and we try to stop the container. The code eventually calls into agent.RemoveContainer(), but this fails and returns an error because, inside the agent, the container is still running!
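
For context on why the chain breaks: the shim can only relay the signals it is able to catch, and SIGKILL (like SIGSTOP) can never be caught by a process, so the kernel terminates the shim before anything gets forwarded. Here is a minimal sketch of that forwarding pattern; forwardSignal() is a hypothetical stand-in, not the actual kata-shim code:

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// forwardSignal is a hypothetical stand-in for the shim relaying a signal
// to the agent inside the VM, which then delivers it to the container.
func forwardSignal(sig syscall.Signal) {
	fmt.Printf("forwarding %v to the agent\n", sig)
}

func main() {
	sigCh := make(chan os.Signal, 8)
	// Subscribe to every catchable signal. SIGKILL and SIGSTOP are never
	// delivered to this handler: the kernel kills (or stops) the shim
	// directly, so the container inside the VM never sees the signal.
	signal.Notify(sigCh)
	for s := range sigCh {
		if sysSig, ok := s.(syscall.Signal); ok {
			forwardSignal(sysSig)
		}
	}
}
```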
Now, here are the different approaches we could consider:

  • Enable a watcher in the shim, as a child process of the shim itself. This discussion has been
    opened before, but the problem is that starting a child almost duplicates the footprint of the
    process, which is a real issue given that our shim is written in Go.
    The solution would be to rewrite the shim in C, making the duplication negligible. But we're
    talking mid term here, since the shim is unlikely to be replaced within a week or two.
  • From the agent code, force a container to be killed if it is still running when
    RemoveContainer() gets called. The issue with this solution is that it handles the case only
    in the Kata agent, whereas it should probably be handled in virtcontainers so that the fix is
    reused across agent implementations.
  • From virtcontainers, in (c *Container) stop() error in container.go, after the shim has been
    waited on, always throw a SIGKILL without checking the returned error. In most cases this call
    will return an error, since the shim being gone usually means the container inside the VM has
    already terminated. But when the shim has been killed without the chance to forward the
    signal, the SIGKILL will succeed and allow the following call to agent.stopContainer() to
    proceed with removing the container inside the agent. This is the easiest solution for now
    (see the first sketch after this list).
  • An evolution of the previous solution would be to call killContainer(SIGKILL) only if the
    container state is still running. This would require adding a new function to the agent
    protocol, something like ContainerStatus(), called from container.go to check whether the shim
    being stopped matches the status of the container inside the VM. If it does, nothing needs to
    be done; if it does not, we should force a call into killContainer(SIGKILL). This avoids
    blindly calling killContainer(SIGKILL) every time we remove a container, and instead does it
    only when needed (see the second sketch below).
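
Here is a rough sketch of the third option, the one I can implement right away. Everything here (the agent struct, waitForShim(), the method names) is a stand-in for the real virtcontainers code in container.go, not the actual API:

```go
package main

import (
	"fmt"
	"syscall"
)

// Minimal stand-ins for the virtcontainers pieces involved; the real types
// live in container.go and the agent implementations.
type agent struct{}

func (a *agent) killContainer(sig syscall.Signal) error {
	fmt.Printf("forwarding %v to the agent\n", sig)
	return nil
}

func (a *agent) stopContainer() error {
	fmt.Println("removing the container inside the agent")
	return nil
}

type Container struct {
	shimPid int
	agent   *agent
}

// waitForShim stands in for the runtime reaping the shim process.
func waitForShim(pid int) error { return nil }

// stop sketches the proposed change: once the shim has been waited on,
// always throw a SIGKILL and deliberately ignore its error.
func (c *Container) stop() error {
	if err := waitForShim(c.shimPid); err != nil {
		return err
	}

	// Most of the time the container has already terminated and this call
	// fails harmlessly. But when the shim was SIGKILLed before it could
	// forward the signal, this kill terminates the workload so that the
	// following stopContainer() can remove it inside the agent.
	_ = c.agent.killContainer(syscall.SIGKILL)

	return c.agent.stopContainer()
}

func main() {
	c := &Container{shimPid: 1234, agent: &agent{}}
	if err := c.stop(); err != nil {
		fmt.Println("stop failed:", err)
	}
}
```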

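And a sketch of the last option, reusing the stand-ins from the previous sketch. ContainerStatus() does not exist in the agent protocol yet, so containerStatus() and its return value below are pure assumptions:

```go
// containerStatus stands in for the proposed ContainerStatus() agent API;
// it does not exist yet, so its name and return value are assumptions.
func (a *agent) containerStatus() (string, error) {
	return "running", nil
}

// stopChecked only forces the SIGKILL when the agent still reports the
// container as running, instead of blindly killing on every removal.
func (c *Container) stopChecked() error {
	if err := waitForShim(c.shimPid); err != nil {
		return err
	}

	status, err := c.agent.containerStatus()
	if err != nil {
		return err
	}

	// The shim is gone; if the agent disagrees and still sees the
	// container running, the signal was never forwarded: force it now.
	if status == "running" {
		_ = c.agent.killContainer(syscall.SIGKILL)
	}

	return c.agent.stopContainer()
}
```
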
@egernst @bergwolf @jodh-intel @amshinde @grahamwhaley please let me know what you think makes more sense here. I can implement the easiest solution in virtcontainers in one line of code, and I don't see any drawbacks to it.
