We've seen flakiness in the `containerd` CI caused by the `TestRestartMonitor` test. The errors seem to have two origins, both in `hcsshim`:
- The shim process exits before resources are released, and the lingering layers cause an error indicating that files (usually VHDXs) are still in use when `containerd` tries to clean up after a container is deleted.
- Sending a second `Kill()` to the same container process handle returns an `Access is denied.` error.
The first issue is detailed here: #1249
The second issue happens only intermittently, when the process has not yet exited by the time the test cleanup functions send the second `Kill()`. In my tests, ignoring this error on a second `Kill()` seemed to be fine: the process we are trying to kill does eventually die, handles and resources do get cleaned up, and process watchers return without error. It's just that at the moment `HcsTerminateProcess` is sent, the process is in a state where the OS returns an `Access is denied.` error. If `HcsTerminateProcess` calls `TerminateProcess` in the context of a container, then that call doesn't kill the process immediately; the process is still allowed to finish any pending I/O and release its handles. One solution may therefore be to wait for the process to finish and simply skip sending a second terminate signal.
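To make the idea concrete, here is a rough sketch of "tolerate the access-denied error and wait for exit instead". Everything in it is hypothetical and only for illustration: the `process` interface, `killAndWait`, `errAccessDenied`, `defaultExitTimeout` and the `fakeProcess` stand-in are not hcsshim's actual types or API, and real code would need to match the Windows access-denied error code rather than a sentinel value.

```go
package main

import (
	"errors"
	"time"
)

// errAccessDenied is a hypothetical stand-in for the "Access is denied."
// error returned when a terminate request hits a process that is already
// being torn down. Real code would compare against the Windows error code.
var errAccessDenied = errors.New("Access is denied.")

// defaultExitTimeout is an arbitrary value chosen for this sketch.
const defaultExitTimeout = 2 * time.Second

// process is a hypothetical, minimal view of a container process handle.
type process interface {
	Kill() error
	Done() <-chan struct{} // closed once the process has exited
}

// killAndWait sends a terminate signal and then waits for the process to
// exit. An access-denied error from Kill is assumed to mean that an earlier
// terminate is still taking effect, so it is only reported if the process
// fails to exit within the timeout.
func killAndWait(p process, timeout time.Duration) error {
	killErr := p.Kill()
	if killErr != nil && !errors.Is(killErr, errAccessDenied) {
		return killErr
	}
	select {
	case <-p.Done():
		return nil
	case <-time.After(timeout):
		if killErr != nil {
			return killErr
		}
		return errors.New("timed out waiting for process to exit")
	}
}

// fakeProcess simulates the observed behavior: the first Kill succeeds and
// the process exits shortly afterwards; any later Kill is rejected with
// errAccessDenied.
type fakeProcess struct {
	killed bool
	done   chan struct{}
}

func newFakeProcess() *fakeProcess {
	return &fakeProcess{done: make(chan struct{})}
}

func (f *fakeProcess) Kill() error {
	if f.killed {
		return errAccessDenied
	}
	f.killed = true
	go func() {
		// Simulate pending I/O and handle release before the exit.
		time.Sleep(10 * time.Millisecond)
		close(f.done)
	}()
	return nil
}

func (f *fakeProcess) Done() <-chan struct{} { return f.done }
```

With something like this, a second `killAndWait` on a process that is already terminating returns `nil` once the process has exited, and the access-denied error only surfaces if the process is still alive after the timeout. Whether masking the error this way is acceptable, or whether the second terminate should be skipped entirely, is the open question.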
I am unsure what a proper fix for this would be.