agent: call remove after operation is finished#1192
Conversation
To avoid calling Controller methods concurrently, `Controller.Remove` is now called only *after* the last operation has proceeded. We do this by checking to see if the cancellation is available to detect an outstanding operation needs to be completed. Once the operation returns, we proceed with the shutdown procedure. Signed-off-by: Stephen J Day <stephen.day@docker.com>
|
@jinuxstyle Please give this patch a try with your test script. |
|
OK, I will. |
Current coverage is 55.23%@@ master #1192 diff @@
==========================================
Files 77 77
Lines 12193 12131 -62
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 6694 6700 +6
+ Misses 4579 4521 -58
+ Partials 920 910 -10
|
| @@ -203,6 +210,12 @@ func (tm *taskManager) run(ctx context.Context) { | |||
| if cancel != nil { | |||
| // cancel outstanding operation. | |||
| cancel() | |||
There was a problem hiding this comment.
I did the test with this patch applied and the problem of orphan containers was still there. The race illustrated in #1159 did not get fixed. The point is that the calling to cancel() is disruptive to the ongoing create operation. It causes the create operation return without waiting for the response from docker daemon because of context cancelled. Then the agent task proceeds to remove the container, while the docker daemon is creating it. When the remove request comes to the daemon before the container was created, the remove operation will fail and the container will become orphan after it's created.
In simple words, the problem could happen if the cancel() is called when the agent task is still creating the container.
There was a problem hiding this comment.
I see. So, it sounds like the bug is in the context handling. #1159 works because it completely disables the cancellation.
This is a race condition in the docker daemon. I'm not sure how to deal with this without doing a retry loop. Even then, if we don't have a guarantee about the state of the container after the Create method returns, we'd have to keep that container ID around until we can be absolutely sure.
We'll really need to fix this in the docker daemon, such that one can remove a container that is in the process of being created.
I've filed moby/moby#24858.
This PR should make it so Remove and container.create aren't issued concurrently, so it is still needed.
There was a problem hiding this comment.
@jinuxstyle We are going to merge this PR. This should address this issue when encountering in the docker daemon integration. There is still a race condition that needs to be tackled as part of moby/moby#24858 but that is going to be a more complex PR.
There was a problem hiding this comment.
This PR looks good and it will likely make the task operations more robust. So +1.
I am glad to see you also raised an issue to docker daemon. I agree that it would be better if docker could provide a way to cancel ongoing container operations like create.
|
LGTM |
1 similar comment
|
LGTM |
To avoid calling Controller methods concurrently,
Controller.Removeis now called only after the last operation has proceeded. We do this
by checking to see if the cancellation is available to detect an
outstanding operation needs to be completed. Once the operation returns,
we proceed with the shutdown procedure.
Signed-off-by: Stephen J Day stephen.day@docker.com
cc @jinuxstyle