From 4abca880a844e7338c5ba144732173cd7fbc90b9 Mon Sep 17 00:00:00 2001
From: Jin Xu
Date: Mon, 11 Jul 2016 11:21:00 +0800
Subject: [PATCH] Fix race condition between task creation and removal

This commit fixes a race condition between task creation and removal
that could cause orphan containers. It happens when an agent task is
handling a creation event and is creating a container, and a removal
event comes in and cancels the task's context. Both the creation and
the removal fail, but the container is still created by the docker
daemon. The removal fails because the container is still being created
when the DELETE request arrives at the docker daemon.

    goroutine (creation)                task (removal)
    ---------                           ----
    ...                                 ...
    ctlr.Prepare(ctx)
    r.adapter.createNetworks(ctx)
    ...                                 case <-shutdown
    r.adapter.create(ctx)               cancel()
    //failed due to context cancelled   tm.ctlr.Remove(ctx)
    //container still created by daemon //failed due to "No such
                                        //container" error because
                                        //the container not created
                                        //yet
    ...
    //the container created by docker
    //daemon but becomes orphan

This patch fixes it by delaying the cancellation until the ongoing
operation has finished if the task is in the preparing state, the
state in which a container is created for the task. This way, the
removal operation never races with the creation operation.

Signed-off-by: Jin Xu
---
 agent/task.go | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/agent/task.go b/agent/task.go
index 797f93c769..e7625adeac 100644
--- a/agent/task.go
+++ b/agent/task.go
@@ -201,6 +201,17 @@ func (tm *taskManager) run(ctx context.Context) {
 			}
 		case <-shutdown:
 			if cancel != nil {
+				// Wait for the operation to finish if the task
+				// is in the preparing state. This avoids the race
+				// between Prepare and Remove operations,
+				// which could cause orphan containers.
+				if tm.task.Status.State == api.TaskStatePreparing {
+					select {
+					case <-errs:
+					case <-statusq:
+					}
+				}
+
 				// cancel outstanding operation.
 				cancel()
 			}