Fix race condition between task creation and removal #1152

Closed
jinuxstyle wants to merge 1 commit into moby:master from jinuxstyle:fix-race-create-remove

@jinuxstyle
Contributor

This commit fixes a race condition between task creation and removal
that could cause orphan containers. It happens when an agent task is
handling a creation event and creating a container, and a removal event
comes in and cancels the task's context. Both the creation and the
removal fail, but the container is still created by the docker
daemon. The removal fails because, when the DELETE request arrives at
the docker daemon, the container is still being created.

goroutine (creation)                   task (removal)
---------                              ----

...                                    ...
ctlr.Prepare(ctx)
  r.adapter.createNetworks(ctx)
  ...                                  case <-shutdown
  r.adapter.create(ctx)                  cancel()
  //failed due to context cancelled      tm.ctlr.Remove(ctx)
  //container still created by daemon    //failed due to "No such
                                         //container" error because
                                         //the container is not created
                                         //yet
  ...
  //the container is created by the
  //docker daemon but becomes an orphan

This patch fixes it by moving the cancel operation after the
remove operation. In addition, a lock is added for exclusive access
to the container resources between the creation and removal operations.
Together, these ensure that the creation of a container has either
finished or not started when the remove operation enters the critical
section. The pull operation is moved out of the critical section because
it can be too time-consuming.
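
A minimal sketch of this approach, not the actual patch: the `controller` type, the `shutdown` helper, and the in-memory `containers` map are hypothetical stand-ins for the agent's controller and the docker daemon. The `removed` flag is an added assumption so that a creation starting after the removal is skipped; the description above does not say how that case is handled.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

var errRemoved = errors.New("task already removed")

// controller is a hypothetical, much simplified task controller.
type controller struct {
	mu         sync.Mutex      // serializes creation and removal
	removed    bool            // assumption: not described in the PR text
	containers map[string]bool // stands in for the docker daemon's state
	name       string
}

// Prepare does the (potentially slow) pull outside the lock and the
// creation inside it.
func (c *controller) Prepare(ctx context.Context) error {
	if err := ctx.Err(); err != nil { // placeholder for the image pull
		return err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.removed {
		return errRemoved // creation never starts
	}
	c.containers[c.name] = true // creation finishes under the lock
	return nil
}

// Remove enters the critical section only when no creation is in progress.
func (c *controller) Remove(ctx context.Context) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.removed = true
	delete(c.containers, c.name)
	return nil
}

// shutdown removes first and cancels the task context afterwards.
func shutdown(ctx context.Context, cancel context.CancelFunc, c *controller) error {
	err := c.Remove(ctx)
	cancel()
	return err
}

// orphanAfterRace races one Prepare against one shutdown and reports
// whether a container is left behind.
func orphanAfterRace() bool {
	c := &controller{containers: map[string]bool{}, name: "task1"}
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); _ = c.Prepare(ctx) }()
	go func() { defer wg.Done(); _ = shutdown(ctx, cancel, c) }()
	wg.Wait()
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.containers[c.name]
}

func main() {
	orphans := 0
	for i := 0; i < 1000; i++ {
		if orphanAfterRace() {
			orphans++
		}
	}
	fmt.Println("orphans:", orphans)
}
```

Whichever goroutine takes the lock first, the container is either created and then removed, or never created, so no orphan remains in this model.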

Signed-off-by: Jin Xu <jinuxstyle@hotmail.com>

@jinuxstyle
Contributor Author

I would like to close this PR in favor of PR #1154, which fixes the issue with a simpler solution. Please review #1154 first. Thanks.
