[PATCH v2] Fix race condition between task creation and removal #1154

jinuxstyle wants to merge 1 commit into
Conversation
This commit fixes a race condition between task creation and removal
which could cause orphan containers. It happens when an agent task is
handling a creation event and is creating a container: a removal event
comes in and cancels the task's context, causing both the creation and
the removal to fail, but the container is still created by the docker
daemon. The removal fails because when the DELETE request arrives at
the docker daemon, the container is still being created.
goroutine (creation)                     task (removal)
--------------------                     --------------
...                                      ...
ctlr.Prepare(ctx)
  r.adapter.createNetworks(ctx)
  ...                                    case <-shutdown:
  r.adapter.create(ctx)                      cancel()
  // failed: context cancelled              tm.ctlr.Remove(ctx)
  // container still created by daemon      // failed with "No such container"
                                            // because the container is not
                                            // created yet
...
// the container is created by the docker
// daemon but becomes an orphan
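To make the interleaving above concrete, here is a minimal, self-contained Go sketch (toy names, not swarmkit's actual code): the "daemon" keeps creating even after the client's context is cancelled, so a client-side cancellation error does not mean the container was never created.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// createContainer models r.adapter.create: the "daemon" goroutine keeps
// working regardless of ctx, so cancellation only aborts the client side.
func createContainer(ctx context.Context, daemonDone chan struct{}) error {
	go func() {
		time.Sleep(100 * time.Millisecond) // daemon still creating...
		fmt.Println("daemon: container created (now an orphan)")
		close(daemonDone)
	}()
	select {
	case <-ctx.Done():
		return ctx.Err() // creation "fails" with context canceled
	case <-daemonDone:
		return nil
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	daemonDone := make(chan struct{})
	go func() {
		time.Sleep(10 * time.Millisecond) // removal event arrives mid-create
		cancel()
	}()
	if err := createContainer(ctx, daemonDone); err != nil {
		// At this point a Remove would see "No such container",
		// because the daemon has not finished creating it yet.
		fmt.Println("creation failed:", err)
	}
	<-daemonDone // the daemon finishes anyway: the orphan exists
}
```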
This patch fixes it by delaying the cancellation until the ongoing
operation is finished if the task is in the preparing state, the
state in which a container is created for the task. This way, the
removal operation can never race with the creation operation.
Signed-off-by: Jin Xu <jinuxstyle@hotmail.com>
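A minimal sketch of the idea in this patch, using stand-in types (`TaskState`, `StatePreparing`, and `opDone` are illustrative, not swarmkit's actual identifiers):

```go
package taskrace

import "context"

// Stand-in for api.TaskState / api.TaskStatePreparing.
type TaskState int

const StatePreparing TaskState = 0

// delayedCancel waits for any in-flight Prepare to finish before
// cancelling the task's context, so a following Remove cannot race
// with container creation.
func delayedCancel(cancel context.CancelFunc, state TaskState, opDone <-chan struct{}) {
	if state == StatePreparing {
		<-opDone // block until the ongoing create operation returns
	}
	cancel()
}
```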
Current coverage is 55.20%

@@             master      #1154   diff @@
==========================================
  Files            77         77
  Lines         12079      12083     +4
  Methods           0          0
  Messages          0          0
  Branches          0          0
==========================================
- Hits           6678       6670     -8
- Misses         4491       4501    +10
- Partials        910        912     +2
LGTM

/cc @aluzzardi @stevvooe
// is in preparing state. This avoids the race
// between Prepare and Remove operations,
// which could cause orphan containers.
if tm.task.Status.State == api.TaskStatePreparing {
This method should never be looking at task state to make decisions about what to do. State is only inspected or mutated in exec.Do.
I agree it would be better to do it in exec.Do so that it can go through the state machine. I thought about it but couldn't figure out how. It looks like shutdown is handled as a special event which can abort other ongoing activities for the same task.
Unfortunately, this fix will reintroduce bugs that this code prevents. Please do not merge this.

The actual bug is that […]. In general, no two methods on […]. We may have to make […].
Thanks for your review. @stevvooe
Based on my debugging, the Remove did fail, and it's due to the latter reason, as you pointed out. The docker daemon reported a "No such container" error, but it had already received the creation request and was creating the container.
I agree. But currently, shutdown is handled concurrently, in a separate branch from the other operations.
I couldn't understand why we can't wait for Prepare to exit, except that preparing may take some time if it's pulling an image. Could you explain your concern? By the way, #1152 proposed a solution which fixes the race at the exec.Controller level. Could you please take a look and see whether that's a potentially better solution?
As I observed, the Remove is called only once. Do you mean making the Remove and Create operations safe to run concurrently?
I read the comment you mentioned and it looks like the reaper is a long-term solution. But I don't know how to implement it. I would be interested in the reaper feature.

BTW, I am curious: which bug do you mean that could be reintroduced by this change?
To debug the issue, I added some prints in the Prepare and Remove functions.

swarmkit log: […]

daemon log: […]
We aren't implementing this now because it will hide these kinds of bugs.
This fix is too narrow and blocks the manager on a potentially unbounded operation. What if the container is in any other method? The race condition still exists. This needs to be handled by the […]
#1152 suffers from being too narrow, as well. And it adds state, which it doesn't need at all. Careful removal of the "does not exist" error and granular locking (and maybe a closed channel) is all that is needed to resolve this. Could you file a bug with a description of the problem? We need to be careful with a solution here.
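For reference, one possible shape of what is being suggested here (granular locking plus tolerating the "does not exist" error), sketched with hypothetical names rather than the actual swarmkit controller:

```go
package ctrl

import (
	"context"
	"strings"
	"sync"
)

type controller struct {
	mu           sync.Mutex                  // serializes Prepare/Remove/...
	removeViaAPI func(context.Context) error // stand-in for the docker call
}

// Remove holds the lock so it can never interleave with Prepare, and
// treats "No such container" as success instead of surfacing a failure.
func (c *controller) Remove(ctx context.Context) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if err := c.removeViaAPI(ctx); err != nil &&
		!strings.Contains(err.Error(), "No such container") {
		return err
	}
	return nil
}
```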
Based on the current design, the removal operation is disruptive and will race not only with Prepare but also with Start and potentially other methods. I think the design itself is not bad, because it can remove the task without waiting too long by cancelling ongoing activities. So, following the current design, we need to make sure it doesn't produce harmful results. As far as I've noticed, only the race between Remove and Prepare leads to an annoying outcome: orphan containers. Of course, we would need a more general approach if we had a larger range of problems.
No problem.
Filed issue #1159.
Superseded by #1192. Closing it.