Skip to content

agent: call remove after operation is finished#1192

Merged
aaronlehmann merged 1 commit into
moby:masterfrom
stevvooe:protect-remove
Jul 21, 2016
Merged

agent: call remove after operation is finished#1192
aaronlehmann merged 1 commit into
moby:masterfrom
stevvooe:protect-remove

Conversation

@stevvooe
Copy link
Copy Markdown
Contributor

To avoid calling Controller methods concurrently, Controller.Remove
is now called only after the last operation has proceeded. We do this
by checking to see if the cancellation is available to detect an
outstanding operation needs to be completed. Once the operation returns,
we proceed with the shutdown procedure.

Signed-off-by: Stephen J Day stephen.day@docker.com

cc @jinuxstyle

To avoid calling Controller methods concurrently, `Controller.Remove`
is now called only *after* the last operation has proceeded. We do this
by checking to see if the cancellation is available to detect an
outstanding operation needs to be completed. Once the operation returns,
we proceed with the shutdown procedure.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
@stevvooe
Copy link
Copy Markdown
Contributor Author

@jinuxstyle Please give this patch a try with your test script.

@jinuxstyle
Copy link
Copy Markdown
Contributor

OK, I will.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Jul 20, 2016

Current coverage is 55.23%

Merging #1192 into master will increase coverage by 0.33%

@@             master      #1192   diff @@
==========================================
  Files            77         77          
  Lines         12193      12131    -62   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           6694       6700     +6   
+ Misses         4579       4521    -58   
+ Partials        920        910    -10   

Sunburst

Powered by Codecov. Last updated by eacedad...559595b

Comment thread agent/task.go
@@ -203,6 +210,12 @@ func (tm *taskManager) run(ctx context.Context) {
if cancel != nil {
// cancel outstanding operation.
cancel()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the test with this patch applied and the problem of orphan containers was still there. The race illustrated in #1159 did not get fixed. The point is that the calling to cancel() is disruptive to the ongoing create operation. It causes the create operation return without waiting for the response from docker daemon because of context cancelled. Then the agent task proceeds to remove the container, while the docker daemon is creating it. When the remove request comes to the daemon before the container was created, the remove operation will fail and the container will become orphan after it's created.

In simple words, the problem could happen if the cancel() is called when the agent task is still creating the container.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So, it sounds like the bug is in the context handling. #1159 works because it completely disables the cancellation.

This is a race condition in the docker daemon. I'm not sure how to deal with this without doing a retry loop. Even then, if we don't have a guarantee about the state of the container after the Create method returns, we'd have to keep that container ID around until we can be absolutely sure.

We'll really need to fix this in the docker daemon, such that one can remove a container that is in the process of being created.

I've filed moby/moby#24858.

This PR should make it so Remove and container.create aren't issued concurrently, so it is still needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jinuxstyle We are going to merge this PR. This should address this issue when encountering in the docker daemon integration. There is still a race condition that needs to be tackled as part of moby/moby#24858 but that is going to be a more complex PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good and it will likely make the task operations more robust. So +1.

I am glad to see you also raised an issue to docker daemon. I agree that it would be better if docker could provide a way to cancel ongoing container operations like create.

@tonistiigi
Copy link
Copy Markdown
Member

LGTM

1 similar comment
@aaronlehmann
Copy link
Copy Markdown
Collaborator

LGTM

@aaronlehmann aaronlehmann merged commit 38857c0 into moby:master Jul 21, 2016
@stevvooe stevvooe deleted the protect-remove branch July 21, 2016 00:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants