agent: call remove after operation is finished #1192
Merged
I tested with this patch applied, and the orphan-container problem was still there: the race illustrated in #1159 is not fixed. The point is that calling cancel() disrupts the ongoing create operation. Because the context is cancelled, the create operation returns without waiting for the response from the docker daemon. The agent task then proceeds to remove the container while the docker daemon is still creating it. If the remove request reaches the daemon before the container has been created, the remove fails, and the container becomes an orphan once it is created.
In short, the problem can occur whenever cancel() is called while the agent task is still creating the container.
I see. So, it sounds like the bug is in the context handling. #1159 works because it completely disables the cancellation.
This is a race condition in the docker daemon. I'm not sure how to deal with this without a retry loop. Even then, if we don't have a guarantee about the state of the container after the `Create` method returns, we'd have to keep that container ID around until we can be absolutely sure. We'll really need to fix this in the docker daemon, so that one can remove a container that is in the process of being created.
I've filed moby/moby#24858.
This PR should make it so `Remove` and `container.create` aren't issued concurrently, so it is still needed.
@jinuxstyle We are going to merge this PR. It should address this issue as encountered in the docker daemon integration. There is still a race condition that needs to be tackled as part of moby/moby#24858, but that is going to be a more complex PR.
This PR looks good and it will likely make the task operations more robust. So +1.
I am glad to see you also raised an issue to docker daemon. I agree that it would be better if docker could provide a way to cancel ongoing container operations like create.