[WIP][17.06 backport] Fix leaking task resources when nodes are deleted#2840
Closed
thaJeztah wants to merge 2 commits into
Member
Author
|
ping @dperny PTAL |
Member
Author
|
Why is there no CI running on the 17.06 branch? |
Member
Author
|
Making it WIP, because I suspect this would have the same problem as #2841 (comment) |
Collaborator
|
17.06 probably never got updated to the newest CircleCI version. |
61e2c6e to
2f198de
Compare
Member
Author
|
Rebased; this will likely fail now (per my above comment), but at least will show that CI is doing its job! 😂 |
When a node is deleted, its tasks are asked to restart, which involves putting them into a desired state of Shutdown. However, the Allocator will not deallocate a task whose actual state is not a terminal state. Once a node is deleted, the only opportunity for its tasks to receive updates and be moved to a terminal state is when the function moving those tasks to TaskStateOrphaned is called, 24 hours after the node enters the Down state. However, if a leadership change occurs, then that function will never be called, and the tasks will never be moved to a terminal state, leaking resources.

With this change, upon node deletion, all of its tasks will be moved to TaskStateOrphaned, allowing those tasks' resources to be cleaned up.

Additionally, as part of this backport, avoid using the gogo types.TimestampNow function, which does not exist in the vendored version.

Signed-off-by: Drew Erny <drew.erny@docker.com>
(cherry picked from commit 8467e6a)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2f198de to
c5e7960
Compare
When a node is removed, its tasks are set to state ORPHANED. This does not need to be done for tasks that are already in a terminal state, and if all tasks in all states are updated, the size of the transaction may grow too large to process, and node removal becomes impossible.

This change sets only non-terminal tasks to state ORPHANED; terminal tasks are left alone.

Cherry pick does not apply cleanly

Signed-off-by: Drew Erny <drew.erny@docker.com>
(cherry picked from commit d5df265)
Signed-off-by: Drew Erny <drew.erny@docker.com>
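The filtering described in this commit message can be sketched roughly as follows. This is a simplified, self-contained model and not the SwarmKit code: the `TaskState` constants, the `Task` struct, `isTerminal`, and `orphanNodeTasks` are all hypothetical names chosen for illustration, and the real change operates inside a store transaction rather than on a slice.

```go
package main

import "fmt"

// TaskState is a simplified, hypothetical stand-in for the task state enum.
// States after StateRunning are treated as terminal in this model.
type TaskState int

const (
	StateRunning TaskState = iota
	StateCompleted
	StateShutdown
	StateFailed
	StateOrphaned
)

// Task is a hypothetical, pared-down task record.
type Task struct {
	ID     string
	NodeID string
	State  TaskState
}

// isTerminal reports whether a task has already reached a terminal state.
func isTerminal(s TaskState) bool { return s > StateRunning }

// orphanNodeTasks marks every non-terminal task on nodeID as orphaned and
// returns how many tasks were updated. Terminal tasks are skipped so that
// the number of writes (the "transaction size") stays small.
func orphanNodeTasks(tasks []*Task, nodeID string) int {
	updated := 0
	for _, t := range tasks {
		if t.NodeID != nodeID || isTerminal(t.State) {
			continue
		}
		t.State = StateOrphaned
		updated++
	}
	return updated
}

func main() {
	tasks := []*Task{
		{ID: "t1", NodeID: "node-a", State: StateRunning},
		{ID: "t2", NodeID: "node-a", State: StateFailed},
		{ID: "t3", NodeID: "node-b", State: StateRunning},
	}
	n := orphanNodeTasks(tasks, "node-a")
	fmt.Println("tasks updated:", n)
}
```

In this model only `t1` is written: `t2` is already terminal and `t3` belongs to a different node, so neither adds to the update.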
Member
Author
|
Two linting failures remaining; |
Member
Author
|
closing, as 17.06 is EOL |
backport of #2806 for the bump_v17.06 branch
When a node is deleted, its tasks are asked to restart, which involves
putting them into a desired state of Shutdown. However, the Allocator
will not deallocate a task whose actual state is not a terminal
state. Once a node is deleted, the only opportunity for its tasks to
receive updates and be moved to a terminal state is when the function
moving those tasks to TaskStateOrphaned is called, 24 hours after the
node enters the Down state. However, if a leadership change occurs, then
that function will never be called, and the tasks will never be moved to
a terminal state, leaking resources.
With this change, upon node deletion, all of its tasks will be moved to
TaskStateOrphaned, allowing those tasks' resources to be cleaned up.