Skip to content

Task reaper should delete tasks with removed slots that were not yet assigned#2557

Merged
cyli merged 1 commit into
moby:masterfrom
nishanttotla:delete-tasks-that-havent-reached-agent
Apr 9, 2018
Merged

Task reaper should delete tasks with removed slots that were not yet assigned#2557
cyli merged 1 commit into
moby:masterfrom
nishanttotla:delete-tasks-that-havent-reached-agent

Conversation

@nishanttotla
Copy link
Copy Markdown
Contributor

@nishanttotla nishanttotla commented Mar 14, 2018

This PR addresses #2555.

cc @docker/core-swarmkit-maintainers

Signed-off-by: Nishant Totla nishanttotla@gmail.com

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 14, 2018

Codecov Report

Merging #2557 into master will increase coverage by 0.45%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2557      +/-   ##
==========================================
+ Coverage   61.74%   62.19%   +0.45%     
==========================================
  Files          51      134      +83     
  Lines        7112    21761   +14649     
==========================================
+ Hits         4391    13535    +9144     
- Misses       2264     6783    +4519     
- Partials      457     1443     +986

@stevvooe
Copy link
Copy Markdown
Contributor

LGTM

Copy link
Copy Markdown
Contributor

@anshulpundir anshulpundir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add unit-test for this ?

Also, I think this will introduce races where the task might get assigned to an agent as its being removed by the reaper.

// haven't been assigned yet can be cleaned up right away
for _, t := range removeTasks {
if t.Status.State >= api.TaskStateCompleted {
if t.Status.State >= api.TaskStateCompleted || t.Status.State < api.TaskStateAssigned {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd suggest making it:

t.Status.State < api.TaskStateAssigned || t.Status.State >= api.TaskStateCompleted

slightly better for reading

// add tasks that are yet unassigned or have progressed beyond COMPLETE, with
// desired state REMOVE. These tasks are associated with slots that were removed
// as part of a service scale down or service removal.
if t.DesiredState == api.TaskStateRemove && (t.Status.State >= api.TaskStateCompleted || t.Status.State < api.TaskStateAssigned) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above.

@nishanttotla
Copy link
Copy Markdown
Contributor Author

nishanttotla commented Mar 14, 2018

@anshulpundir good point about races. Unless the agent doesn't try to run tasks that have desired state REMOVE, the race will exist.

As of now, I don't think the agent does this check cc @stevvooe

// add tasks that are yet unassigned or have progressed beyond COMPLETE, with
// desired state REMOVE. These tasks are associated with slots that were removed
// as part of a service scale down or service removal.
if t.DesiredState == api.TaskStateRemove && (t.Status.State >= api.TaskStateCompleted || t.Status.State < api.TaskStateAssigned) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we expect to get task update events for tasks which are marked REMOVE before they are assigned to the agent ? If a task is marked REMOVE, does the orchestrator even assign it to an agent ? @nishanttotla

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @anshulpundir. It would be fair to expect the orchestrator to not assign those tasks at all. Let me check this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anshulpundir so it seems like the dispatcher never checks desired state, only current state to decide to send down tasks: https://github.com/docker/swarmkit/blob/master/manager/dispatcher/dispatcher.go#L819

Does it make sense to check desired state here as well? Why should the dispatcher send a task to an agent if its desired state is past SHUTDOWN?

cc @stevvooe @cyli

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From discussion IRL with @nishanttotla: It looks like the controller on the agent is responsible for looking at the desired state and acting on it (either by shutting down or not running at all). The dispatcher sends down all the tasks that are assigned to the agent, and any changes to those tasks. So if a task changes from DesiredState = RUNNING to DesiredState = SHUTDOWN it needs to send that update to the agent so that on the agent side the task gets shut down. If the dispatcher skips sending the update, the task will just keep running on the agent.

@nishanttotla nishanttotla force-pushed the delete-tasks-that-havent-reached-agent branch 3 times, most recently from 09f9f94 to c61b8ed Compare April 4, 2018 21:10
@nishanttotla nishanttotla force-pushed the delete-tasks-that-havent-reached-agent branch from c61b8ed to 1df0bf7 Compare April 4, 2018 21:48
@nishanttotla
Copy link
Copy Markdown
Contributor Author

Rebased on #2574 and added a test case. cc @cyli @anshulpundir @dperny

Slot: 3,
DesiredState: api.TaskStateRemove,
Status: api.TaskStatus{
State: api.TaskStatePending,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add a test for TaskStateNew

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Regarding the comments below, I'll also rebase and add some more tests for the case of a running task reaper.

@anshulpundir
Copy link
Copy Markdown
Contributor

anshulpundir commented Apr 4, 2018

LGTM after updating the tests for NEW state.

@anshulpundir anshulpundir requested a review from dperny April 4, 2018 23:38
Copy link
Copy Markdown
Contributor

@cyli cyli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but can we add a test for task reaper reaping REMOVEd tasks that are unassigned while the task reaper is running? (e.g. not on task reaper init).

@anshulpundir
Copy link
Copy Markdown
Contributor

anshulpundir commented Apr 5, 2018 via email

@cyli
Copy link
Copy Markdown
Contributor

cyli commented Apr 5, 2018

@anshulpundir I'm wondering if it'd be worth creating an exhaustive list of tasks in every combination of desired state + current state, and making sure the correct ones are reaped on start, and doing the same to create/update.

@nishanttotla
Copy link
Copy Markdown
Contributor Author

@cyli exhaustive list can't hurt. It'll help systematize task reaper tests, and possibly catch bugs faster. I can take on that task in a separate PR.

…assigned

Signed-off-by: Nishant Totla <nishanttotla@gmail.com>
@nishanttotla nishanttotla force-pushed the delete-tasks-that-havent-reached-agent branch from 1df0bf7 to 7f63825 Compare April 9, 2018 20:34
@cyli cyli merged commit ab856c7 into moby:master Apr 9, 2018
@nishanttotla nishanttotla deleted the delete-tasks-that-havent-reached-agent branch April 9, 2018 22:21
thaJeztah added a commit to thaJeztah/docker that referenced this pull request Apr 17, 2018
Relevant changes:

- moby/swarmkit#2551 RoleManager will remove deleted nodes from the cluster membership
- moby/swarmkit#2574 Scheduler/TaskReaper: handle unassigned tasks marked for shutdown
- moby/swarmkit#2561 Avoid predefined error log
- moby/swarmkit#2557 Task reaper should delete tasks with removed slots that were not yet assigned
- moby/swarmkit#2587 [fips] Agent reports FIPS status
- moby/swarmkit#2603 Fix manager/state/store.timedMutex

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this pull request Apr 18, 2018
Relevant changes:

- moby/swarmkit#2551 RoleManager will remove deleted nodes from the cluster membership
- moby/swarmkit#2574 Scheduler/TaskReaper: handle unassigned tasks marked for shutdown
- moby/swarmkit#2561 Avoid predefined error log
- moby/swarmkit#2557 Task reaper should delete tasks with removed slots that were not yet assigned
- moby/swarmkit#2587 [fips] Agent reports FIPS status
- moby/swarmkit#2603 Fix manager/state/store.timedMutex

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 333b2f28fef4ba857905e7263e7b9bbbf7c522fc
Component: engine
thaJeztah added a commit to thaJeztah/docker-ce that referenced this pull request Apr 19, 2018
Relevant changes:

- moby/swarmkit#2551 RoleManager will remove deleted nodes from the cluster membership
- moby/swarmkit#2574 Scheduler/TaskReaper: handle unassigned tasks marked for shutdown
- moby/swarmkit#2561 Avoid predefined error log
- moby/swarmkit#2557 Task reaper should delete tasks with removed slots that were not yet assigned
- moby/swarmkit#2587 [fips] Agent reports FIPS status
- moby/swarmkit#2603 Fix manager/state/store.timedMutex

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 333b2f28fef4ba857905e7263e7b9bbbf7c522fc)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants