Fix task reaper not cleaning up COMPLETE tasks #2473
Conversation
This change will need to be backported to wherever we've already backported the REMOVE state.
Codecov Report

@@            Coverage Diff             @@
##           master    #2473      +/-   ##
==========================================
- Coverage   67.28%    61.6%    -5.68%
==========================================
  Files          80      129       +49
  Lines       11498    21243     +9745
==========================================
+ Hits         7736    13086     +5350
- Misses       2970     6757     +3787
- Partials      792     1400      +608
Tasks in the complete state were not cleaned up because the task reaper was only looking at task.Status.State >= api.TaskStateShutdown, but the COMPLETE state (api.TaskStateCompleted) is the actual "first" terminal state. Added a test for the initialization stage of the task reaper. This puts the logic of which tasks to clean up through its paces, but doesn't test the run-time logic of the task reaper (its ability to respond to updates to tasks). Signed-off-by: Drew Erny <drew.erny@docker.com>
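For context, here is a minimal sketch of why the old comparison skipped completed tasks. The ordering mirrors swarmkit's api package, but the constant values are illustrative stand-ins, not the generated ones:

package main

import "fmt"

// TaskState stands in for swarmkit's api.TaskState; the ordering below
// mirrors the real one, but the values are illustrative.
type TaskState int

const (
	TaskStateRunning   TaskState = iota // task is executing
	TaskStateCompleted                  // first terminal state: exited successfully
	TaskStateShutdown                   // terminal: told to shut down
	TaskStateFailed                     // terminal: exited with an error
)

func main() {
	done := TaskStateCompleted

	// Old check: COMPLETED sorts before SHUTDOWN, so completed tasks were skipped.
	fmt.Println("old check reaps it:", done >= TaskStateShutdown) // false

	// Fixed check: start at the first terminal state.
	fmt.Println("new check reaps it:", done >= TaskStateCompleted) // true
}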
force-pushed from 6267bbf to e5b3107
Added a new test, covering the initialization stage of the task reaper.
Tasks in the complete state were not cleaned up because the task reaper was only looking at task.Status.State >= api.TaskStateShutdown, but the COMPLETE state (api.TaskStateCompleted) is the actual "first" terminal state. Added a test for the initialization stage of the task reaper. This puts the logic of which tasks to clean up through its paces, but doesn't test the run-time logic of the task reaper (its ability to respond to updates to tasks). Cherry-pick of moby#2473. The cherry-pick does not apply cleanly; in addition, it requires changes to `manager/orchestrator/taskreaper/task_reaper_test.go`, because of a difference in the API of the task reaper. (cherry picked from commit e5b3107) Signed-off-by: Drew Erny <drew.erny@docker.com>
// Stop stops the TaskReaper and waits for the main loop to exit.
func (tr *TaskReaper) Stop() {
	// TODO(dperny) calling stop on the task reaper twice will cause a panic
This might be an easy fix:

tr.stopOnce.Do(func() {
	close(tr.stopChan)
})
Yeah, I know the easy fix; I just didn't want to "fix" it in an unrelated PR.
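For reference, a self-contained sketch of the sync.Once pattern suggested above, using a pared-down TaskReaper rather than the real type; it makes Stop safe to call twice:

package main

import "sync"

// TaskReaper is a pared-down stand-in for the real type; only the
// fields needed to demonstrate idempotent shutdown are included.
type TaskReaper struct {
	stopChan chan struct{}
	stopOnce sync.Once // guards stopChan against a double close
}

// Stop signals the main loop to exit. Closing an already-closed channel
// panics, so the close is wrapped in stopOnce to make Stop idempotent.
func (tr *TaskReaper) Stop() {
	tr.stopOnce.Do(func() {
		close(tr.stopChan)
	})
}

func main() {
	tr := &TaskReaper{stopChan: make(chan struct{})}
	tr.Stop()
	tr.Stop() // second call is a no-op instead of a panic
}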
LGTM
  // tasks are associated with slots that were removed as part of a service scale down
  // or service removal.
- if t.DesiredState == api.TaskStateRemove && t.Status.State >= api.TaskStateShutdown {
+ if t.DesiredState == api.TaskStateRemove && t.Status.State >= api.TaskStateCompleted {
What about the tasks which reached the completed state before we removed/downsized the service?
They'll be handled by the regular task retention logic, won't they?
I realized that the reaper will still see update events for those tasks when the desired state is updated to REMOVE.
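To make that event path concrete, here is a rough sketch of the mechanism being described, using hypothetical stand-ins for swarmkit's event and task types rather than the real API:

package main

import "fmt"

type TaskState int

const (
	TaskStateRunning TaskState = iota
	TaskStateCompleted
	TaskStateRemove
)

// Task is a hypothetical stand-in for api.Task.
type Task struct {
	ID           string
	DesiredState TaskState
	ActualState  TaskState
}

// onTaskUpdate mimics the reaper's event handler: when a service is removed,
// the orchestrator updates DesiredState to REMOVE on every task, including
// ones that completed earlier, so the reaper sees a fresh update event here.
func onTaskUpdate(t Task, cleanup map[string]Task) {
	if t.DesiredState == TaskStateRemove && t.ActualState >= TaskStateCompleted {
		cleanup[t.ID] = t // queue for deletion on the next tick
	}
}

func main() {
	cleanup := make(map[string]Task)
	// A task that completed long before the service was removed: the
	// DesiredState update to REMOVE still generates an update event.
	onTaskUpdate(Task{ID: "t1", DesiredState: TaskStateRemove, ActualState: TaskStateCompleted}, cleanup)
	fmt.Println(len(cleanup)) // 1
}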
| "github.com/docker/swarmkit/manager/state/store" | ||
| ) | ||
|
|
||
// TestTaskReaperInit tests that the task reaper correctly cleans up tasks when
This bug was for the service removal case. We should update that test to catch the bug.
@anshulpundir does #2476 address that? I mean, that PR should only go green once this PR is merged?
I replied back to @dperny in a separate message. This fix is correct, and I realized that as I was adding the unit test in #2476. @marcusmartins
// TestTaskReaperInit tests that the task reaper correctly cleans up tasks when
// it is initialized. This will happen every time cluster leadership changes.
func TestTaskReaperInit(t *testing.T) {
Actually, can you please update the other two tests (TestTaskStateRemoveOnServiceRemoval, TestTaskStateRemoveOnScaledown) to test for TaskStateCompleted? @dperny
@anshulpundir Let's update the tests in a follow-up!
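As a starting point for that follow-up, a table-driven sketch of what such a test could check; shouldReap and the local TaskState are stand-ins for the real reaper predicate and api.TaskState, not the actual test harness:

package taskreaper

import "testing"

// TaskState stands in for api.TaskState; only the ordering matters here.
type TaskState int

const (
	TaskStateRunning TaskState = iota
	TaskStateCompleted
	TaskStateShutdown
	TaskStateRemove
)

// shouldReap reproduces the fixed predicate from the reaper's init pass.
func shouldReap(desired, actual TaskState) bool {
	return desired == TaskStateRemove && actual >= TaskStateCompleted
}

func TestReapOnServiceRemoval(t *testing.T) {
	cases := []struct {
		name            string
		desired, actual TaskState
		want            bool
	}{
		{"completed task is reaped", TaskStateRemove, TaskStateCompleted, true}, // the case this PR fixes
		{"shutdown task is reaped", TaskStateRemove, TaskStateShutdown, true},
		{"running task is kept", TaskStateRemove, TaskStateRunning, false},
	}
	for _, c := range cases {
		if got := shouldReap(c.desired, c.actual); got != c.want {
			t.Errorf("%s: got %v, want %v", c.name, got, c.want)
		}
	}
}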
Revendor swarmkit to 713d79d to include moby/swarmkit#2473. Signed-off-by: Marcus Martins <marcus@docker.com>
Revendor swarmkit to 713d79dc8799b33465c58ed120b870c52eb5eb4f to include moby/swarmkit#2473. Signed-off-by: Marcus Martins <marcus@docker.com> Upstream-commit: af73d31 Component: engine
I'm just curious: what if the Desired State == REMOVE and the Current State == RUNNING?
Never mind me. The tasks get into that state only during the engine upgrading process.
  // right away
  for _, t := range removeTasks {
-     if t.Status.State >= api.TaskStateShutdown {
+     if t.Status.State >= api.TaskStateCompleted {
nit: Why is this not > api.TaskStateRunning?
That's pretty much the norm throughout the codebase to see whether a task is past running or not (see the sketch after this list):
$ ack '> api.TaskStateRunning'
cmd/swarmctl/task/print.go:48: if !all && t.DesiredState > api.TaskStateRunning {
cmd/swarmctl/task/inspect.go:40: if t.Status.State > api.TaskStateRunning {
manager/dispatcher/dispatcher_test.go:589: return s > api.TaskStateRunning
manager/dispatcher/assignments.go:229: if t.Status.State > api.TaskStateRunning {
manager/scheduler/nodeinfo.go:111: if t.DesiredState <= api.TaskStateRunning && oldTask.DesiredState > api.TaskStateRunning {
manager/scheduler/nodeinfo.go:116: } else if t.DesiredState > api.TaskStateRunning && oldTask.DesiredState <= api.TaskStateRunning {
manager/scheduler/scheduler.go:73: if t.Status.State < api.TaskStatePending || t.Status.State > api.TaskStateRunning {
manager/scheduler/scheduler.go:208: if t.Status.State < api.TaskStatePending || t.Status.State > api.TaskStateRunning {
manager/scheduler/scheduler.go:245: if t.Status.State > api.TaskStateRunning {
manager/allocator/network.go:627: if t.Status.State > api.TaskStateRunning {
manager/allocator/network.go:794: if t.Status.State > api.TaskStateRunning || isDelete {
manager/orchestrator/replicated/tasks.go:54: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/replicated/tasks.go:95: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/replicated/tasks.go:121: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/replicated/tasks.go:142: if t.Status.State > api.TaskStateRunning ||
manager/orchestrator/replicated/tasks.go:153: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/replicated/tasks.go:175: if t.Status.State > api.TaskStateRunning ||
manager/orchestrator/replicated/update_test.go:78: } else if task.DesiredState > api.TaskStateRunning {
manager/orchestrator/constraintenforcer/constraint_enforcer.go:104: if t.DesiredState < api.TaskStateAssigned || t.DesiredState > api.TaskStateRunning {
manager/orchestrator/constraintenforcer/constraint_enforcer.go:150: if t == nil || t.DesiredState > api.TaskStateRunning {
manager/orchestrator/taskinit/init.go:62: if t.DesiredState != api.TaskStateReady || t.Status.State > api.TaskStateRunning {
manager/orchestrator/update/updater.go:496: if original.DesiredState > api.TaskStateRunning {
manager/orchestrator/update/updater.go:504: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/update/updater_test.go:386: } else if task.DesiredState > api.TaskStateRunning {
manager/orchestrator/restart/restart.go:86: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/restart/restart.go:124: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/restart/restart.go:173: if (n != nil && n.Status.State == api.NodeStatus_DOWN) || t.Status.State > api.TaskStateRunning {
manager/orchestrator/global/global.go:181: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/global/global.go:196: if t.Status.State > api.TaskStateRunning {
manager/orchestrator/global/global.go:208: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/global/global.go:213: if t.Status.State > api.TaskStateRunning {
manager/orchestrator/global/global.go:496: if t == nil || t.DesiredState > api.TaskStateRunning {
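Worth noting: the two predicates are interchangeable only because COMPLETED immediately follows RUNNING in the state ordering. A minimal sketch of that equivalence, with illustrative constants rather than the generated ones:

package main

import "fmt"

type TaskState int

// Illustrative ordering: COMPLETED immediately follows RUNNING, which is
// exactly what makes the two predicates below equivalent.
const (
	TaskStateRunning TaskState = iota
	TaskStateCompleted
	TaskStateShutdown
)

func main() {
	for s := TaskStateRunning; s <= TaskStateShutdown; s++ {
		idiom := s > TaskStateRunning     // the form used elsewhere in the codebase
		thisPR := s >= TaskStateCompleted // the form used in this change
		fmt.Println(s, idiom == thisPR)   // prints true for every state
	}
}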
/cc @anshulpundir