Fix task reaper not cleaning up COMPLETE tasks #2473
Conversation
This change will need to be backported to wherever we've already backported the REMOVE state.
Codecov Report

@@            Coverage Diff             @@
##           master    #2473      +/-   ##
==========================================
- Coverage   67.28%    61.6%    -5.68%
==========================================
  Files          80      129       +49
  Lines       11498    21243     +9745
==========================================
+ Hits         7736    13086     +5350
- Misses       2970     6757     +3787
- Partials      792     1400      +608
Tasks in the complete state were not cleaned up because the task reaper was only looking at task.Status.State >= api.TaskStateShutdown, but the COMPLETE state (api.TaskStateCompleted) is the actual "first" terminal state. Added a test for the initialization stage of the task reaper. This puts the logic of which tasks to clean up through its paces, but doesn't test the run-time logic of the task reaper (its ability to respond to updates to tasks). Signed-off-by: Drew Erny <drew.erny@docker.com>
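For context, here is a minimal sketch of why the old comparison skipped completed tasks. The ordering mirrors swarmkit's api package, but the constant values are illustrative stand-ins, not the generated ones:

package main

import "fmt"

// TaskState stands in for swarmkit's api.TaskState; the ordering below
// mirrors the real one, but the values are illustrative.
type TaskState int

const (
	TaskStateRunning   TaskState = iota // task is executing
	TaskStateCompleted                  // first terminal state: exited successfully
	TaskStateShutdown                   // terminal: told to shut down
	TaskStateFailed                     // terminal: exited with an error
)

func main() {
	done := TaskStateCompleted

	// Old check: COMPLETED sorts before SHUTDOWN, so completed tasks were skipped.
	fmt.Println("old check reaps it:", done >= TaskStateShutdown) // false

	// Fixed check: start at the first terminal state.
	fmt.Println("new check reaps it:", done >= TaskStateCompleted) // true
}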
force-pushed from 6267bbf to e5b3107
Added a new test, covering the initialization stage of the task reaper.
Tasks in the complete state were not cleaned up because the task reaper was only looking at task.Status.State >= api.TaskStateShutdown, but the COMPLETE state (api.TaskStateCompleted) is the actual "first" terminal state. Added a test for the initialization stage of the task reaper. This puts the logic of which tasks to clean up through its paces, but doesn't test the run-time logic of the task reaper (its ability to respond to updates to tasks). Cherry-pick of moby#2473. The cherry-pick does not apply cleanly; in addition, it requires changes to `manager/orchestrator/taskreaper/task_reaper_test.go`, because of a difference in the API of the task reaper. (cherry picked from commit e5b3107) Signed-off-by: Drew Erny <drew.erny@docker.com>
// Stop stops the TaskReaper and waits for the main loop to exit.
func (tr *TaskReaper) Stop() {
	// TODO(dperny) calling stop on the task reaper twice will cause a panic
This might be an easy fix:

tr.stopOnce.Do(func() {
	close(tr.stopChan)
})
Yeah, I know the easy fix; I just didn't want to "fix" it in an unrelated PR.
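For reference, a self-contained sketch of the sync.Once pattern suggested above, using a pared-down TaskReaper rather than the real type; it makes Stop safe to call twice:

package main

import "sync"

// TaskReaper is a pared-down stand-in for the real type; only the
// fields needed to demonstrate idempotent shutdown are included.
type TaskReaper struct {
	stopChan chan struct{}
	stopOnce sync.Once // guards stopChan against a double close
}

// Stop signals the main loop to exit. Closing an already-closed channel
// panics, so the close is wrapped in stopOnce to make Stop idempotent.
func (tr *TaskReaper) Stop() {
	tr.stopOnce.Do(func() {
		close(tr.stopChan)
	})
}

func main() {
	tr := &TaskReaper{stopChan: make(chan struct{})}
	tr.Stop()
	tr.Stop() // second call is a no-op instead of a panic
}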
LGTM
  // tasks are associated with slots that were removed as part of a service scale down
  // or service removal.
- if t.DesiredState == api.TaskStateRemove && t.Status.State >= api.TaskStateShutdown {
+ if t.DesiredState == api.TaskStateRemove && t.Status.State >= api.TaskStateCompleted {
What about the tasks which reached the completed state before we removed/downsized the service?
They'll be handled by the regular task retention logic, won't they?
I realized that the reaper will still see update events for those tasks when the desired state is updated to REMOVE.
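To make that event path concrete, here is a rough sketch of the mechanism being described, using hypothetical stand-ins for swarmkit's event and task types rather than the real API:

package main

import "fmt"

type TaskState int

const (
	TaskStateRunning TaskState = iota
	TaskStateCompleted
	TaskStateRemove
)

// Task is a hypothetical stand-in for api.Task.
type Task struct {
	ID           string
	DesiredState TaskState
	ActualState  TaskState
}

// onTaskUpdate mimics the reaper's event handler: when a service is removed,
// the orchestrator updates DesiredState to REMOVE on every task, including
// ones that completed earlier, so the reaper sees a fresh update event here.
func onTaskUpdate(t Task, cleanup map[string]Task) {
	if t.DesiredState == TaskStateRemove && t.ActualState >= TaskStateCompleted {
		cleanup[t.ID] = t // queue for deletion on the next tick
	}
}

func main() {
	cleanup := make(map[string]Task)
	// A task that completed long before the service was removed: the
	// DesiredState update to REMOVE still generates an update event.
	onTaskUpdate(Task{ID: "t1", DesiredState: TaskStateRemove, ActualState: TaskStateCompleted}, cleanup)
	fmt.Println(len(cleanup)) // 1
}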
| "github.com/docker/swarmkit/manager/state/store" | ||
| ) | ||
|
|
||
// TestTaskReaperInit tests that the task reaper correctly cleans up tasks when
This bug was for the service removal case. We should update that test to catch the bug.
@anshulpundir does #2476 address that? I mean, that PR should only go green once this PR is merged?
I replied back to @dperny in a separate message. This fix is correct, and I realized that as I was adding the unit test in #2476. @marcusmartins
// TestTaskReaperInit tests that the task reaper correctly cleans up tasks when
// it is initialized. This will happen every time cluster leadership changes.
func TestTaskReaperInit(t *testing.T) {
Actually, can you please update the other two tests (TestTaskStateRemoveOnServiceRemoval, TestTaskStateRemoveOnScaledown) to test for TaskStateCompleted? @dperny
@anshulpundir Let's update the tests in a follow-up!
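As a starting point for that follow-up, a table-driven sketch of what such a test could check; shouldReap and the local TaskState are stand-ins for the real reaper predicate and api.TaskState, not the actual test harness:

package taskreaper

import "testing"

// TaskState stands in for api.TaskState; only the ordering matters here.
type TaskState int

const (
	TaskStateRunning TaskState = iota
	TaskStateCompleted
	TaskStateShutdown
	TaskStateRemove
)

// shouldReap reproduces the fixed predicate from the reaper's init pass.
func shouldReap(desired, actual TaskState) bool {
	return desired == TaskStateRemove && actual >= TaskStateCompleted
}

func TestReapOnServiceRemoval(t *testing.T) {
	cases := []struct {
		name            string
		desired, actual TaskState
		want            bool
	}{
		{"completed task is reaped", TaskStateRemove, TaskStateCompleted, true}, // the case this PR fixes
		{"shutdown task is reaped", TaskStateRemove, TaskStateShutdown, true},
		{"running task is kept", TaskStateRemove, TaskStateRunning, false},
	}
	for _, c := range cases {
		if got := shouldReap(c.desired, c.actual); got != c.want {
			t.Errorf("%s: got %v, want %v", c.name, got, c.want)
		}
	}
}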
Revendor swarmkit to 713d79d to include moby/swarmkit#2473. Signed-off-by: Marcus Martins <marcus@docker.com>
Revendor swarmkit to 713d79dc8799b33465c58ed120b870c52eb5eb4f to include moby/swarmkit#2473. Signed-off-by: Marcus Martins <marcus@docker.com> Upstream-commit: af73d31 Component: engine
I'm just curious: what if the Desired State == REMOVE and the Current State == RUNNING?
Never mind me. The tasks get into that state only during the engine upgrading process.
  // right away
  for _, t := range removeTasks {
-     if t.Status.State >= api.TaskStateShutdown {
+     if t.Status.State >= api.TaskStateCompleted {
nit: Why is this not > api.TaskStateRunning?
That's pretty much the norm throughout the codebase to see whether a task is past running or not (see the sketch after this list):
$ ack '> api.TaskStateRunning'
cmd/swarmctl/task/print.go:48: if !all && t.DesiredState > api.TaskStateRunning {
cmd/swarmctl/task/inspect.go:40: if t.Status.State > api.TaskStateRunning {
manager/dispatcher/dispatcher_test.go:589: return s > api.TaskStateRunning
manager/dispatcher/assignments.go:229: if t.Status.State > api.TaskStateRunning {
manager/scheduler/nodeinfo.go:111: if t.DesiredState <= api.TaskStateRunning && oldTask.DesiredState > api.TaskStateRunning {
manager/scheduler/nodeinfo.go:116: } else if t.DesiredState > api.TaskStateRunning && oldTask.DesiredState <= api.TaskStateRunning {
manager/scheduler/scheduler.go:73: if t.Status.State < api.TaskStatePending || t.Status.State > api.TaskStateRunning {
manager/scheduler/scheduler.go:208: if t.Status.State < api.TaskStatePending || t.Status.State > api.TaskStateRunning {
manager/scheduler/scheduler.go:245: if t.Status.State > api.TaskStateRunning {
manager/allocator/network.go:627: if t.Status.State > api.TaskStateRunning {
manager/allocator/network.go:794: if t.Status.State > api.TaskStateRunning || isDelete {
manager/orchestrator/replicated/tasks.go:54: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/replicated/tasks.go:95: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/replicated/tasks.go:121: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/replicated/tasks.go:142: if t.Status.State > api.TaskStateRunning ||
manager/orchestrator/replicated/tasks.go:153: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/replicated/tasks.go:175: if t.Status.State > api.TaskStateRunning ||
manager/orchestrator/replicated/update_test.go:78: } else if task.DesiredState > api.TaskStateRunning {
manager/orchestrator/constraintenforcer/constraint_enforcer.go:104: if t.DesiredState < api.TaskStateAssigned || t.DesiredState > api.TaskStateRunning {
manager/orchestrator/constraintenforcer/constraint_enforcer.go:150: if t == nil || t.DesiredState > api.TaskStateRunning {
manager/orchestrator/taskinit/init.go:62: if t.DesiredState != api.TaskStateReady || t.Status.State > api.TaskStateRunning {
manager/orchestrator/update/updater.go:496: if original.DesiredState > api.TaskStateRunning {
manager/orchestrator/update/updater.go:504: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/update/updater_test.go:386: } else if task.DesiredState > api.TaskStateRunning {
manager/orchestrator/restart/restart.go:86: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/restart/restart.go:124: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/restart/restart.go:173: if (n != nil && n.Status.State == api.NodeStatus_DOWN) || t.Status.State > api.TaskStateRunning {
manager/orchestrator/global/global.go:181: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/global/global.go:196: if t.Status.State > api.TaskStateRunning {
manager/orchestrator/global/global.go:208: if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/global/global.go:213: if t.Status.State > api.TaskStateRunning {
manager/orchestrator/global/global.go:496: if t == nil || t.DesiredState > api.TaskStateRunning {
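Worth noting: the two predicates are interchangeable only because COMPLETED immediately follows RUNNING in the state ordering. A minimal sketch of that equivalence, with illustrative constants rather than the generated ones:

package main

import "fmt"

type TaskState int

// Illustrative ordering: COMPLETED immediately follows RUNNING, which is
// exactly what makes the two predicates below equivalent.
const (
	TaskStateRunning TaskState = iota
	TaskStateCompleted
	TaskStateShutdown
)

func main() {
	for s := TaskStateRunning; s <= TaskStateShutdown; s++ {
		idiom := s > TaskStateRunning     // the form used elsewhere in the codebase
		thisPR := s >= TaskStateCompleted // the form used in this change
		fmt.Println(s, idiom == thisPR)   // prints true for every state
	}
}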
/cc @anshulpundir