
Fix task reaper not cleaning up COMPLETE tasks #2473

Merged
stevvooe merged 1 commit into moby:master from dperny:fix-not-cleaning-up-complete-tasks on Dec 19, 2017
Conversation

@dperny
Collaborator

dperny commented Dec 18, 2017

Tasks in the complete state were not cleaned up because the task reaper was only looking at task.Status.State >= api.TaskStateShutdown, but the COMPLETE state (api.TaskStateCompleted) is the actual "first" terminal state.

/cc @anshulpundir
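
For illustration, a minimal self-contained sketch of the comparison bug; the constants below are illustrative stand-ins, not the real values in the api package, but they preserve the relevant ordering (COMPLETE sorts between RUNNING and SHUTDOWN):

package main

import "fmt"

// Illustrative stand-ins for swarmkit's task states; only the relative
// order matters here, not the real numeric values.
type TaskState int

const (
	TaskStateRunning TaskState = iota
	TaskStateCompleted // first terminal state
	TaskStateShutdown
	TaskStateFailed
)

func main() {
	state := TaskStateCompleted
	fmt.Println(state >= TaskStateShutdown)  // false: the old filter skips COMPLETE tasks
	fmt.Println(state >= TaskStateCompleted) // true: the fixed filter catches them
}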

@dperny
Copy link
Copy Markdown
Collaborator Author

dperny commented Dec 18, 2017

This change will need to be backported to wherever we've already backported the REMOVE state.

@codecov

codecov Bot commented Dec 18, 2017

Codecov Report

Merging #2473 into master will decrease coverage by 5.67%.
The diff coverage is 50%.

@@            Coverage Diff            @@
##           master   #2473      +/-   ##
=========================================
- Coverage   67.28%   61.6%   -5.68%     
=========================================
  Files          80     129      +49     
  Lines       11498   21243    +9745     
=========================================
+ Hits         7736   13086    +5350     
- Misses       2970    6757    +3787     
- Partials      792    1400     +608

Tasks in the complete state were not cleaned up because the task reaper
was only looking at task.Status.State >= api.TaskStateShutdown, but the
COMPLETE state (api.TaskStateCompleted) is the actual "first" terminal
state.

Added a test for the initialization stage of the task reaper. This puts
the logic that decides which tasks to clean up through its paces, but
doesn't test the run-time logic of the task reaper (its ability to
respond to updates to tasks).

Signed-off-by: Drew Erny <drew.erny@docker.com>
dperny force-pushed the fix-not-cleaning-up-complete-tasks branch from 6267bbf to e5b3107 on December 18, 2017 19:15
@dperny
Collaborator Author

dperny commented Dec 18, 2017

Added a new test, covering the initialization stage of the task reaper.

dperny added a commit to dperny/swarmkit-1 that referenced this pull request Dec 18, 2017
Tasks in the complete state were not cleaned up because the task reaper
was only looking at task.Status.State >= api.TaskStateShutdown, but the
COMPLETE state (api.TaskStateCompleted) is the actual "first" terminal
state.

Added a test for the initialization stage of the task reaper. This puts
the logic that decides which tasks to clean up through its paces, but
doesn't test the run-time logic of the task reaper (its ability to
respond to updates to tasks).

Cherry-pick of moby#2473

Cherry pick does not apply cleanly. In addition, cherry pick requires
changes to `manager/orchestrator/taskreaper/task_reaper_test.go`,
because of a difference in the API of the task reaper.

(cherry picked from commit e5b3107)
Signed-off-by: Drew Erny <drew.erny@docker.com>

// Stop stops the TaskReaper and waits for the main loop to exit.
func (tr *TaskReaper) Stop() {
	// TODO(dperny) calling stop on the task reaper twice will cause a panic
Contributor


This might be an easy fix:

tr.stopOnce.Do(func() {
	close(tr.stopChan)
})
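
A fuller sketch of that suggestion, with hypothetical field names, assuming stopChan is the channel the main loop selects on:

package main

import "sync"

// Hypothetical shape of the reaper for illustration; the real TaskReaper
// lives in manager/orchestrator/taskreaper.
type TaskReaper struct {
	stopChan chan struct{}
	stopOnce sync.Once
}

// Stop signals the main loop to exit; safe to call more than once.
func (tr *TaskReaper) Stop() {
	tr.stopOnce.Do(func() {
		close(tr.stopChan)
	})
}

func main() {
	tr := &TaskReaper{stopChan: make(chan struct{})}
	tr.Stop()
	tr.Stop() // a no-op instead of a double-close panic
}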

Collaborator Author


Yeah, I know the easy fix; I just didn't want to "fix" it in an unrelated PR.

@stevvooe
Contributor

LGTM

// tasks are associated with slots that were removed as part of a service scale down
// or service removal.
- if t.DesiredState == api.TaskStateRemove && t.Status.State >= api.TaskStateShutdown {
+ if t.DesiredState == api.TaskStateRemove && t.Status.State >= api.TaskStateCompleted {
Contributor


What about the tasks which reached the completed state before we removed/downsized the service?

Collaborator Author


They'll be handled by the regular task retention logic, won't they?
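
For reference, that retention logic is driven by the cluster's orchestration settings; a minimal sketch of the knob involved (TaskHistoryRetentionLimit is the real field in swarmkit's api.OrchestrationConfig, the value here is illustrative):

package main

import "github.com/docker/swarmkit/api"

func main() {
	spec := api.ClusterSpec{
		Orchestration: api.OrchestrationConfig{
			// Keep at most this many historic (dead) tasks per slot;
			// the task reaper deletes anything older.
			TaskHistoryRetentionLimit: 5,
		},
	}
	_ = spec
}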

Contributor

@anshulpundir Dec 19, 2017


I realized that the reaper will still see update events for those tasks when the desired state is updated to REMOVE.

"github.com/docker/swarmkit/manager/state/store"
)

// TestTaskReaperInit tests that the task reaper correctly cleans up tasks when
Contributor


This bug was for the service removal case. We should update that test to catch the bug.



@anshulpundir does #2476 address that? I mean, that PR should only go green once this PR is merged?

Contributor


I replied to @dperny in a separate message. This fix is correct, and I realized that as I was adding the unit test in #2476. @marcusmartins


// TestTaskReaperInit tests that the task reaper correctly cleans up tasks when
// it is initialized. This will happen every time cluster leadership changes.
func TestTaskReaperInit(t *testing.T) {
Contributor


Actually, can you please update the other two tests (TestTaskStateRemoveOnServiceRemoval, TestTaskStateRemoveOnScaledown) to test for TaskStateCompleted? @dperny
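
A minimal, self-contained sketch of the case those tests should now cover (illustrative states and a hypothetical shouldReap helper, not the real swarmkit test harness, which drives the TaskReaper against a MemoryStore):

package main

import "fmt"

// Illustrative ordering only; the real values live in swarmkit's api package.
type TaskState int

const (
	TaskStateRunning TaskState = iota
	TaskStateCompleted
	TaskStateShutdown
	TaskStateRemove
)

// shouldReap mirrors the fixed condition for tasks marked for removal.
func shouldReap(desired, current TaskState) bool {
	return desired == TaskStateRemove && current >= TaskStateCompleted
}

func main() {
	// A COMPLETE task whose service was scaled down or removed must be
	// reaped; the old condition (>= TaskStateShutdown) missed this case.
	if !shouldReap(TaskStateRemove, TaskStateCompleted) {
		panic("completed task marked for removal was not reaped")
	}
	fmt.Println("ok")
}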

@stevvooe
Contributor

@anshulpundir Let's update the tests in a follow up!

@stevvooe stevvooe merged commit 713d79d into moby:master Dec 19, 2017
marcusmartins added a commit to marcusmartins/docker that referenced this pull request Dec 19, 2017
Revendor swarmkit to 713d79d to include
moby/swarmkit#2473.

Signed-off-by: Marcus Martins <marcus@docker.com>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this pull request Dec 21, 2017
Revendor swarmkit to 713d79dc8799b33465c58ed120b870c52eb5eb4f to include
moby/swarmkit#2473.

Signed-off-by: Marcus Martins <marcus@docker.com>
Upstream-commit: af73d31
Component: engine
@chanwit

chanwit commented Dec 23, 2017

I'm just curious: what if the Desired State == REMOVE and the Current State == RUNNING?

@chanwit

chanwit commented Dec 23, 2017

Never mind me. The tasks get into that state only during the engine upgrade process.

// right away
for _, t := range removeTasks {
- if t.Status.State >= api.TaskStateShutdown {
+ if t.Status.State >= api.TaskStateCompleted {
Member


nit: Why is this not > api.TaskStateRunning?

That's pretty much the norm throughout the codebase to see whether a Task is running or not:

$ ack '> api.TaskStateRunning'
cmd/swarmctl/task/print.go:48:		if !all && t.DesiredState > api.TaskStateRunning {
cmd/swarmctl/task/inspect.go:40:	if t.Status.State > api.TaskStateRunning {
manager/dispatcher/dispatcher_test.go:589:		return s > api.TaskStateRunning
manager/dispatcher/assignments.go:229:			if t.Status.State > api.TaskStateRunning {
manager/scheduler/nodeinfo.go:111:		if t.DesiredState <= api.TaskStateRunning && oldTask.DesiredState > api.TaskStateRunning {
manager/scheduler/nodeinfo.go:116:		} else if t.DesiredState > api.TaskStateRunning && oldTask.DesiredState <= api.TaskStateRunning {
manager/scheduler/scheduler.go:73:		if t.Status.State < api.TaskStatePending || t.Status.State > api.TaskStateRunning {
manager/scheduler/scheduler.go:208:	if t.Status.State < api.TaskStatePending || t.Status.State > api.TaskStateRunning {
manager/scheduler/scheduler.go:245:	if t.Status.State > api.TaskStateRunning {
manager/allocator/network.go:627:		if t.Status.State > api.TaskStateRunning {
manager/allocator/network.go:794:	if t.Status.State > api.TaskStateRunning || isDelete {
manager/orchestrator/replicated/tasks.go:54:						if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/replicated/tasks.go:95:			if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/replicated/tasks.go:121:	if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/replicated/tasks.go:142:	if t.Status.State > api.TaskStateRunning ||
manager/orchestrator/replicated/tasks.go:153:	if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/replicated/tasks.go:175:	if t.Status.State > api.TaskStateRunning ||
manager/orchestrator/replicated/update_test.go:78:				} else if task.DesiredState > api.TaskStateRunning {
manager/orchestrator/constraintenforcer/constraint_enforcer.go:104:		if t.DesiredState < api.TaskStateAssigned || t.DesiredState > api.TaskStateRunning {
manager/orchestrator/constraintenforcer/constraint_enforcer.go:150:					if t == nil || t.DesiredState > api.TaskStateRunning {
manager/orchestrator/taskinit/init.go:62:			if t.DesiredState != api.TaskStateReady || t.Status.State > api.TaskStateRunning {
manager/orchestrator/update/updater.go:496:		if original.DesiredState > api.TaskStateRunning {
manager/orchestrator/update/updater.go:504:			if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/update/updater_test.go:386:			} else if task.DesiredState > api.TaskStateRunning {
manager/orchestrator/restart/restart.go:86:		if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/restart/restart.go:124:	if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/restart/restart.go:173:	if (n != nil && n.Status.State == api.NodeStatus_DOWN) || t.Status.State > api.TaskStateRunning {
manager/orchestrator/global/global.go:181:	if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/global/global.go:196:	if t.Status.State > api.TaskStateRunning {
manager/orchestrator/global/global.go:208:	if t.DesiredState > api.TaskStateRunning {
manager/orchestrator/global/global.go:213:	if t.Status.State > api.TaskStateRunning {
manager/orchestrator/global/global.go:496:				if t == nil || t.DesiredState > api.TaskStateRunning {
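
On the nit: assuming no state sorts between RUNNING and COMPLETE (which holds for swarmkit's enum, where COMPLETE is the first terminal state), state > api.TaskStateRunning and state >= api.TaskStateCompleted select exactly the same states. A tiny sketch with illustrative constants:

package main

import "fmt"

// Illustrative ordering with no state between RUNNING and COMPLETE, as in
// swarmkit's api package.
type TaskState int

const (
	TaskStateRunning TaskState = iota
	TaskStateCompleted
	TaskStateShutdown
)

func main() {
	for s := TaskStateRunning; s <= TaskStateShutdown; s++ {
		// Both predicates agree for every state.
		fmt.Println(s, s >= TaskStateCompleted, s > TaskStateRunning)
	}
}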
