
Actually starting to use the deallocator to clean up services #2778

Merged
anshulpundir merged 1 commit into moby:master from wk8:wk8/network_rm_follow_up
Nov 26, 2018

Conversation

@wk8
Contributor

@wk8 wk8 commented Nov 5, 2018

- What I did

This patch actually starts to use the deallocator to clean up services, as
well as service-level resources (as of now, only networks).

List of changes:

  • networks and services no longer get deleted right away; that's
    now the deallocator's job
  • networks and services that are pending deletion are marked as such
    by swarmctl
  • updated all of the relevant code & tests around network & service
    deletion
  • added unit tests for the new functionality

- How I did it

A previous patch (#2759) introduced a
new component, the deallocator, responsible for cleaning up services and
service-level resources. This patch actually starts making use of that
component.

Since the deallocator used to rely on the reaper to delete the tasks belonging
to services that had been marked for removal, a previous version of this patch
modified the task reaper quite heavily to also keep track of such services
(needed since otherwise the reaper would fail to clean up all of their tasks,
instead keeping some around for history-tracking purposes). However, it soon
became apparent that this was not the best approach:

  • it creates a hidden coupling between the reaper and the deallocator
  • it's also not the best user experience to suddenly remove all of a service's
    task history while it's shutting down, for no apparent reason from the
    user's perspective

For these reasons, this patch instead amends the deallocator to also look at task statuses when keeping track of how many live tasks remain for a given service.

- How to test it

Updated tests.

- Description for the changelog

Services & networks are no longer deleted immediately when a user requests their
deletion; instead, they are deleted when all their tasks are actually shut down.
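
A minimal, self-contained Go sketch of the bookkeeping described above (the names here — deallocator, serviceWithTaskCounts, taskShutDown — are illustrative stand-ins, not this PR's actual types):

package main

import "fmt"

// serviceWithTaskCounts pairs a service pending deletion with how many
// of its tasks are still alive.
type serviceWithTaskCounts struct {
	serviceID string
	taskCount int
}

// deallocator keeps services pending deletion around until their
// live-task count drops to zero; only then are the service and its
// service-level resources (e.g. networks) actually removed.
type deallocator struct {
	services map[string]*serviceWithTaskCounts
}

// taskShutDown is called when one of a service's tasks transitions
// from alive to not alive; once no live tasks remain, the service is
// deallocated.
func (d *deallocator) taskShutDown(serviceID string) {
	svc, present := d.services[serviceID]
	if !present {
		return // service not pending deletion: nothing to do
	}
	svc.taskCount--
	if svc.taskCount <= 0 {
		delete(d.services, serviceID)
		fmt.Printf("service %s fully shut down, deallocating\n", serviceID)
	}
}

func main() {
	d := &deallocator{services: map[string]*serviceWithTaskCounts{
		"svc1": {serviceID: "svc1", taskCount: 2},
	}}
	d.taskShutDown("svc1") // one live task left
	d.taskShutDown("svc1") // count hits zero: service gets deallocated
}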

@GordonTheTurtle

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "wk8/network_rm_follow_up" git@github.com:wk8/swarmkit.git somewhere
$ cd somewhere
$ git rebase -i HEAD~1
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@wk8 wk8 force-pushed the wk8/network_rm_follow_up branch from f511af4 to 4d41f49 on November 5, 2018 20:29
@wk8 wk8 force-pushed the wk8/network_rm_follow_up branch 5 times, most recently from a3d477b to 8ee61c6 on November 5, 2018 21:34
@codecov

codecov Bot commented Nov 5, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@14cfce8). Click here to learn what that means.
The diff coverage is 81.57%.

@@            Coverage Diff            @@
##             master    #2778   +/-   ##
=========================================
  Coverage          ?   61.97%           
=========================================
  Files             ?      137           
  Lines             ?    22094           
  Branches          ?        0           
=========================================
  Hits              ?    13692           
  Misses            ?     6929           
  Partials          ?     1473

@dperny
Collaborator

dperny commented Nov 5, 2018

LGTM, especially with the tests.

Contributor

@anshulpundir anshulpundir left a comment


looks good overall, minor comments/questions

return status.Errorf(codes.NotFound, "service %s not found", request.ServiceID)
}

if service.PendingDelete {
Contributor

should this be moved up higher in the function?

Contributor Author

I don't think it can. Sure, we could check whether it's pending deletion as soon as we retrieve the service object above, but we would still also need to re-check here, once we hold the write lock; otherwise it could have been marked for deletion in the meantime. Maybe it's worth adding a comment to that effect?

Contributor Author

(added a comment to this effect)
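
For illustration, the re-check-under-the-lock pattern being discussed looks roughly like this — a generic sketch with a plain mutex and hypothetical types, not swarmkit's actual store API:

package main

import (
	"errors"
	"fmt"
	"sync"
)

type service struct {
	pendingDelete bool
}

type store struct {
	mu       sync.Mutex
	services map[string]*service
}

// markForDeletion marks a service as pending deletion. Even if the
// caller already checked PendingDelete right after first fetching the
// service, the check has to be repeated here, under the write lock:
// another request may have marked the service in the meantime.
func (s *store) markForDeletion(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	svc, ok := s.services[id]
	if !ok {
		return errors.New("service not found")
	}
	if svc.pendingDelete {
		return errors.New("service already marked for deletion")
	}
	svc.pendingDelete = true
	return nil
}

func main() {
	s := &store{services: map[string]*service{"svc1": {}}}
	fmt.Println(s.markForDeletion("svc1")) // <nil>
	fmt.Println(s.markForDeletion("svc1")) // service already marked for deletion
}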

fmt.Fprintln(w, "Template\t")
fmt.Fprintln(w, " Container\t")
ctr := service.Spec.Task.GetContainer()
task := spec.Task
Contributor

What is the motivation for this change?

Contributor Author

It seemed a bit silly to define spec := service.Spec above without consistently using it in the rest of the function?

func (deallocator *Deallocator) handleEvent(ctx context.Context, event events.Event) (bool, error) {
switch typedEvent := event.(type) {
case api.EventUpdateTask:
return deallocator.processTaskEvent(ctx, typedEvent.Task, typedEvent.OldTask)
Contributor

Maybe add a comment on how each of these invocations of the same function differs, i.e. which values get passed for which arguments in each case.

Contributor Author

Updated the function's comment itself, should hopefully be clearer?
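
The distinction being asked about can be sketched like this; the types below are simplified stand-ins for the api package's event types, and the exact arguments for the delete case are an assumption here:

package main

import (
	"context"
	"fmt"
)

type task struct{ serviceID string }

// Simplified stand-ins for api.EventUpdateTask / api.EventDeleteTask.
type eventUpdateTask struct{ task, oldTask *task }
type eventDeleteTask struct{ task *task }

type deallocator struct{}

// processTaskEvent gets the old version of a task plus its new
// version, if any; newTask == nil means the task was deleted outright.
func (d *deallocator) processTaskEvent(ctx context.Context, newTask, oldTask *task) (bool, error) {
	fmt.Printf("service %s: task still exists: %v\n", oldTask.serviceID, newTask != nil)
	return false, nil
}

// handleEvent dispatches both event types to the same helper, with
// different arguments: updates carry both versions of the task, while
// deletes only carry the old one (the new one is nil by construction).
func (d *deallocator) handleEvent(ctx context.Context, event interface{}) (bool, error) {
	switch typedEvent := event.(type) {
	case eventUpdateTask:
		return d.processTaskEvent(ctx, typedEvent.task, typedEvent.oldTask)
	case eventDeleteTask:
		return d.processTaskEvent(ctx, nil, typedEvent.task)
	}
	return false, nil
}

func main() {
	d := &deallocator{}
	t := &task{serviceID: "svc1"}
	d.handleEvent(context.Background(), eventUpdateTask{task: t, oldTask: t})
	d.handleEvent(context.Background(), eventDeleteTask{task: t})
}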

Comment thread manager/deallocator/deallocator.go Outdated

if present && isTaskStillAlive(oldTask) && (newTask == nil || !isTaskStillAlive(newTask)) {
// this task belongs to a service that's shutting down, and was amongst
// the ones that we counted as alive when adding this service to our
Contributor

adding this service to our internal state

What does this mean? Please clarify the comment.

Comment thread manager/deallocator/deallocator.go Outdated
serviceWithCount, present := deallocator.services[serviceID]

if present && isTaskStillAlive(oldTask) && (newTask == nil || !isTaskStillAlive(newTask)) {
// this task belongs to a service that's shutting down, and was amongst
Contributor

The if condition has two tasks: oldTask and newTask. Which one is the comment for? What about the other task?

Contributor Author

They both belong to the same service, since a task can never be re-allocated :)

serviceID := oldTask.ServiceID
serviceWithCount, present := deallocator.services[serviceID]

if present && isTaskStillAlive(oldTask) && (newTask == nil || !isTaskStillAlive(newTask)) {
Contributor

I think we need a clearer comment on what this if statement achieves

Contributor Author

Tried to make the existing comment clearer
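
As a standalone illustration of what the condition checks — with a simplified task model, and isTaskStillAlive as a hypothetical reimplementation rather than the PR's exact helper:

package main

import "fmt"

type taskState int

const (
	stateRunning taskState = iota
	stateShutdown
)

type task struct{ state taskState }

// isTaskStillAlive: a task counts as alive until it reaches a
// terminal state.
func isTaskStillAlive(t *task) bool {
	return t != nil && t.state < stateShutdown
}

// shouldDecrement detects exactly one alive -> not-alive transition:
// the old version of the task was alive, and the new version either
// no longer exists (nil, i.e. the task was deleted) or is no longer
// alive. The `present` part of the original condition (i.e. whether
// the service is pending deletion at all) is left to the caller here.
func shouldDecrement(oldTask, newTask *task) bool {
	return isTaskStillAlive(oldTask) && (newTask == nil || !isTaskStillAlive(newTask))
}

func main() {
	running := &task{state: stateRunning}
	down := &task{state: stateShutdown}
	fmt.Println(shouldDecrement(running, down)) // true: task just shut down
	fmt.Println(shouldDecrement(running, nil))  // true: task deleted while alive
	fmt.Println(shouldDecrement(down, nil))     // false: already dead, not double-counted
}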

}
}

// Common logic for handling task update/delete events
Contributor

Please enhance the comment to include a description of the function params.

Contributor Author

Done

if orchestrator.IsGlobalService(s) {
g.updateService(s)
reconcileServiceIDs = append(reconcileServiceIDs, s.ID)
if s.PendingDelete {
Contributor

nit: add a comment?

Contributor Author

👍

// ask the tasks to shut down
orchestrator.SetServiceTasksRemove(ctx, g.store, v.Service)
// delete the service from service map
delete(g.globalServices, v.Service.ID)
Contributor

Once removed from here, is the service still accessible from UI/CLI using list commands etc? Is that the behavior we're going for?

Contributor Author

Yes it is. There will be a CLI change to explicitly mark it as pending for deletion in the CLI's output.

**- What I did**

This patch actually starts to use the deallocator to clean up services, as
well as service-level resources (as of now, only networks).

**- How I did it**

A previous patch (moby#2759) introduced a
new component, the deallocator, responsible for cleaning up services and
service-level resources. This patch is actually starting to make use of that
component.

Since the deallocator used to rely on the reaper deleting the tasks belonging
to services that had been marked for removal, a previous version of this patch
was modifying the task reaper quite heavily to also keep track of such services
(needed since otherwise the reaper would fail to clean up all of them, instead
keeping some for history tracking purposes). However, it soon appeared that
this was not the best approach:
 * this creates a hidden coupling between the reaper and the deallocator
 * it's also not the best user experience to suddenly remove all of a service's
   task history while shutting down, for no apparent reason to the user

For these reasons, this patch instead amends the deallocator to also look at task statuses when keeping track of how many live tasks remain for a service.

**- How to test it**

Updated tests.

**- Description for the changelog**

Services & networks are no longer deleted immediately when a user requests their
deletion; instead, they are deleted when all their tasks are actually shut down.

Signed-off-by: Jean Rouge <jer329@cornell.edu>
@wk8 wk8 force-pushed the wk8/network_rm_follow_up branch from decedca to 2ba1cd9 on November 15, 2018 23:51
@anshulpundir anshulpundir merged commit 72e2559 into moby:master Nov 26, 2018
wk8 added a commit to wk8/moby that referenced this pull request Jan 10, 2019
Mainly for the sake of testing the new "soft delete" capabilities of
Swarmkit, see more at
moby/swarmkit#2778

Signed-off-by: Jean Rouge <jer329@cornell.edu>
@cpuguy83
Member

This is a breaking change. Before, delete was synchronous; now it is async. I agree with and like the change, but ultimately it seems problematic.

For docker not to break, we will need to poll the service to check for deletion before the API returns.
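
One way the engine side could preserve the old synchronous semantics, sketched under the assumption of a hypothetical getService lookup (this is not moby's actual code): after issuing the removal, poll until the service is gone or a deadline expires.

package main

import (
	"context"
	"errors"
	"time"
)

var errNotFound = errors.New("service not found")

// getService is a hypothetical lookup that returns errNotFound once
// the deallocator has finished removing the service.
func getService(ctx context.Context, id string) error {
	return errNotFound
}

// waitForDeletion blocks until the service disappears or the context
// expires, restoring synchronous-looking semantics on top of the new
// asynchronous removal.
func waitForDeletion(ctx context.Context, id string) error {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for {
		if err := getService(ctx, id); errors.Is(err, errNotFound) {
			return nil // service fully deallocated
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	_ = waitForDeletion(ctx, "svc1")
}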

wk8 added a commit to wk8/swarmkit that referenced this pull request Feb 1, 2019
**- What I did**

This patch allows users to choose to make removals of services and networks
asynchronous, which means that they only get deleted once all the containers
belonging to them or using them are safely shut down.

**- How I did it**

A previous patch (moby#2759) introduced a
new component, the deallocator, responsible for cleaning up services and
service-level resources. This patch is actually starting to make use of that
component.

Since the deallocator used to rely on the reaper deleting the tasks belonging
to services that had been marked for removal, a previous version of this patch
was modifying the task reaper quite heavily to also keep track of such services
(needed since otherwise the reaper would fail to clean up all of them, instead
keeping some for history tracking purposes). However, it soon appeared that
this was not the best approach:
 * this creates a hidden coupling between the reaper and the deallocator
 * it's also not the best user experience to suddenly remove all of a service's
   task history while shutting down, for no apparent reason to the user

For these reasons, this patch instead amends the deallocator to also look at task statuses when keeping track of how many live tasks remain for a service.

Part of this patch was already reviewed in
moby#2778, which subsequently got reverted
following the discussion in moby/moby#38525.
The main difference between this PR and moby#2778 is that asynchronous removal is
an opt-in behaviour here instead of changing the existing behaviour.

**- How to test it**

Updated tests.

**- Description for the changelog**

Services & networks can be deleted asynchronously, in which case the request
for their removal simply marks them for removal, and they are then actually
deleted only when all their tasks are actually shut down.

Signed-off-by: Jean Rouge <rougej+github@gmail.com>