TLA+ model for SwarmKit tasks #2613
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #2613      +/-   ##
==========================================
- Coverage   62.31%   61.82%   -0.49%
==========================================
  Files         134      134
  Lines       21739    21821      +82
==========================================
- Hits        13546    13491      -55
- Misses       6740     6892     +152
+ Partials     1453     1438      -15
dperny left a comment:
Hopefully these explanations will make some things clearer.
| (* This has every field mentioned in `task_model.md' except for `spec', which
|    it doesn't seem to use for anything.
|
|    XXX: can desired_state be any state, or just some (e.g. {ready, running, shutdown, remove})? *)
I believe that there are no guarantees about this, but the current behavior should be only running, shutdown, and remove. I have also envisioned that one day for single-run tasks, the desired state might be completed. So to model accurately, it may be best to model the desired state as any possible state. And if that doesn't work, then we can constrain states as needed.
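To make this concrete, here is a minimal TLA+ sketch of how the model could allow any state as a desired state while still noting the currently observed behaviour. The names `TaskState`, `tasks`, and the field `desired_state` are assumptions made up for this sketch, not taken from the actual spec:

    (* Sketch only: assume TaskState is the set of all task states used by the model. *)
    DesiredStates == TaskState

    (* The subset the orchestrator is currently observed to request, per the discussion above. *)
    ObservedDesiredStates == {running, shutdown, remove}

    (* Invariant: every task's desired state is drawn from the allowed set. *)
    DesiredStateOK == \A t \in tasks : t.desired_state \in DesiredStates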
| \/ CreateService
| \/ UpdateService
|
| (* XXX: what about removing a service? (not described in any detail in the design docs) *)
Removing a service is possible. The original way it was done was to delete the service object. This cascaded to the Orchestrator, which saw that the service was removed and deleted all of its tasks. The deletion of the tasks caused the Dispatcher to send the workers an updated set of Assignments which did not include the deleted tasks, and in turn the workers shut down the deleted tasks.
This caused an issue because shutting down a task is far from instant, while deleting a task is. When a task is deleted, its network resources are freed by the Allocator and can be reused. However, because the task may take some time to shut down, it was possible for network resources (IP addresses) to be reused while the old tasks' containers were still shutting down.
The solution for this was to add the Remove state, which (being a terminal state) indicates to a worker that the task should be shut down. Then, when the task is shut down, that state is reported back to the manager, which affirmatively knows that the task is terminated and at that point can safely delete the resources it uses.
However, a further extension of this problem is the fact that a service itself can have resources associated with it, specifically Virtual IPs. Deleting the service would free those virtual IPs, creating the same problem. So, to fix that, I believe @nishanttotla was or is working on a change to the service object to flag it for removal instead of removing it outright. That flag would be treated the way a deletion used to be, kicking off the movement of tasks to desired state removed, with the service finally being deleted once all of its tasks were deleted.
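As a rough illustration of the flag-for-removal design described above, here is a hedged TLA+ sketch. The names `services`, `tasks`, `TasksOf`, and the `marked_for_removal` field are assumptions introduced for the sketch, not the actual SwarmKit model:

    (* Assumed helper: the tasks belonging to a service. *)
    TasksOf(sid) == { t \in tasks : t.service_id = sid }

    (* The user flags the service for removal instead of deleting it outright;
       all of its tasks get the desired state `remove'. *)
    FlagServiceForRemoval(sid) ==
        /\ sid \in DOMAIN services
        /\ services' = [services EXCEPT ![sid].marked_for_removal = TRUE]
        /\ tasks' = { [t EXCEPT !.desired_state = remove] : t \in TasksOf(sid) }
                    \union (tasks \ TasksOf(sid))

    (* Once a flagged service has no tasks left, it can finally be deleted. *)
    ReapService(sid) ==
        /\ sid \in DOMAIN services
        /\ services[sid].marked_for_removal
        /\ TasksOf(sid) = {}
        /\ services' = [ s \in (DOMAIN services) \ {sid} |-> services[s] ]
        /\ UNCHANGED tasks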
OK, I'll wait for @nishanttotla to do that (or submit a design for it) and add it to the model then.
To check I understand the current system:
- The service is removed from the system.
- The orchestrator sets the desired state for every task without a service to remove.
- Once a task with no service is shut down, the reaper will remove it.
| are configured to start the new task before stopping the old one.'' - need to implement this
|
| XXX: ``The updater takes care of making sure that each slot converges to having a single running task.''
|    How does it do that?
You'd have to look at the code itself, but the pattern is probably something along the lines of watching the event stream for task updates and, during a rolling update, moving the desired state of older tasks in a slot to shutdown.
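A hedged sketch of what such a convergence step could look like in the model; the helper `RunnableInSlot` and the record fields are assumptions introduced for illustration:

    (* Assumed helper: tasks in one slot of a service whose desired state is `running'. *)
    RunnableInSlot(sid, slot) ==
        { t \in tasks : t.service_id = sid /\ t.slot = slot /\ t.desired_state = running }

    (* If a slot has more than one task that wants to run (e.g. during a rolling update),
       pick one to keep and mark the others for shutdown. *)
    ConvergeSlot(sid, slot) ==
        \E keep \in RunnableInSlot(sid, slot) :
            /\ RunnableInSlot(sid, slot) # {keep}
            /\ tasks' = { IF t \in RunnableInSlot(sid, slot) /\ t # keep
                             THEN [t EXCEPT !.desired_state = shutdown]
                             ELSE t
                          : t \in tasks }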
|
| XXX: `orchestrators.md' says: ``[The Restart supervisor] atomically changes the state of the old task to Shutdown''.
|    However, this conflicts with the claim that only the agent can change the state of a running task.
|    Presumably it means the desired state.
Yes, probably means desired state. Every state after assigned is set by the agent. Feel free to either open a new PR or add to this PR an update to the documentation.
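That ownership rule could also be written down as a small check. A sketch under the assumption that the model defines a set `AgentTransitions` of the transitions the agent performs (the names and the exact set of states are invented for illustration):

    (* Assumed: the states the agent alone is responsible for reaching. *)
    AgentStates == { accepted, preparing, ready, starting, running,
                     complete, failed, rejected, shutdown }

    (* Check: every transition performed by the agent lands in a post-ASSIGNED state. *)
    AgentSetsOnlyLateStates ==
        \A tr \in AgentTransitions : tr[2] \in AgentStates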
| (* XXX: spec says ``Tasks that were removed because of service removal
|    or scale down are not kept around in task history.''
|    Does that include tasks that had already finished for some
|    other reason? *)
Building on my earlier explanation of slots, and the fact that the primary purpose of a slot is to maintain task history: the tasks in a slot have very little relationship to each other beyond the fact that they share a slot number.
When a task is removed in a scale down or service removal, the history for that slot is no longer useful in any meaningful way, so all tasks with the removed slot number in any state are removed. The slot no longer "exists" (as far as the concept of existence is meaningful for a slot) and therefore no longer requires any history.
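A minimal sketch of that behaviour, assuming hypothetical helpers `TasksInSlot` and `WantedSlots` (the latter standing in for however the model records which slot numbers a service still wants):

    (* Assumed helper: every task recorded for one slot of a service, in any state. *)
    TasksInSlot(sid, slot) == { t \in tasks : t.service_id = sid /\ t.slot = slot }

    (* After a scale-down or service removal, a slot's history is useless,
       so every task carrying that slot number is dropped regardless of state. *)
    ReapSlot(sid, slot) ==
        /\ slot \notin WantedSlots(sid)
        /\ tasks' = tasks \ TasksInSlot(sid, slot)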
| (* Every task has a service:
|
|    XXX: The spec says: ``In some cases, there are tasks that exist independent of any service.
|    These do not have a value set in service_id.'' Need an example of one. *)
Network attachment tasks. They're an abomination, an absolutely disgusting hack that should never have been shipped. They are tasks which use the NetworkAttachmentSpec runtime spec, a spec which contains a container ID. These tasks are created through the grpc server implemented in resourceapi, and they are created by the agent. The purpose of Network Attachment tasks is to attach regular docker run containers to a swarmkit overlay network that has been created with the --attachable option.
There have been bugs in the past with attachment tasks because we have no visibility into their lifecycle and their creation and destruction is entirely at the whim of the agent. The end result is that network attachment tasks would remain in the object store after they had been terminated, leaking resources and causing the associated attached networks to be unremovable.
I added a workaround to remove network attachment tasks when a node is removed in #2409, but the underlying issue IIRC remains.
I'm sorry.
Also, @jlhawn is working on a hack week project to run one-off tasks, like docker run-esque tasks, that would exist independently of services.
| Transition == {
|     << null, new >>,
|     << new, pending >>,
|     << pending, assigned >>,
I can't quite remember, I'd have to go check the code, but it may be possible to move from new to a terminal state, if the task cannot be allocated because there aren't enough free resources. I think. I'm not sure.
|     << shutdown, remove >>,
|     << rejected, remove >>,
|     << complete, remove >>,
|     << failed, remove >>,
Tasks will never actually BE in the removed state. They will only have a desired state of removed. The state functions mainly as a flag to the task reaper, to indicate that, once the task is in a terminal state, it and its history can be deleted.
|     << shutdown, orphaned >>,
|
|     << remove, null >>,
|     << orphaned, null >>
Tasks can be deleted (moved to the null state, I think, in terms of the implementation of this model) from any terminal state. For example, when a task history is deleted, the old tasks will all be in terminal states (like SHUTDOWN or FAILED) and will be deleted straight from those states. Once a task is in a terminal state, the resources it had in use are definitely free and it is safe to remove it.
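A hedged sketch of the reaper step this describes; the state names and the `tasks` variable are assumptions matching the quoted transition table, not the actual spec:

    (* Assumed: terminal actual states. Per the discussion above, `remove' normally
       appears only as a desired state, so it is left out here. *)
    TerminalStates == { complete, failed, rejected, shutdown, orphaned }

    (* Deleting a task is modelled as dropping it from `tasks' (i.e. moving it to `null');
       it is only allowed once the task has reached a terminal state. *)
    ReapTask(t) ==
        /\ t \in tasks
        /\ t.state \in TerminalStates
        /\ tasks' = tasks \ {t}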
| =============================================================================
| \* Modification History
| \* Last modified Thu Apr 19 14:55:46 BST 2018 by tal
| \* Created Mon Apr 16 11:31:19 BST 2018 by tal
this is a dumb nit, but wouldn't the modification history just be tracked in git? Or does this history get baked into the PDF that a TLA+ file creates?
It's just some noise the TLA+ toolbox updates automatically when you save. Although it looks like if you delete it then it doesn't create it again, so easily solved!
Thanks for the detailed review @dperny! I've made some updates:
Note: I also updated the set of valid transitions to group them by the component that performs them, and to check this.
| progress through the `ACCEPTED`, `PREPARING`, and `STARTING` states on the
| way to `RUNNING`. If a task exits without an error code, it moves to the
| progress through the `ACCEPTED`, `PREPARING`, `READY', and `STARTING` states on
| the way to `RUNNING`. If a task exits without an error code, it moves to the
"If a task exits without an error code, it moves to the COMPLETE state."
Let's also clarify this by saying that the actual state is changed to COMPLETE by the agent.
Actually, it says the agent handles these states on the line above the bit shown by GitHub's diff, so maybe it's OK as it is.
More accurately, the service is removed from the system, the process of which involves setting the desired state of all of its tasks to remove.
| (* A description of the task model in SwarmKit.
|    Currently, this is mostly based on `design/task_model.md'.
|
|    Note: I am not yet familiar with the SwarmKit code, and this document is likely to
nit: looks like this is no longer true :)
Even more accurately, the task reaper will delete any task which is in a terminal state or in an unassigned state (i.e. < ASSIGNED). Also: it's not every task without a service, but just the tasks for the service that was just deleted.
Thanks for the explanation. I've added service removal. I modelled it as flagging the service and all its tasks for removal and then removing it when it had no tasks left (i.e. the proposed design rather than the current one). I added a comment about it in the model. I also updated the reaper to allow removing tasks that haven't yet been assigned (as explained by @anshulpundir), and added a couple of exceptions to the rule that only one component is in charge of each state to cover that.
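For illustration, a guess at what the relaxed reaper condition could look like, building on the terminal-state sketch above; the state sets and field names are assumptions, not the actual model:

    (* Assumed: states a task can hold before it has ever been assigned to a node. *)
    UnassignedStates == { new, pending }

    (* A task flagged for removal can be reaped once it is terminal, or if it never
       reached a worker at all (so there is nothing to shut down). *)
    CanReap(t) ==
        /\ t.desired_state = remove
        /\ \/ t.state \in TerminalStates
           \/ t.state \in UnassignedStates

    ReapFlaggedTask(t) ==
        /\ t \in tasks
        /\ CanReap(t)
        /\ tasks' = tasks \ {t}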
Please sign your commits following these rules:

$ git clone -b "tla" git@github.com:talex5/swarmkit.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842353920240
(editor opens)
(change each 'pick' to 'edit')
(save the file and quit)
$ git commit --amend -s --no-edit
$ git rebase --continue   # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.
I've now split the spec into multiple files, and created a separate low-level spec for worker nodes. I also added a Makefile so you can build the PDFs and run the model checker without having to install the TLA toolbox (it uses a Docker image). This could also be used in CI.
I think this PR is ready for merging now.
| (* The user removes a service.
|
|    Note: Currently, SwarmKit deletes the service from its records immediately.
|    However, this isn't right because we need to wait for the resources to be freed.
"this isn't right because"
Instead of saying this isn't right, I'd rather say that this is not very elegant, because we still wait for resources to be freed before deleting the tasks, and resources are tied to tasks and not services.
@dperny mentioned that there are also service-level resources that need to be freed. How about:
However, this isn't right because we need to wait for service-level resources such as Virtual IPs to be freed.
| \E id \in UnusedId(sid, slot) :
|     tasks' = tasks \union { NewTask(sid, slot, id, running) }
|
| (* Add a task if a slot exists, contains no runnable tasks, and we weren't trying to remove it.
supernit: remove comma before 'and' ?
I think this is correct. According to https://en.wikipedia.org/wiki/Serial_comma:
A majority of American style guides mandate use of the serial comma
| (* We don't model the updater explicitly, but we allow any task to be restarted (perhaps with
|    a different image) at any time, which should cover the behaviours of the restart supervisor.
|
|    XXX: wait for new task to be ready before shutting down old one?
Is this a TODO? I think some clarification is needed.
I agree. I've updated it to:
TODO: SwarmKit also allows ``start-first'' mode updates where we first get the new task to
`running' and then mark the old task for shutdown. Add this to the model.
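If it helps, here is a rough guess at how a start-first step might be written; the helper and field names are invented for this sketch rather than taken from the model:

    (* Assumed helper: every task recorded for one slot of a service. *)
    TasksInSlot(sid, slot) == { t \in tasks : t.service_id = sid /\ t.slot = slot }

    (* Start-first update: only once the replacement task is actually running
       do we flip the old task's desired state to shutdown. *)
    StartFirstShutdownOld(sid, slot) ==
        \E oldT, newT \in TasksInSlot(sid, slot) :
            /\ oldT # newT
            /\ newT.state = running              \* the replacement has come up
            /\ oldT.desired_state = running      \* old task not yet told to stop
            /\ tasks' = (tasks \ {oldT})
                        \union { [oldT EXCEPT !.desired_state = shutdown] }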
|
| TODO: ``However, there are also cases where a slot may have multiple tasks with the
|    desired state of RUNNING. This can happen during rolling updates when the updates
|    are configured to start the new task before stopping the old one.'' - need to implement this *)
"need to implement this"
By this, do you mean implement the TLA model for this?
|
| (* The allocator rejects the task because there aren't enough resources.
|
|    XXX: Not clear whether SwarmKit actually does this.
Don't think it does. The task just stays in NEW (or READY?) until resources are available. cc @dperny
He said he wasn't sure: #2613 (comment)
Shall I remove this then?
Browsing through the code, I don't see a case where the current allocator fails tasks if there aren't enough resources. @talex5
OK, thanks for checking! I've removed this bit now.
|
| ---------------------------- MODULE Scheduler -------------------------------
|
| \* Actions performed the scheduler
performed the scheduler => performed by the scheduler
| (* Forget about tasks in remove or orphan states.
|
|    Orphaned tasks belong to nodes that we are assuming are lost forever.
|    Or, at the very least, it's likely that a node outage lasting longer than the timeout
has crashed => has happened?
How about:
Orphaned tasks belong to nodes that we are assuming are lost forever (or have crashed and will come up with nothing running, which is an equally fine outcome).
I think this is what @dperny meant in #2613 (comment).
|
| ``Moving a task to the Orphaned state is not desirable,
|    because it's the one case where we break the otherwise invariant
|    that the agent sets all states past ASSIGNED.''
"that the agent sets all states past ASSIGNED."
Clarify that this is the actual state and not the desired state?
These documents use the TLA+ notation to describe the behaviour of SwarmKit more precisely than the prose documentation in the design directory. Run `make pdfs` to render these documents as PDF files. The best one to start with is `SwarmKit.pdf`, which introduces the TLA+ notation and describes the overall components of SwarmKit.

The specifications can also be executed by the TLC model checker to help find mistakes. Use `make check` to run the checks.

Signed-off-by: Thomas Leonard <thomas.leonard@docker.com>
TLA+ model for SwarmKit tasks
- What I did
I want to learn about how SwarmKit works, and extend its documentation. As a first step, I've been reading through the docs in the `design` directory and writing a more formal version of them. If someone familiar with SwarmKit has time to read over and correct my interpretation, that would be great. I marked with `XXX` the things I was particularly unsure about. For example, the docs say very little about how a service is removed. To make it more precise, I've written the document using TLA+ notation, but hopefully it's still readable to people not familiar with that.
So far, this is based only on reading the design docs. I plan to start reading through the Go code soon, but I wanted to get some early feedback first. Once it's correct, it could be added to the existing documentation.
- How to test it
It's easiest to read the PDF rendering of the spec.
You can also load the spec into the TLA toolbox and run the model checker on it to test the claimed properties. The document explains how to do that.
- Description for the changelog
Add a TLA+ spec for SwarmKit's task model