Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions design/orchestrators.md
Original file line number Diff line number Diff line change
Expand Up @@ -128,10 +128,10 @@ means rolling updates will go node-by-node instead of slot-by-slot.
The restart supervisor manages the process of shutting down a task, and
possibly starting a replacement task. Its entry point is a `Restart` method
which is called inside a store write transaction in one of the orchestrators.
It atomically changes the state of the old task to `Shutdown`, and, if it's
appropriate to start a replacement task based on the service's restart policy,
creates a new task in the same slot (replicated service) or on the same node
(global service).
It atomically changes the desired state of the old task to `Shutdown`, and, if
it's appropriate to start a replacement task based on the service's restart
policy, creates a new task in the same slot (replicated service) or on the same
node (global service).

If the service is set up with a restart delay, the restart supervisor handles
this delay too. It initially creates the new task with the desired state
Expand Down
4 changes: 2 additions & 2 deletions design/task_model.md
Original file line number Diff line number Diff line change
Expand Up @@ -127,8 +127,8 @@ that the requested node has the necessary resources, in the case of global
services' tasks). It changes their state to `ASSIGNED`.

From this point, control over the state passes to the agent. A task will
progress through the `ACCEPTED`, `PREPARING`, and `STARTING` states on the
way to `RUNNING`. If a task exits without an error code, it moves to the
progress through the `ACCEPTED`, `PREPARING`, `READY', and `STARTING` states on
the way to `RUNNING`. If a task exits without an error code, it moves to the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a task exits without an error code, it moves to the COMPLETE state.

Lets also clarify this by saying that the actual state is changed to COMPLETE by the agent.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it says the agent handles these states on the line above the bit shown by GitHub's diff, so maybe it's OK as it is.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, its fine as is.

`COMPLETE` state. If it fails, it moves to the `FAILED` state instead.

A task may alternatively end up in the `SHUTDOWN` state if its shutdown was
Expand Down
5 changes: 5 additions & 0 deletions design/tla/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
*.toolbox
*.pdf
*.tlaps
states
metadir
25 changes: 25 additions & 0 deletions design/tla/EventCounter.tla
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
---------------------------- MODULE EventCounter ----------------------------

EXTENDS Integers

\* The number of ``events'' that have occurred (always 0 if we're not keeping track).
VARIABLE nEvents

\* The maximum number of events to allow, or ``-1'' for unlimited.
maxEvents == -1

InitEvents ==
nEvents = 0 \* Start with the counter at zero

(* If we're counting events, increment the event counter.
We don't increment the counter when we don't have a maximum because that
would make the model infinite.
Actions tagged with CountEvent cannot happen once nEvents = maxEvents. *)
CountEvent ==
IF maxEvents = -1 THEN
UNCHANGED nEvents
ELSE
/\ nEvents < maxEvents
/\ nEvents' = nEvents + 1

=============================================================================
24 changes: 24 additions & 0 deletions design/tla/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
WORKERS := 4

TLA := docker run --rm -it --workdir /mnt -v ${PWD}:/mnt talex5/tla

.PHONY: all check pdfs tlaps

all: check pdfs tlaps

# Run the TLC model checker
check:
${TLA} tlc -workers ${WORKERS} SwarmKit.tla -config models/SwarmKit.cfg
${TLA} tlc -workers ${WORKERS} WorkerImpl.tla -config models/WorkerImpl.cfg

# Run the TLAPS proof checker
tlaps:
${TLA} tlapm -I /usr/local/lib/tlaps SwarmKit.tla
${TLA} tlapm -I /usr/local/lib/tlaps WorkerImpl.tla

# Generate a PDF file from a .tla file
%.pdf: %.tla
[ -d metadir ] || mkdir metadir
${TLA} java tla2tex.TLA -shade -latexCommand pdflatex -latexOutputExt pdf -metadir metadir $<

pdfs: SwarmKit.pdf Types.pdf Tasks.pdf WorkerSpec.pdf EventCounter.pdf
14 changes: 14 additions & 0 deletions design/tla/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
This directory contains documentation for SwarmKit using [TLA+][] notation.

Run `make pdfs` to render these documents as PDF files.
The best one to start with is `SwarmKit.pdf`, which introduces the TLA+ notation
and describes the overall components of SwarmKit.

The specifications can also be executed by the TLC model checker to help find
mistakes. Use `make check` to run the checks.

If you want to edit these specifications, you will probably want to use the [TLA+ Toolbox][],
which provides a GUI.

[TLA+]: https://en.wikipedia.org/wiki/TLA%2B
[TLA+ Toolbox]: http://lamport.azurewebsites.net/tla/toolbox.html
Loading