restarts: Default delay and desired=accepted. #715
Force-pushed from 6179464 to c514fda
Current coverage is 56.88%
@@            master   #715    diff @@
==========================================
  Files           77     46     -31
  Lines        11057   5700   -5357
  Methods          0      0
  Messages         0      0
  Branches         0      0
==========================================
- Hits          6066   3242   -2824
+ Misses        4163   2020   -2143
+ Partials       828    438    -390
There is already a default restart delay, set in the spec parsing code. By doing this in the restart supervisor directly, doesn't it become impossible to set a restart delay of 0? That's a problem. Not sure about setting the desired state to accepted. It does seem to work around that issue, but I wonder if we can find a better way.
@aaronlehmann I don't think we should rely on |
We need some way to support a restart delay of 0. Filed #721 to track this.
Setting |
@aaronlehmann Is there a use for a restart delay smaller than 1ns? LGTM
In practice, having a minimum delay like this should be fine. But I think it would be very counterintuitive to specify 'RestartDelay: 0' and have a few seconds of delay. If we make the creator of the specs substitute 1 ns for an explicit 0 value, I think that would be an acceptable workaround.
@aaronlehmann Or we make |
Yeah, I prefer that to special-casing 0 literals on the client side. This would mean that |
@aaronlehmann I think it just means delay becomes a nullable |
The delay is an int; is it possible to make those nullable? I know proto2 has the |
Right - that doesn't work |
Force-pushed from c514fda to 5cba65c
Rebased
Force-pushed from 5cba65c to 4567e9c
- Add a default delay of 5 seconds between restarts if not specified. Otherwise we end up in a restart loop by default.
- Set the new task's desired state to "ACCEPTED" rather than "READY". Ready implies pulling, which means creating a service with an invalid image leads to a restart loop.

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
Force-pushed from 4567e9c to 326309d
Bad rebase. Fixed
I would prefer to handle 0-delay cases as part of this, but if you are in a hurry to merge, we can merge this now and fix 0-delay in a followup. There is already a ticket filed: #721.
@aaronlehmann I'm going to go ahead and merge this. We can fix it in a follow-up, but right now the behavior on master is pretty bad.
We used to put restarted tasks in READY state. This makes sense because then they can go ahead and pull an image while we wait for the restart delay to elapse.

However, moby#715 changed the restart supervisor to put restarted tasks into ACCEPTED to work around a tight restart loop when an image doesn't exist. The problem was that the task would fail immediately, leading the orchestrator to request a new restart, which would cancel the ongoing restart delay.

As a better fix for this, put tasks in READY, but when a restart is requested and there is already one in progress for the old task, we wait for that restart to complete before starting the new one.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
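- Add a default delay of 5 seconds between restarts if not specified. Otherwise we end up in a restart loop by default.
- Set the new task's desired state to "ACCEPTED" rather than "READY". Ready implies pulling, which means creating a service with an invalid image leads to a restart loop.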
The second point is quite inconvenient: Moving the desired state to READY was a convenience so that while we were waiting for a restart delay to occur, the new node would at least start pulling the image.
However, with an invalid image, READY can't happen since the agent transitions from ACCEPTED to REJECTED.
/cc @stevvooe @aaronlehmann @dongluochen