allocator: Retry failed allocations immediately upon a deallocation #2235
We retry failed allocations every 5 minutes. If something else gets deallocated, we should trigger the retry immediately in case the allocations were failing due to IP exhaustion, and the deallocation freed up an IP.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
2ea1926 to e913523
  a.procTasksNetwork(ctx, false)

- if time.Since(nc.lastRetry) > retryInterval {
+ if time.Since(nc.lastRetry) > retryInterval || nc.somethingWasDeallocated {
Will this still need a cluster event?
Isn't that the main problem? If there is no cluster event, we will not allocate even if somethingWasDeallocated is set?
somethingWasDeallocated is only set during an event. These events are always followed by EventCommit.

LGTM thanks for taking care of this.

Is there even a point to try to allocate if something was not de-allocated?

I don't have a quick solution in mind, but is there a way to avoid spreading
Being out of IPs is only one possible error. I don't have a good idea of what the others are and what situations would make the tasks allocatable again, but one thing I can think of is a missing ingress network, where creating the network again would allow forward progress.
The only thing I can think of is to write a deallocation function that takes an

Maybe something simpler? We're setting How about: Can we have that at some sort of That way we'd just call
Those places should not call Also, I don't understand why calling

Yeah you're right. LGTM!
cc @abhinandanpb