allocator: Retry failed allocations immediately upon a deallocation#2235

Merged
aluzzardi merged 1 commit into moby:master from aaronlehmann:better-allocator-retries
Jun 13, 2017
Conversation

@aaronlehmann
Collaborator

We retry failed allocations every 5 minutes. If something else gets
deallocated, we should trigger the retry immediately in case the
allocations were failing due to IP exhaustion, and the deallocation
freed up an IP.

cc @abhinandanpb


Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann aaronlehmann force-pushed the better-allocator-retries branch from 2ea1926 to e913523 Compare June 9, 2017 23:21
```diff
 	a.procTasksNetwork(ctx, false)

-	if time.Since(nc.lastRetry) > retryInterval {
+	if time.Since(nc.lastRetry) > retryInterval || nc.somethingWasDeallocated {
```
Contributor

Will this still need a cluster event?

Collaborator Author

Yes

Contributor

@abhi abhi Jun 10, 2017

Isn't that the main problem? If there is no cluster event, we will not allocate even if somethingWasDeallocated is set?

Collaborator Author

somethingWasDeallocated is only set during an event. These events are always followed by EventCommit.
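The mechanism being described (set the flag while handling a deallocation event, act on it only at EventCommit so a batch of deallocations yields a single retry pass) could be sketched like this — the event types and fields here are illustrative stand-ins for swarmkit's store events:

```go
package main

import "fmt"

// Illustrative events standing in for swarmkit store events; the
// real allocator consumes typed store events followed by a commit
// marker (EventCommit).
type event interface{}

type eventDeleteService struct{ id string }
type eventCommit struct{}

type allocator struct {
	somethingWasDeallocated bool
	retryPasses             int
}

// handleEvent sets the flag when something is deallocated and only
// consumes it at commit time, so many deallocations in one
// transaction trigger exactly one retry pass.
func (a *allocator) handleEvent(ev event) {
	switch ev.(type) {
	case eventDeleteService:
		// deallocate the service's resources, then note it happened
		a.somethingWasDeallocated = true
	case eventCommit:
		if a.somethingWasDeallocated {
			a.retryPasses++ // reprocess failed allocations once
			a.somethingWasDeallocated = false
		}
	}
}

func main() {
	a := &allocator{}
	// three deallocations in one batch, then a single commit
	a.handleEvent(eventDeleteService{"s1"})
	a.handleEvent(eventDeleteService{"s2"})
	a.handleEvent(eventDeleteService{"s3"})
	a.handleEvent(eventCommit{})
	fmt.Println(a.retryPasses) // 1
}
```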

Contributor

Sounds good. Thanks.

@abhi
Contributor

abhi commented Jun 9, 2017

LGTM thanks for taking care of this.
Related moby/moby#33619

@aluzzardi
Member

Is there even a point to try to allocate if something was not de-allocated?

@aluzzardi
Member

I don't have a quick solution in mind, but is there a way to avoid spreading somethingWasDeallocated around? It's very likely we'll end up forgetting to update it one way or the other, and the bug will be extremely subtle: since it eventually works anyway (5-minute delay at most), we'd just experience rare allocation slowness under certain conditions (IP exhaustion).

@aaronlehmann
Collaborator Author

Is there even a point to try to allocate if something was not de-allocated?

Being out of IPs is only one possible error. I don't have a good idea of what the others are and what situations would make the tasks allocatable again, but one thing I can think of is a missing ingress network, where creating the network again would allow forward progress.

@aaronlehmann
Collaborator Author

I don't have a quick solution in mind, but is there a way to avoid spreading somethingWasDeallocated around?

The only thing I can think of is to write a deallocation function that takes an interface{} and does a type switch to figure out how to deallocate the object, but I'm not sure it's worth it.
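The type-switch idea floated here could look roughly like the following — the object types and print statements are hypothetical, purely to show how a single deallocation entry point would centralize the flag bookkeeping:

```go
package main

import "fmt"

// Hypothetical object types; the real allocator deallocates
// networks, services, and tasks through separate code paths.
type network struct{ id string }
type service struct{ id string }
type task struct{ id string }

// deallocate funnels every deallocation through one function via a
// type switch, so "somethingWasDeallocated" bookkeeping would live
// in exactly one place instead of seven call sites.
func deallocate(obj interface{}) (deallocated bool) {
	switch o := obj.(type) {
	case *network:
		fmt.Println("freeing network", o.id)
		return true
	case *service:
		fmt.Println("freeing service", o.id)
		return true
	case *task:
		fmt.Println("freeing task", o.id)
		return true
	default:
		return false // unknown type: nothing freed
	}
}

func main() {
	fmt.Println(deallocate(&network{"ingress"}))
	fmt.Println(deallocate(42))
}
```

The cost, as noted, is trading compile-time type safety for an interface{} parameter, which is why it may not be worth it.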

@aluzzardi
Member

Maybe something simpler?

We're setting somethingWasDeallocated in 7 places; it's going to be easy to forget one.

How about:

```go
a.procUnallocatedNetworks(ctx)
a.procUnallocatedServices(ctx)
a.procTasksNetwork(ctx, true)
```

Can we have that at some sort of tick (like we do in the scheduler/orchestrator), which we call every N minutes or every time we detect a change in nodes/services/tasks?

That way we'd just call tick in those places rather than keeping track of state changes around

@aaronlehmann
Collaborator Author

That way we'd just call tick in those places rather than keeping track of state changes around

Those places should not call tick before the commit event, otherwise a batch of deallocations would trigger many ticks. This is the motivation for somethingWasDeallocated.

Also, I don't understand why calling tick is harder to forget than setting the flag.

@aluzzardi
Member

Yeah you're right. LGTM!
