allocator: Retry failed allocations immediately upon a deallocation#2235

Merged
aluzzardi merged 1 commit into moby:master from aaronlehmann:better-allocator-retries
Jun 13, 2017
Conversation

@aaronlehmann
Collaborator

We retry failed allocations every 5 minutes. If something else gets
deallocated, we should trigger the retry immediately in case the
allocations were failing due to IP exhaustion, and the deallocation
freed up an IP.

cc @abhinandanpb


Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann aaronlehmann force-pushed the better-allocator-retries branch from 2ea1926 to e913523 Compare June 9, 2017 23:21
```diff
 	a.procTasksNetwork(ctx, false)

-	if time.Since(nc.lastRetry) > retryInterval {
+	if time.Since(nc.lastRetry) > retryInterval || nc.somethingWasDeallocated {
```
Contributor

Will this still need a cluster event?

Collaborator Author

Yes

Contributor

@abhi abhi Jun 10, 2017

Isn't that the main problem? If there is no cluster event, we will not allocate even if somethingWasDeallocated is set?

Collaborator Author

somethingWasDeallocated is only set during an event. These events are always followed by EventCommit.
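The mechanism being described (set the flag while handling a deallocation event, act on it only at EventCommit so a batch of deallocations yields a single retry pass) could be sketched like this — the event types and fields here are illustrative stand-ins for swarmkit's store events:

```go
package main

import "fmt"

// Illustrative events standing in for swarmkit store events; the
// real allocator consumes typed store events followed by a commit
// marker (EventCommit).
type event interface{}

type eventDeleteService struct{ id string }
type eventCommit struct{}

type allocator struct {
	somethingWasDeallocated bool
	retryPasses             int
}

// handleEvent sets the flag when something is deallocated and only
// consumes it at commit time, so many deallocations in one
// transaction trigger exactly one retry pass.
func (a *allocator) handleEvent(ev event) {
	switch ev.(type) {
	case eventDeleteService:
		// deallocate the service's resources, then note it happened
		a.somethingWasDeallocated = true
	case eventCommit:
		if a.somethingWasDeallocated {
			a.retryPasses++ // reprocess failed allocations once
			a.somethingWasDeallocated = false
		}
	}
}

func main() {
	a := &allocator{}
	// three deallocations in one batch, then a single commit
	a.handleEvent(eventDeleteService{"s1"})
	a.handleEvent(eventDeleteService{"s2"})
	a.handleEvent(eventDeleteService{"s3"})
	a.handleEvent(eventCommit{})
	fmt.Println(a.retryPasses) // 1
}
```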

Contributor

Sounds good. Thanks.

@abhi
Contributor

abhi commented Jun 9, 2017

LGTM thanks for taking care of this.
Related moby/moby#33619

@aluzzardi
Member

Is there even a point to try to allocate if something was not de-allocated?

@aluzzardi
Member

I don't have a quick solution in mind, but is there a way to avoid spreading somethingWasDeallocated around? It's very likely we'll end up forgetting to update it one way or the other, and the bug will be extremely subtle: since it eventually works anyway (5-minute delay at most), we'd just experience rare allocation slowness under certain conditions (IP exhaustion).

@aaronlehmann
Collaborator Author

Is there even a point to try to allocate if something was not de-allocated?

Being out of IPs is only one possible error. I don't have a good idea of what the others are and what situations would make the tasks allocatable again, but one thing I can think of is a missing ingress network, where creating the network again would allow forward progress.

@aaronlehmann
Collaborator Author

I don't have a quick solution in mind, but is there a way to avoid spreading somethingWasDeallocated around?

The only thing I can think of is to write a deallocation function that takes an interface{} and does a type switch to figure out how to deallocate the object, but I'm not sure it's worth it.
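The type-switch idea floated here could look roughly like the following — the object types and print statements are hypothetical, purely to show how a single deallocation entry point would centralize the flag bookkeeping:

```go
package main

import "fmt"

// Hypothetical object types; the real allocator deallocates
// networks, services, and tasks through separate code paths.
type network struct{ id string }
type service struct{ id string }
type task struct{ id string }

// deallocate funnels every deallocation through one function via a
// type switch, so "somethingWasDeallocated" bookkeeping would live
// in exactly one place instead of seven call sites.
func deallocate(obj interface{}) (deallocated bool) {
	switch o := obj.(type) {
	case *network:
		fmt.Println("freeing network", o.id)
		return true
	case *service:
		fmt.Println("freeing service", o.id)
		return true
	case *task:
		fmt.Println("freeing task", o.id)
		return true
	default:
		return false // unknown type: nothing freed
	}
}

func main() {
	fmt.Println(deallocate(&network{"ingress"}))
	fmt.Println(deallocate(42))
}
```

The cost, as noted, is trading compile-time type safety for an interface{} parameter, which is why it may not be worth it.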

@aluzzardi
Member

Maybe something simpler?

We're setting somethingWasDeallocated in 7 places; it's going to be easy to forget one.

How about:

```go
a.procUnallocatedNetworks(ctx)
a.procUnallocatedServices(ctx)
a.procTasksNetwork(ctx, true)
```

Can we have that at some sort of tick (like we do in the scheduler/orchestrator), which we call every N minutes or every time we detect a change in nodes/services/tasks?

That way we'd just call tick in those places rather than keeping track of state changes around

@aaronlehmann
Collaborator Author

That way we'd just call tick in those places rather than keeping track of state changes around

Those places should not call tick before the commit event, otherwise a batch of deallocations would trigger many ticks. This is the motivation for somethingWasDeallocated.

Also, I don't understand why calling tick is harder to forget than setting the flag.

@aluzzardi
Member

Yeah you're right. LGTM!
