Skip to content

[17.06] allocator: Retry failed allocations immediately upon a deallocation#2642

Merged
anshulpundir merged 1 commit into
moby:bump_v17.06from
nishanttotla:17.06-retry-failed-allocations
May 25, 2018
Merged

[17.06] allocator: Retry failed allocations immediately upon a deallocation#2642
anshulpundir merged 1 commit into
moby:bump_v17.06from
nishanttotla:17.06-retry-failed-allocations

Conversation

@nishanttotla
Copy link
Copy Markdown
Contributor

Cherry pick #2235.

git cherry-pick -s -x e91352325579da2af724033599bc391eb7269956

Cherry-pick was not clean.

@GordonTheTurtle
Copy link
Copy Markdown

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "17.06-retry-failed-allocations" git@github.com:nishanttotla/swarmkit.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842357627600
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@nishanttotla nishanttotla changed the base branch from master to bump_v17.06 May 23, 2018 00:55
@nishanttotla nishanttotla force-pushed the 17.06-retry-failed-allocations branch from c5cd758 to 1ea785b Compare May 23, 2018 00:58
@nishanttotla nishanttotla force-pushed the 17.06-retry-failed-allocations branch from 1ea785b to 97dcb69 Compare May 23, 2018 00:58
@codecov
Copy link
Copy Markdown

codecov Bot commented May 23, 2018

Codecov Report

❗ No coverage uploaded for pull request base (bump_v17.06@c1808c3). Click here to learn what that means.
The diff coverage is 53.84%.

@@              Coverage Diff               @@
##             bump_v17.06    #2642   +/-   ##
==============================================
  Coverage               ?   61.22%           
==============================================
  Files                  ?      120           
  Lines                  ?    20055           
  Branches               ?        0           
==============================================
  Hits                   ?    12279           
  Misses                 ?     6434           
  Partials               ?     1342

@cyli
Copy link
Copy Markdown
Contributor

cyli commented May 23, 2018

I think one more change is missing from the original PR: https://github.com/docker/swarmkit/pull/2235/files#diff-119d353212583d96a59cba8c82b80280R656
I think this is around like 750 in this branch?

@cyli
Copy link
Copy Markdown
Contributor

cyli commented May 23, 2018

I also don't know enough about the old allocator code - there are a couple more places where committing we attempt to roll back an allocation because a commit failed (at https://github.com/nishanttotla/swarmkit/blob/97dcb699cf732d0d2816a57a2d777b0f5f59fd14/manager/allocator/network.go#L816 for example) - do we need to add somethingDeallocated = true to those places? They're not in the original PR, but I wasn't sure if they should be.

@nishanttotla nishanttotla force-pushed the 17.06-retry-failed-allocations branch from 97dcb69 to f10289d Compare May 23, 2018 18:11
@nishanttotla
Copy link
Copy Markdown
Contributor Author

Rebased on #2643 to fix tests.

@nishanttotla
Copy link
Copy Markdown
Contributor Author

nishanttotla commented May 23, 2018

@dperny do you have any thoughts about @cyli's comment regarding your new allocator?

Also, the somethingDeallocated field doesn't exist in this branch yet, but we can backport the necessary changes if we think it's important.

I meant to say somethingDeallocated as well, but it's actually added in this PR so I fixed it.

@cyli
Copy link
Copy Markdown
Contributor

cyli commented May 23, 2018

@nishanttotla Oops, sorry, I mistyped. It was somethingWasDeallocated

@nishanttotla
Copy link
Copy Markdown
Contributor Author

Current failure seems like it's #2577.

Ping @anshulpundir

We retry failed allocations every 5 minutes. If something else gets
deallocated, we should trigger the retry immediately in case the
allocations were failing due to IP exhaustion, and the deallocation
freed up an IP.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
(cherry picked from commit e913523)
Signed-off-by: Nishant Totla <nishanttotla@gmail.com>
@anshulpundir
Copy link
Copy Markdown
Contributor

Effectively reverted after docker:bump_v17.06 was reset back to 90897c1/#2584

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants