
[Backport 17.12] Added required call to allocate VIPs when endpoints are restored #2472

Closed

fcrisciani wants to merge 2 commits into moby:bump_v17.12 from fcrisciani:backport_17.06

Conversation

@fcrisciani fcrisciani commented Dec 15, 2017

From original PR (#2468):
Tracking down libnetwork/1790#issuecomment-308222053. The report is that if on a single node several services are started, and if this node is then rebooted, all the services appear to come back but some of them are no longer reachable.

On probing, the cause turned out to be an invalid assignment of IP addresses to services when they were restored. Specifically, the same IP address was assigned to one service's VIP and also a different service's endpoint. The result was that packets got delivered to the wrong container and caused symptoms like services or ports unreachable.

This is very likely to also be the cause of moby/#35675 and other duplicate-IP or overlapping IP issues.

The reason for this problem seems to be that the code path followed when services are restored never contacts or informs IPAM about the IP addresses used as the restored services' VIPs. So IPAM thinks those IP addresses are still available and hands them out to endpoints and new services, causing the observed chaos.

To work out the right fix, I compared the code path when a service is created from the CLI with the code path when a service is restored on reboot. To me, this fix is the bit that should always have been on the restore path but was omitted. With this fix, IPAM gets correctly informed and its state is consistent with what I see on the network.

I have tested this fix on a single node running several services, and with multiple nodes and multiple managers running many services (specifically 2 nodes and 2 managers). In both cases, without the fix a reboot would cause IP address overlaps on the ingress network. With the fix there are no overlaps.

While the fix seems to work, I'm not sure if it is at exactly the right point in this function, or indeed if it is the right or complete fix. Please take a look and let me know.

On leader change or leader reboot the restore logic in the allocator
was allocating overlapping IP addresses for VIPs and endpoints in the
ingress network. The fix added as part of this commit ensures
that during restore we allocate the existing VIP and endpoint.

Signed-off-by: Balraj Singh <balrajsingh@ieee.org>
(cherry picked from commit 5fd25d2)
@fcrisciani (Author)

@balrajsingh @abhi @dperny for final blessing

@abhi (Contributor) left a comment

LGTM

Signed-off-by: Balraj Singh <balrajsingh@ieee.org>
(cherry picked from commit 2397ddf)

codecov Bot commented Dec 15, 2017

Codecov Report

Merging #2472 into bump_v17.12 will increase coverage by 5.52%.
The diff coverage is 0%.

@@               Coverage Diff               @@
##           bump_v17.12    #2472      +/-   ##
===============================================
+ Coverage         61.8%   67.32%   +5.52%     
===============================================
  Files              128       80      -48     
  Lines            21107    11487    -9620     
===============================================
- Hits             13045     7734    -5311     
+ Misses            6664     2963    -3701     
+ Partials          1398      790     -608

@fcrisciani (Author)

CI error:

time="2017-12-15T19:40:14Z" level=error msg="task allocation failure" error="failed to retrieve network testID3 while allocating task testTaskID3" 
time="2017-12-15T19:40:15Z" level=error msg="Failed allocation for network testID5" error="failed while allocating driver state for network testID5: could not obtain vxlan id for pool 10.0.4.0/24: requested bit is already allocated" 
time="2017-12-15T19:40:15Z" level=error msg="task allocation failure" error="network testID5 attached to task testTaskID6 not allocated yet" 
time="2017-12-15T19:40:15Z" level=error msg="Failed allocation for service testServiceID4" error="requested bit is already allocated" 
time="2017-12-15T19:40:15Z" level=error msg="task allocation failure" error="service testServiceID4 to which this task testTaskID7 belongs has pending allocations" 
--- FAIL: TestNodeAllocator (0.02s)
	allocator_test.go:1047: timed out before watchNode found expected node state
FAIL
coverage: 71.5% of statements
FAIL	github.com/docker/swarmkit/manager/allocator	4.167s

Tried a couple of times; giving up. It looks like it was also failing on master.

@dperny (Collaborator) commented Dec 15, 2017

When I merged the original PR, all tests were succeeding...

@fcrisciani (Author)

@dperny yeah, but as far as I know @balrajsingh triggered the CI several times.

@fcrisciani (Author)

@balrajsingh can you take a look? Is it possible that it's an artifact of running the tests in parallel?

@balrajsingh (Contributor)

@fcrisciani This particular test (TestNodeAllocator) seems to pass sometimes and fail at other times on CI.
On my local machine it mostly fails, but it also fails when I go back to previous versions. I've checked back to about early October, and in my environment this test fails in all of them.
I also tried removing all the other tests in that file, but this test still fails, though without the allocation failure messages.

@abhi (Contributor) commented Dec 18, 2017

@balrajsingh I looked at the test cases. It looks like this error is expected.

time="2017-12-15T19:40:14Z" level=error msg="task allocation failure" error="failed to retrieve network testID3 while allocating task testTaskID3" 
time="2017-12-15T19:40:15Z" level=error msg="Failed allocation for network testID5" error="failed while allocating driver state for network testID5: could not obtain vxlan id for pool 10.0.4.0/24: requested bit is already allocated" 
time="2017-12-15T19:40:15Z" level=error msg="task allocation failure" error="network testID5 attached to task testTaskID6 not allocated yet" 
time="2017-12-15T19:40:15Z" level=error msg="Failed allocation for service testServiceID4" error="requested bit is already allocated" 
time="2017-12-15T19:40:15Z" level=error msg="task allocation failure" error="service testServiceID4 to which this task testTaskID7 belongs has pending allocations" 

So that's definitely not an issue. If you run go test -run TestNodeAllocator, do you see the test failing? I just pulled master and ran it a bunch of times, and I don't see any test failing.

@seemethere

This can probably be closed

@fcrisciani (Author)

Merged as another PR.

@fcrisciani fcrisciani closed this Dec 19, 2017