
Adding logic to restore networks in order #2571

Merged
dperny merged 1 commit into moby:master from abhi:master
Mar 29, 2018

Conversation

@abhi
Contributor

@abhi abhi commented Mar 23, 2018

This commit adds a fix for the restore case where there might be a mix of allocated and unallocated networks in raft. During restore, the allocator was going over the networks lexicographically, which meant there was a chance for an unallocated network, say net1, to be allocated the same vxlan id or subnet pool that was already allocated to another network, net2. Because of this, when the allocator tried to reallocate network net2 during restore, it would fail, because those network resources had already been handed to net1. This would leave services, tasks, and the network itself in a
messed up state.
This commit also contains a unit test and modifications to a test case that was not catching the issue before.

Signed-off-by: Abhinandan abhi@docker.com

@abhi
Contributor Author

abhi commented Mar 23, 2018

continue
}

if existingOnly &&

can we add a comment here stating the importance that both DriverState and IPAM are written together, else the restore logic will fail again

func (na *cnmNetworkAllocator) IsAllocated(n *api.Network) bool {
_, ok := na.networks[n.ID]
return ok
if _, ok := na.networks[n.ID]; !ok {

this does not look needed as a change

Collaborator


yeah this isn't needed, this is the same as return ok.

Contributor Author


right right. I had added some more logic to check for allocation. I removed it and did not revert this.


if err := a.store.Batch(func(batch *store.Batch) error {
for _, n := range allocatedNetworks {
if err := a.commitAllocatedNetwork(ctx, batch, n); err != nil {

maybe we can add here a check that a network to be committed must have IPAM and DriverState both != nil, this will put more guarantees on the previous check

Contributor Author


I am going to skip this because it's used in 2 places where the networks are "rightly" allocated now.

@codecov

codecov Bot commented Mar 23, 2018

Codecov Report

Merging #2571 into master will increase coverage by 0.06%.
The diff coverage is 62.85%.

@@            Coverage Diff             @@
##           master    #2571      +/-   ##
==========================================
+ Coverage   61.42%   61.48%   +0.06%     
==========================================
  Files         134      134              
  Lines       21722    21817      +95     
==========================================
+ Hits        13342    13414      +72     
- Misses       6945     6946       +1     
- Partials     1435     1457      +22

@dperny
Collaborator

dperny commented Mar 26, 2018

LGTM

Contributor

@anshulpundir anshulpundir left a comment


an unallocated network, say net1, to be allocated the same vxlan id or subnet pool that was already allocated to another network, net2. Because of this, when the allocator tried to reallocate network net2 during restore, it would fail, because those network resources had already been handed to net1. This would leave services, tasks, and the network itself in a
messed up state.

For the unallocated net1, is there no way to check that the vxlan id or subnet pool being assigned was already assigned to net2 and avoid re-assigning it? @abhi

Comment thread manager/allocator/network.go Outdated
return nil
}

// allocateNetworks allocates networks in the store so far before we process
Contributor


store so far before we process watched events

Sorry, not sure what you mean by this.

Also, please add a comment for how the existingOnly flag is used.

return err
}

if err := a.allocateNetworks(ctx, false); err != nil {
Contributor


Please add a comment on why we do it in this order i.e. existingOnly=true first and existingOnly=false later.

Contributor Author


The comments are added in detail in allocateNetworks, along with the usage of the existingOnly flag.

@abhi
Contributor Author

abhi commented Mar 26, 2018

@anshulpundir We can check, but it would be an O(n^2) op and it's not needed. In this case we only process networks once.

if nc.nwkAllocator.IsAllocated(n) {
continue
}
// Network is considered allocated only if the DriverState and IPAM are NOT nil.
Contributor Author


I have explained it here

@abhi abhi force-pushed the master branch 3 times, most recently from 9bdbb7a to 165404f on March 27, 2018 16:49
@abhi
Contributor Author

abhi commented Mar 27, 2018

@anshulpundir elaborated some comments

@anshulpundir
Contributor

We can check and it will be a O(n^2) op and its not needed. In this case we only process networks once.

Got it thx for clarifying!

Comment thread manager/allocator/network.go Outdated
}

// allocateNetworks allocates (restores) networks in the store so far before we process
// watched events. existingOnly flas is set to true to specify if only allocated
Contributor


flas => flag

return err
}
// Now allocated objects that were not previously allocated
// but were present in the raft.
Contributor


nit: Thx for adding this, but it might make sense to add your description from the PR here in some form. Maybe: "we allocate previously unallocated objects after already handling the objects which have been previously partially allocated, to first finish allocating those partially allocated objects (to avoid introducing conflicts) and then do the allocations for new objects."

This commit adds a fix for the restore case where
there might be a mix of allocated and unallocated
networks in raft. During restore, the allocator
was going over the networks lexicographically, which
meant there was a chance for an
unallocated network, say net1, to be allocated the same vxlan
id or subnet pool that was already allocated to another network, net2.
Because of this, when the allocator tries
to reallocate network net2 during restore, it would fail,
because it had already handed those network resources to net1.
This would leave services, tasks, and the network itself in a
messed up state.

Signed-off-by: Abhinandan <abhi@docker.com>