Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions drivers/overlay/ov_network.go
Original file line number Diff line number Diff line change
Expand Up @@ -560,6 +560,23 @@ func (n *network) setupSubnetSandbox(s *subnet, brName, vxlanName string) error

if err := sbox.AddInterface(vxlanName, "vxlan",
sbox.InterfaceOptions().Master(brName)); err != nil {
// If adding vxlan device to the overlay namespace fails, remove the bridge interface we
// already added to the namespace. This allows the caller to try the setup again.
for _, iface := range sbox.Info().Interfaces() {
if iface.SrcName() == brName {
if ierr := iface.Remove(); ierr != nil {
logrus.Errorf("removing bridge failed from ov ns %v failed, %v", n.sbox.Key(), ierr)
}
}
}

// Also, delete the vxlan interface. Since a global vni id is associated
// with the vxlan interface, an orphaned vxlan interface will result in
// failure of vxlan device creation if the vni is assigned to some other
// network.
if deleteErr := deleteInterface(vxlanName); deleteErr != nil {
logrus.Warnf("could not delete vxlan interface, %s, error %v, after config error, %v\n", vxlanName, deleteErr, err)
}
return fmt.Errorf("vxlan interface creation failed for subnet %q: %v", s.subnetIP.String(), err)
}

Expand Down Expand Up @@ -601,6 +618,9 @@ func (n *network) initSubnetSandbox(s *subnet, restore bool) error {
}
} else {
if err := n.setupSubnetSandbox(s, brName, vxlanName); err != nil {
// The error in setupSubnetSandbox could be a temporary glitch. reset the
// subnet once object to allow the setup to be retried on another endpoint join.
s.once = &sync.Once{}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense for initSubnetSandbox. How does the restore path work ?
I have to spend some time analyzing that as well to confirm if this change is complete.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanimej Reviewing the restore path, I think if restoreSubnetSandbox fails for any reason, I think we will end up in the same issue.
And the current fix will not help, because your changes currently performs deleteInterface for vxlan device only if AddInterface fails. But if the restoreSubnetSandbox fails, then the failure will happen in createVxlan call and there is no recovery for that.

Do you think it is valid ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mavenugo The way restore works is fundamentally different. For the restore case we do handle something similar, but at the network level; https://github.com/docker/libnetwork/blob/master/drivers/overlay/overlay.go#L99

Going strictly by the code, yes, restoreSubnetSandbox can also fail. But all the recent issues reported for this error are in swarm mode (ie: not the restore code path). For the restore case we have to try with a forced error and verify things end to end. It belongs in a separate PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Thanks.

return err
}
}
Expand Down
10 changes: 10 additions & 0 deletions osl/interface_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -297,6 +297,16 @@ func (n *networkNamespace) AddInterface(srcName, dstPrefix string, options ...If

// Configure the interface now this is moved in the proper namespace.
if err := configureInterface(nlh, iface, i); err != nil {
// If configuring the device fails move it back to the host namespace
// and change the name back to the source name. This allows the caller
// to properly cleanup the interface. Its important especially for
// interfaces with global attributes, ex: vni id for vxlan interfaces.
if nerr := nlh.LinkSetName(iface, i.SrcName()); nerr != nil {
logrus.Errorf("renaming interface (%s->%s) failed, %v after config error %v\n", i.DstName(), i.SrcName(), nerr, err)
}
if nerr := nlh.LinkSetNsFd(iface, ns.ParseHandlerInt()); nerr != nil {
logrus.Errorf("moving inteface %s to host ns failed, %v, after config error %v\n", i.SrcName(), nerr, err)
}
return err
}

Expand Down