@sanimej sanimej commented Jun 1, 2017

Fixes #1768

Noticed this issue while debugging a few test case failures in the e2e test suite. This is likely the root cause of docker #33076 and of the service access issue during container bring-up reported in docker #30321.

This PR implements a different approach to fix this issue. #1783 has an alternative approach, which could have a performance impact.

libnetwork IPAM recycles the IP address when a task goes down on one node and is brought up on another node. For remote tasks, the overlay network namespace has one static fdb entry programmed by the driver and one dynamic entry learned by the bridge from the data path when a packet is received from the remote container. The dynamic entry ages out after 300 seconds. If a task on a remote node goes down and gets scheduled on the local node, the dynamic fdb entry still remains, and it won't be updated unless the container generates some data traffic. This makes access to the container unpredictable: if there is some traffic from the container, the MAC entry gets updated and access works fairly quickly, but if the container is completely silent, traffic can be lost for up to 300 seconds.

The fix is to delete the dynamic fdb entry as well. But this doesn't work on some kernel versions, because untagged fdb entries are assumed to be in the default VLAN 1. To address this, the default VLAN for the bridge has to be set through the bridge's default_pvid sysfs variable.

Signed-off-by: Santhosh Manohar santhosh@docker.com

Contributor

@mavenugo mavenugo left a comment

Just a couple of minor comments.

LGTM otherwise.

return fmt.Errorf("vxlan interface creation failed for subnet %q: %v", s.subnetIP.String(), err)
}

if !hostMode {

Contributor

Why is this specific to non-hostMode?

Author

The reexec is there to avoid calling the potentially blocking sysfs mount operations that cause the namespace corruption. It's not needed in hostmode, since there is no per-overlay-network namespace. But in hostmode we might also have to set the default VLAN, directly from the daemon without a reexec. I will do that in a separate PR after trying it out in hostmode.

path := filepath.Join("/sys/class/net", brName, "bridge/default_pvid")
data := []byte{'0', '\n'}

if err = ioutil.WriteFile(path, data, 0644); err != nil {

Contributor

I hope this works equally well across multiple Distros.

Author

I tested on Ubuntu 15.10 with the 4.2 kernel, on a 3.19 kernel, and on CentOS with 3.10.0-327.10.1.el7.x86_64.

Contributor

mavenugo commented Jun 5, 2017

Thanks @sanimej

Development

Successfully merging this pull request may close these issues.

service discovery between service task and unmanaged container takes long at times