
Conversation


@sanimej sanimej commented May 28, 2017

Fixes #1768

Noticed this issue while debugging a few test-case failures in the e2e test suite. This is likely the root cause of docker #33076 and of the service-access issue during container bring-up reported in docker #30321.

libnetwork IPAM recycles an IP address when a task goes down on one node and is brought up on another. For remote tasks, the overlay network namespace has two fdb entries for the task's MAC: a static entry programmed by the driver and a dynamic entry learned by the bridge from the data path when a packet is received from the remote container. The dynamic entry ages out after 300 seconds. If a task on a remote node goes down and gets rescheduled on another node, the stale dynamic fdb entry still remains, and unless the container generates some data traffic it won't be updated. This can make access to the container unpredictable: sometimes it works quickly, if there is traffic from the container and the MAC entry gets updated; if the container is completely silent, it can lead to up to 300 seconds of traffic loss.

The change in this PR is to disable MAC learning on the bridge. This is done only for the vxlan interface; for the local veth interfaces we still rely on MAC learning for container-to-container communication.

There is one caveat with this approach: when there are many local containers, traffic to a remote task would get replicated to all of those local endpoints, which can potentially be a performance issue. An alternative approach is to delete the dynamic MAC entry in the bridge, but on some older kernels deleting the dynamic entry doesn't work because of the default-vlan issue. It might still be possible to handle that case by setting the bridge's default vlan; if that is feasible, this fix can be changed to use that approach.

Signed-off-by: Santhosh Manohar santhosh@docker.com

@mavenugo
Contributor

code LGTM.

@sanimej regarding the caveat that you mentioned,

When there are many local containers, traffic to a remote task would get replicated to all those local endpoints.

isn't that true only when the remote static entry is not programmed? And for any such container movement, the control plane must converge and the static entry will be programmed across the nodes, so such flooding is temporary behavior until the entry appears. Is that correct?

@sanimej
Author

sanimej commented May 28, 2017

@mavenugo A local container's traffic hits the overlay bridge first, so without a learned entry in the bridge it would get replicated to all the ports, including the vxlan port. So I would expect the replication to always happen.

@mavenugo
Contributor

@sanimej in that case, is it safe to add a static fdb entry on the bridge as well and remove it when the container moves (just like we do on the vxlan device)?

@abhi
Contributor

abhi commented May 30, 2017

@mavenugo I think adding a static fdb entry on the bridge as well would prevent flooding in the silent/consumer-only container scenario too.

@sanimej
Author

sanimej commented Jun 1, 2017

@mavenugo We don't have to add a static fdb entry. We just have to delete the dynamic entry when the task goes down. But this doesn't work on some kernel versions because the default vlan is 1. PR #1792 has a fix for this: it sets the bridge default vlan to 0.


```go
func (i *nwIface) DisableLearning() bool {
	i.Lock()
	i.Unlock()
```


Was this meant to be deferred? The code doesn't look right as-is.

Author


Yes, it should have been defer. Thanks.

As mentioned in the description the flooding to the local ports is better avoided. I have pushed a PR with a different approach to address this issue, #1792. I will close this PR.

@sanimej
Author

sanimej commented Jun 2, 2017

#1792 fixes the problem without the drawback mentioned in this PR's description. Closing this.

@sanimej sanimej closed this Jun 2, 2017


Linked issue: service discovery between service task and unmanaged container takes long at times
