-
Notifications
You must be signed in to change notification settings - Fork 886
Fix for swarm init/leave race condition #1735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| return err | ||
| } | ||
|
|
||
| ch, cancel := nDB.Watch(libnetworkEPTable, "", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was actually a bug here.
At line 319 the ep table cancel function was expected:
epTblCancel: cancel,
83ad580 to
1f379d7
Compare
agent.go
Outdated
| bindAddr string | ||
| advertiseAddr string | ||
| dataPathAddr string | ||
| epTblCancelFunList []func() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not only for endpoint tables, it is also for node event tables.
Please rename the variable to coreCancelFuncs. We already have a driverCancelFuncs, so that it is clear this new one contains the cancel functions for the tables followed by libnetwork core
88b3799 to
ed2c555
Compare
|
LGTM |
|
@aboch after a new change on the moby side I realized that there was no reason to change the ListenClusterEvent function so I just restored it. The provider interface again returns a read only channel as the original |
|
Right, because we no longer need to push an event from inside libnetwork (I remember at some point the key ready msg was being sent from inside SetKey) Still LGTM |
This change cleans up the SetClusterProvider method. Swarm calls the SetClusterProvider to pass to libnetwork the pointer of the provider from which libnetwork can fetch all the information to initialize the internal agent. The method can be and is called multiple times passing the same value, with the previous logic that was erroneusly spawning multiple go routines that were making possiblea race between an agentInit and an agentClose. The new logic aims to disallow it by checking for the provider passed and ensuring that if the provider is already present there is nothing to do because there is already an active go routine that is ready to process cluster events. Moreover a patch on moby side takes care of clearing up the Cluster Events dispacthing using only 1 channel to handle all the events types. This will also guarantee in order event handling because now all the events are piped into one single channel. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
mavenugo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This change serialize the init/close of the libnetwork agent into one single code path.
On swarm init there are several things that are appening in parallel, before the
initialization of the agent was happening thorugh differents code paths depending
from timing.
This change cleans up the entire init logic letting the init happening in the goroutine
clusterAgentInit.
Moby side: moby/moby#32906
Signed-off-by: Flavio Crisciani flavio.crisciani@docker.com