-
Notifications
You must be signed in to change notification settings - Fork 886
Fix for crash on WaitGroup #1857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, holding a lock around the calls that adds the vtep neighbor entries needs to be done carefully.
am not against this change. But It needs some more eyes. @sanimej wdyt ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats the part I am looking at as well. Usage of the WaitGroup needs to be fixed. But its better to avoid taking a lock around peerAdd which is a pretty expensive call, with multiple netlink calls to the kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fcrisciani @mavenugo
peerDbWg.Addis only in the sandbox init path. In the pre-swarm mode overlay using serf there is no network scoped joining of the cluster. So what used to happen was all the peer events for a network which doesn't have local tasks is stored in the peerDB. When the first container comes up on the networkpeerDbUpdateSandboxwould program all those entries into the overlay network. If we didn't do this, till the next self pushpull there won't be connectivity to the peers. I think the only reason peerDbWg is needed is to make sure this one shotpeerDbUpdateSandboxis done before we start processing the serf events. For ex: the serf could be for a delete of an existing endpoint that we received earlier and in the peerDB.With the networkdb in swarm mode since the network gets recreated on a task run on the node and get all the events from the peers, I don't see a strong reason for using the peerDbWg. One option here is to add do make the peerDbWg calls (Add/Wait/Done) conditional and only for non-swarm mode so that we won't change the flow for the non-swarm mode and avoid the panic for the swarm mode.
For the non-swarm mode the panic is still possible. But there is very little usage for it now and we can take some time to fix it in a different approach (or may be we might even deprecate that mode) rather than completely removing that logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mavenugo On the locking, the scope of the lock is changed only in
peerDbUpdateSandboxwhich was already serialized using the WaitGroup. So that looks ok. As I mentioned in the earlier comment its good to keep the old logic for serf mode till its deprecated.