Closed
10 changes: 1 addition & 9 deletions drivers/overlay/peerdb.go
@@ -59,8 +59,6 @@ func (pKey *peerKey) Scan(state fmt.ScanState, verb rune) error {
return nil
}

-var peerDbWg sync.WaitGroup

func (d *driver) peerDbWalk(f func(string, *peerKey, *peerEntry) bool) error {
d.peerDb.Lock()
nids := []string{}
@@ -141,8 +139,6 @@ func (d *driver) peerDbSearch(nid string, peerIP net.IP) (net.HardwareAddr, net.
func (d *driver) peerDbAdd(nid, eid string, peerIP net.IP, peerIPMask net.IPMask,
peerMac net.HardwareAddr, vtep net.IP, isLocal bool) {

-peerDbWg.Wait()

d.peerDb.Lock()
pMap, ok := d.peerDb.mp[nid]
if !ok {
@@ -173,7 +169,6 @@ func (d *driver) peerDbAdd(nid, eid string, peerIP net.IP, peerIPMask net.IPMask

func (d *driver) peerDbDelete(nid, eid string, peerIP net.IP, peerIPMask net.IPMask,
peerMac net.HardwareAddr, vtep net.IP) peerEntry {
-peerDbWg.Wait()

d.peerDb.Lock()
pMap, ok := d.peerDb.mp[nid]
@@ -215,8 +210,6 @@ func (d *driver) peerDbUpdateSandbox(nid string) {
}
d.peerDb.Unlock()

-peerDbWg.Add(1)

var peerOps []func()
pMap.Lock()
for pKeyStr, pEntry := range pMap.mp {
@@ -244,13 +237,12 @@ func (d *driver) peerDbUpdateSandbox(nid string) {

peerOps = append(peerOps, op)
}
-pMap.Unlock()

for _, op := range peerOps {
op()
}
+pMap.Unlock()

As discussed, holding a lock around the calls that add the vtep neighbor entries needs to be done carefully.
I am not against this change, but it needs some more eyes. @sanimej wdyt?

That's the part I am looking at as well. Usage of the WaitGroup needs to be fixed. But it's better to avoid taking a lock around peerAdd, which is a pretty expensive call with multiple netlink calls to the kernel.
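
The concern above, keeping an expensive call out of the critical section, can be sketched roughly as follows. This is a minimal illustration rather than the driver's code: `peerMap`, `collectOps`, and `programAll` are hypothetical names, and the `program` callback merely stands in for an expensive call such as peerAdd. The closures are built while the map lock is held, and run only after it has been released.

```go
package main

import (
	"fmt"
	"sync"
)

// peerMap is a hypothetical, simplified stand-in for a per-network peer table.
type peerMap struct {
	sync.Mutex
	mp map[string]string // peer key -> vtep address (simplified)
}

// collectOps snapshots the entries while holding the lock and returns
// closures that can be executed after the lock has been released.
func (p *peerMap) collectOps(program func(key, vtep string)) []func() {
	p.Lock()
	defer p.Unlock()

	ops := make([]func(), 0, len(p.mp))
	for k, v := range p.mp {
		k, v := k, v // capture per-iteration copies for the closure
		ops = append(ops, func() { program(k, v) })
	}
	return ops
}

// programAll runs every collected op outside the lock and reports how many ran.
func programAll(p *peerMap, program func(key, vtep string)) int {
	ops := p.collectOps(program)
	for _, op := range ops {
		op() // the expensive work happens here, with the lock free
	}
	return len(ops)
}

func main() {
	p := &peerMap{mp: map[string]string{
		"10.0.0.2": "192.168.1.10",
		"10.0.0.3": "192.168.1.11",
	}}
	n := programAll(p, func(key, vtep string) {
		fmt.Printf("programming peer %s via vtep %s\n", key, vtep)
	})
	fmt.Println("programmed", n, "peers")
}
```

The trade-off is that the map can change between the snapshot and the execution of the ops, so this pattern needs some other form of serialization if stale entries matter.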

@fcrisciani @mavenugo peerDbWg.Add is called only in the sandbox init path. In the pre-swarm-mode overlay using serf, there is no network-scoped joining of the cluster. So what used to happen was that all the peer events for a network that doesn't have local tasks were stored in the peerDB. When the first container comes up on the network, peerDbUpdateSandbox would program all those entries into the overlay network. If we didn't do this, there wouldn't be connectivity to the peers until the next self pushpull. I think the only reason peerDbWg is needed is to make sure this one-shot peerDbUpdateSandbox is done before we start processing the serf events. For example, the serf event could be a delete of an existing endpoint that we received earlier and that is in the peerDB.

With networkdb in swarm mode, the network gets recreated when a task runs on the node and gets all the events from the peers, so I don't see a strong reason for using peerDbWg. One option here is to make the peerDbWg calls (Add/Wait/Done) conditional and only for non-swarm mode, so that we don't change the flow for non-swarm mode and avoid the panic in swarm mode.

For the non-swarm mode the panic is still possible. But there is very little usage of it now, and we can take some time to fix it with a different approach (or maybe we might even deprecate that mode) rather than completely removing that logic.
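
One way to picture the conditional Add/Wait/Done suggestion is a small gate that only touches the WaitGroup when it is enabled. This is a hedged sketch, not the driver's implementation: `initGate`, its `enabled` field, and `run` are hypothetical, and how the real driver would detect non-swarm (serf) mode is not shown in this thread. When enabled, the one-shot sandbox init is guaranteed to finish before the peer event is processed; when disabled, no ordering is imposed. As noted above, this does not remove the possibility of the panic when the gate is enabled.

```go
package main

import (
	"fmt"
	"sync"
)

// initGate is a hypothetical wrapper that makes the WaitGroup calls
// conditional. enabled would be true only for the non-swarm (serf) mode.
type initGate struct {
	enabled bool
	wg      sync.WaitGroup
}

func (g *initGate) begin() {
	if g.enabled {
		g.wg.Add(1)
	}
}

func (g *initGate) end() {
	if g.enabled {
		g.wg.Done()
	}
}

func (g *initGate) wait() {
	if g.enabled {
		g.wg.Wait()
	}
}

// run simulates a one-shot sandbox init racing with a peer event and
// returns the order in which the two were recorded.
func run(enabled bool) []string {
	g := &initGate{enabled: enabled}

	var mu sync.Mutex
	var events []string
	record := func(s string) {
		mu.Lock()
		events = append(events, s)
		mu.Unlock()
	}

	g.begin() // registered before the init goroutine starts
	done := make(chan struct{})
	go func() {
		defer close(done)
		record("sandbox-init")
		g.end()
	}()

	g.wait() // blocks until init is done only when the gate is enabled
	record("peer-event")
	<-done

	mu.Lock()
	defer mu.Unlock()
	return events
}

func main() {
	fmt.Println("gated:  ", run(true))
	fmt.Println("ungated:", run(false)) // order is not guaranteed here
}
```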

@mavenugo On the locking, the scope of the lock is changed only in peerDbUpdateSandbox, which was already serialized using the WaitGroup. So that looks OK. As I mentioned in the earlier comment, it's good to keep the old logic for serf mode until it's deprecated.


-peerDbWg.Done()
}

func (d *driver) peerAdd(nid, eid string, peerIP net.IP, peerIPMask net.IPMask,