-
Notifications
You must be signed in to change notification settings - Fork 883
Fix race conditions in the overlay network driver #2143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -125,8 +125,12 @@ func (d *driver) NetworkAllocate(id string, option map[string]string, ipV4Data, | |
| opts[netlabel.OverlayVxlanIDList] = val | ||
|
|
||
| d.Lock() | ||
| defer d.Unlock() | ||
| if _, ok := d.networks[id]; ok { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible to do this check (and error out) up above in the beginning of the function rather than towards the end?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's a good suggestion. Hrm... What prefaces the lock itself is basically input checking. Some of it (the VxlanID allocation and possibly cleanup on error) would access the store and could be lengthy, so holding the lock would slow down other network create operations on different networks. But moving the check to the top would prevent concurrent attempts to create the same network from going through needless effort (and seems more sane stylistically). So ... bit nicer hygine vs potentially increased parallelism (assuming that the client is actually using said parallelism). I could go either way on this. What do you think?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was mainly looking at it from the stylistically sane perspective and error out early. Agree with the concern around potential hit in parallelism too. So perhaps okay to leave it the way you have now. |
||
| n.releaseVxlanID() | ||
| return nil, fmt.Errorf("network %s already exists", id) | ||
| } | ||
| d.networks[id] = n | ||
| d.Unlock() | ||
|
|
||
| return opts, nil | ||
| } | ||
|
|
@@ -137,8 +141,8 @@ func (d *driver) NetworkFree(id string) error { | |
| } | ||
|
|
||
| d.Lock() | ||
| defer d.Unlock() | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. actual implementation locks and unlocks only at particular places. And the current change locks at the beginning of the function and unlocks when the function exits. Like you mentioned , will there be any issue for other threads who will be waiting little longer period than it used to be ? Just trying to see if there will be any trade off in performance or some other issue.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There could be a tradeoff in performance here. The only extra operation that gets protected by the lock is network.releaseVxlanID() which does two things:
The first would be protected against concurrency by the lock anyway. The second, however, invokes the IDM which uses the bitseq package which can end up writing to store. That last bit is the only long blocking operation I can think of here. Even so, this is an "ovmanager.driver" instance and the only operations it locks on are creation and deletion. I'd assume that our desired rate for creation/deletion would be measured in the order of thousands per second, in which case I highly doubt a serial write-per-op would hurt us. But (full transparency), I chafe at bad concurrency patterns and want to make it easy to reason about the correctness of the code. As I said in the commit log for this section: I doubt this race is serious. |
||
| n, ok := d.networks[id] | ||
| d.Unlock() | ||
|
|
||
| if !ok { | ||
| return fmt.Errorf("overlay network with id %s not found", id) | ||
|
|
@@ -147,9 +151,7 @@ func (d *driver) NetworkFree(id string) error { | |
| // Release all vxlan IDs in one shot. | ||
| n.releaseVxlanID() | ||
|
|
||
| d.Lock() | ||
| delete(d.networks, id) | ||
| d.Unlock() | ||
|
|
||
| return nil | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this change give any specific improvement other than not using swap variable ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, just that this is more idiomatic Go as far as I'm aware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.