[18.03] Fix deadlock introduced in bd613df2 #2182
Merged
fcrisciani merged 1 commit on Jun 8, 2018
Commit bd613df prevented data corruption due to simultaneous driver.CreateNetwork()/driver.DeleteNetwork() by holding the network lock through the read/modify part of the operation. However, part of the DeleteNetwork operation entails sending a message to the peerDB to tell that goroutine to flush entries on deletion. This can lead to a deadlock where:

* driver.DeleteNetwork() starts and acquires driver.Lock()
* peerDB receives some other request (e.g. EventNotify) and blocks on driver.Lock()
* driver.DeleteNetwork() attempts a peerDB flush and blocks waiting on the synchronous peerDB operation channel

This patch fixes the issue by deferring the peerDB flush operation until after DeleteNetwork() unlocks driver.Lock(). Commit bd613df only modified CreateNetwork() and DeleteNetwork() and the critical section that driver.Lock() protects in CreateNetwork() does not perform any peerDB notifications or other locks of driver data structures. So this solution should be a complete fix for any regressions introduced in bd613df.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
(cherry picked from commit 3755f80)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
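For readers unfamiliar with the pattern, here is a minimal sketch of the locking order described above. It is not the libnetwork code: the `driver` type, the `peerOp` struct, the `peerOperation` helper, the operation names "flush" and "notify", and `runScenario` are all hypothetical stand-ins. It assumes a single peerDB goroutine that serves requests from an unbuffered channel and that some of those requests take the driver lock.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// peerOp is a hypothetical request handled by the peerDB goroutine.
type peerOp struct {
	name string
	nid  string
	done chan struct{} // closed once the operation has been processed
}

// driver is an illustrative stand-in for the overlay driver.
type driver struct {
	sync.Mutex
	networks map[string]bool
	peerOpCh chan *peerOp // unbuffered: a sender blocks until peerDB receives
}

func newDriver() *driver {
	d := &driver{networks: map[string]bool{}, peerOpCh: make(chan *peerOp)}
	go d.peerDBLoop()
	return d
}

// peerDBLoop serves one operation at a time; "notify" needs the driver lock.
func (d *driver) peerDBLoop() {
	for op := range d.peerOpCh {
		if op.name == "notify" {
			d.Lock()
			_ = d.networks[op.nid] // read driver state under the lock
			d.Unlock()
		}
		close(op.done)
	}
}

// peerOperation sends a request and waits synchronously for it to finish.
func (d *driver) peerOperation(name, nid string) {
	op := &peerOp{name: name, nid: nid, done: make(chan struct{})}
	d.peerOpCh <- op
	<-op.done
}

func (d *driver) CreateNetwork(nid string) {
	d.Lock()
	defer d.Unlock()
	d.networks[nid] = true
}

// DeleteNetwork changes driver state under the lock and only then, with
// the lock released, asks peerDB to flush.
func (d *driver) DeleteNetwork(nid string) {
	d.Lock()
	delete(d.networks, nid)
	d.Unlock()
	d.peerOperation("flush", nid) // blocking here is safe: lock not held
}

// runScenario races deletes against notifications and reports whether
// everything finished within two seconds (a deadlock would time out).
func runScenario() bool {
	d := newDriver()
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		nid := fmt.Sprintf("net%d", i)
		d.CreateNetwork(nid)
		wg.Add(2)
		go func() { defer wg.Done(); d.DeleteNetwork(nid) }()
		go func() { defer wg.Done(); d.peerOperation("notify", nid) }()
	}
	finished := make(chan struct{})
	go func() { wg.Wait(); close(finished) }()
	select {
	case <-finished:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("completed without deadlock:", runScenario())
}
```

In this sketch, moving the `peerOperation("flush", nid)` call back inside the locked region recreates the cycle from the list above: the peerDB goroutine can be blocked on `d.Lock()` inside a "notify" while `DeleteNetwork` holds the lock and waits on the channel.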
Codecov Report
@@            Coverage Diff            @@
##           bump_18.03   #2182   +/-  ##
==========================================
  Coverage            ?   40.6%
==========================================
  Files               ?     139
  Lines               ?   22490
  Branches            ?       0
==========================================
  Hits                ?    9132
  Misses              ?   12028
  Partials            ?    1330
backport of #2180 for the bump_18.03 branch
no conflicts