Skip to content

[18.03] Fix deadlock introduced in bd613df2#2182

Merged
fcrisciani merged 1 commit into
moby:bump_18.03from
thaJeztah:18.03-backport-fix-overlay-deadlock-regression
Jun 8, 2018
Merged

[18.03] Fix deadlock introduced in bd613df2#2182
fcrisciani merged 1 commit into
moby:bump_18.03from
thaJeztah:18.03-backport-fix-overlay-deadlock-regression

Conversation

@thaJeztah
Copy link
Copy Markdown
Member

backport of #2180 for the bump_18.03 branch

git checkout -b 18.03-backport-fix-overlay-deadlock-regression upstream/bump_18.03
git cherry-pick -s -S -x 3755f80aa0b178acd0447585539a5c13072fe85e
git push -u origin

no conflicts

Commit bd613df prevented data corruption due to simultaneous
driver.CreateNetwork()/driver.DeleteNetwork() by holding the network
lock through the read/modify part of the operation. However, part of
the DeleteNetwork operation entails sending a message to the peerDB to
tell that goroutine to flush entries on deletion. This can lead to a
deadlock where:

  • driver.DeleteNetwork() starts and acquires driver.Lock()
  • peerDB receives some other request (e.g. EventNotify) and blocks
    on driver.Lock()
  • driver.DeleteNetwork() attempts a peerDB flush and blocks waiting
    on the synchronous peerDB operation channel

This patch fixes the issue by deferring the peerDB flush operation until
after DeleteNetwork() unlocks driver.Lock(). Commit bd613df only
modified CreateNetwork() and DeleteNetwork() and the critical section
that driver.Lock() protects in CreateNetwork() does not perform any
peerDB notifications or other locks of driver data structures. So this
solution should be a complete fix for any regressions introduced in
bd613df.

Signed-off-by: Chris Telfer ctelfer@docker.com
(cherry picked from commit 3755f80)
Signed-off-by: Sebastiaan van Stijn github@gone.nl

Commit bd613df prevented data corruption due to simultaneous
driver.CreateNetwork()/driver.DeleteNetwork() by holding the network
lock through the read/modify part of the operation.  However, part of
the DeleteNetwork operation entails sending a message to the peerDB to
tell that goroutine to flush entries on deletion.  This can lead to a
deadlock where:
  * driver.DeleteNetwork() starts and acquires driver.Lock()
  * peerDB receives some other request (e.g. EventNotify) and blocks
    on driver.Lock()
  * driver.DeleteNetwork() attempts a peerDB flush and blocks waiting
    on the synchronous peerDB operation channel

This patch fixes the issue by deferring the peerDB flush operation until
after DeleteNetwork() unlocks driver.Lock().   Commit bd613df only
modified CreateNetwork() and DeleteNetwork() and the critical section
that driver.Lock() protects in CreateNetwork() does not perform any
peerDB notifications or other locks of driver data structures.  So this
solution should be a complete fix for any regressions introduced in
bd613df.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
(cherry picked from commit 3755f80)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Copy link
Copy Markdown
Contributor

@abhi abhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fcrisciani fcrisciani merged commit 557bac2 into moby:bump_18.03 Jun 8, 2018
@codecov-io
Copy link
Copy Markdown

Codecov Report

❗ No coverage uploaded for pull request base (bump_18.03@3931ba4). Click here to learn what that means.
The diff coverage is 0%.

Impacted file tree graph

@@             Coverage Diff              @@
##             bump_18.03   #2182   +/-   ##
============================================
  Coverage              ?   40.6%           
============================================
  Files                 ?     139           
  Lines                 ?   22490           
  Branches              ?       0           
============================================
  Hits                  ?    9132           
  Misses                ?   12028           
  Partials              ?    1330
Impacted Files Coverage Δ
drivers/overlay/ov_network.go 2.77% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3931ba4...8c6376f. Read the comment docs.

@thaJeztah thaJeztah deleted the 18.03-backport-fix-overlay-deadlock-regression branch June 8, 2018 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants