SegmentLoadDropHandler: Fix deadlock when segments have errors loading on startup.#5735
Merged
b-slim merged 1 commit intoapache:masterfrom May 3, 2018
Merged
Conversation
…g on startup. The "lock" object was used to synchronize start/stop as well as synchronize removals from segmentsToDelete (when a segment is done dropping). This could cause a deadlock if a segment-load throws an exception during loadLocalCache. loadLocalCache is run by start() while it holds the lock, but then it spawns loading threads, and those threads will try to acquire the "segmentsToDelete" lock if they want to drop a corrupt segments. I don't see any reason for these two locks to be the same lock, so I split them.
jihoonson
approved these changes
May 2, 2018
Contributor
|
Does it make sense to backport to 0.12.1 even though this bug existed even before 0.12.0? |
Contributor
Author
|
I think probably not, since it's not a regression and we already started the RC train. Although if we end up doing an rc2 for some other reason, maybe we could include this one too, since why not. |
Contributor
|
👍 |
Contributor
|
Since we will create another rc for 0.12.1, I think this is also worthwhile to backport. |
gianm
added a commit
to gianm/druid
that referenced
this pull request
May 8, 2018
…g on startup. (apache#5735) The "lock" object was used to synchronize start/stop as well as synchronize removals from segmentsToDelete (when a segment is done dropping). This could cause a deadlock if a segment-load throws an exception during loadLocalCache. loadLocalCache is run by start() while it holds the lock, but then it spawns loading threads, and those threads will try to acquire the "segmentsToDelete" lock if they want to drop a corrupt segments. I don't see any reason for these two locks to be the same lock, so I split them.
gianm
added a commit
to implydata/druid-public
that referenced
this pull request
May 8, 2018
…g on startup. (apache#5735) The "lock" object was used to synchronize start/stop as well as synchronize removals from segmentsToDelete (when a segment is done dropping). This could cause a deadlock if a segment-load throws an exception during loadLocalCache. loadLocalCache is run by start() while it holds the lock, but then it spawns loading threads, and those threads will try to acquire the "segmentsToDelete" lock if they want to drop a corrupt segments. I don't see any reason for these two locks to be the same lock, so I split them.
Contributor
Author
|
@jihoonson sounds good to me, it's in #5755. |
Contributor
|
@gianm thanks! |
sathishsri88
pushed a commit
to sathishs/druid
that referenced
this pull request
May 8, 2018
…g on startup. (apache#5735) The "lock" object was used to synchronize start/stop as well as synchronize removals from segmentsToDelete (when a segment is done dropping). This could cause a deadlock if a segment-load throws an exception during loadLocalCache. loadLocalCache is run by start() while it holds the lock, but then it spawns loading threads, and those threads will try to acquire the "segmentsToDelete" lock if they want to drop a corrupt segments. I don't see any reason for these two locks to be the same lock, so I split them.
jihoonson
pushed a commit
that referenced
this pull request
May 9, 2018
…g on startup. (#5735) (#5755) The "lock" object was used to synchronize start/stop as well as synchronize removals from segmentsToDelete (when a segment is done dropping). This could cause a deadlock if a segment-load throws an exception during loadLocalCache. loadLocalCache is run by start() while it holds the lock, but then it spawns loading threads, and those threads will try to acquire the "segmentsToDelete" lock if they want to drop a corrupt segments. I don't see any reason for these two locks to be the same lock, so I split them.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The "lock" object was used to synchronize start/stop as well as synchronize removals
from segmentsToDelete (when a segment is done dropping). This could cause a deadlock
if a segment-load throws an exception during loadLocalCache. loadLocalCache is run
by start() while it holds the lock, but then it spawns loading threads, and those
threads will try to acquire the same lock if they want to drop a corrupt segment.
I don't see any reason for these two locks to be the same lock, so I split them.