Skip to content

SegmentLoadDropHandler: Fix deadlock when segments have errors loading on startup.#5735

Merged
b-slim merged 1 commit intoapache:masterfrom
gianm:fix-histo-startup-log
May 3, 2018
Merged

SegmentLoadDropHandler: Fix deadlock when segments have errors loading on startup.#5735
b-slim merged 1 commit intoapache:masterfrom
gianm:fix-histo-startup-log

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented May 2, 2018

The "lock" object was used to synchronize start/stop as well as synchronize removals
from segmentsToDelete (when a segment is done dropping). This could cause a deadlock
if a segment-load throws an exception during loadLocalCache. loadLocalCache is run
by start() while it holds the lock, but then it spawns loading threads, and those
threads will try to acquire the same lock if they want to drop a corrupt segment.

I don't see any reason for these two locks to be the same lock, so I split them.

…g on startup.

The "lock" object was used to synchronize start/stop as well as synchronize removals
from segmentsToDelete (when a segment is done dropping). This could cause a deadlock
if a segment-load throws an exception during loadLocalCache. loadLocalCache is run
by start() while it holds the lock, but then it spawns loading threads, and those
threads will try to acquire the "segmentsToDelete" lock if they want to drop a corrupt
segments.

I don't see any reason for these two locks to be the same lock, so I split them.
@gianm gianm added the Bug label May 2, 2018
@jihoonson
Copy link
Copy Markdown
Contributor

Does it make sense to backport to 0.12.1 even though this bug existed even before 0.12.0?

@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented May 2, 2018

I think probably not, since it's not a regression and we already started the RC train. Although if we end up doing an rc2 for some other reason, maybe we could include this one too, since why not.

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented May 3, 2018

👍

@b-slim b-slim merged commit df01998 into apache:master May 3, 2018
@jihoonson
Copy link
Copy Markdown
Contributor

Since we will create another rc for 0.12.1, I think this is also worthwhile to backport.

gianm added a commit to gianm/druid that referenced this pull request May 8, 2018
…g on startup. (apache#5735)

The "lock" object was used to synchronize start/stop as well as synchronize removals
from segmentsToDelete (when a segment is done dropping). This could cause a deadlock
if a segment-load throws an exception during loadLocalCache. loadLocalCache is run
by start() while it holds the lock, but then it spawns loading threads, and those
threads will try to acquire the "segmentsToDelete" lock if they want to drop a corrupt
segments.

I don't see any reason for these two locks to be the same lock, so I split them.
gianm added a commit to implydata/druid-public that referenced this pull request May 8, 2018
…g on startup. (apache#5735)

The "lock" object was used to synchronize start/stop as well as synchronize removals
from segmentsToDelete (when a segment is done dropping). This could cause a deadlock
if a segment-load throws an exception during loadLocalCache. loadLocalCache is run
by start() while it holds the lock, but then it spawns loading threads, and those
threads will try to acquire the "segmentsToDelete" lock if they want to drop a corrupt
segments.

I don't see any reason for these two locks to be the same lock, so I split them.
@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented May 8, 2018

@jihoonson sounds good to me, it's in #5755.

@gianm gianm deleted the fix-histo-startup-log branch May 8, 2018 17:56
@jihoonson
Copy link
Copy Markdown
Contributor

@gianm thanks!

sathishsri88 pushed a commit to sathishs/druid that referenced this pull request May 8, 2018
…g on startup. (apache#5735)

The "lock" object was used to synchronize start/stop as well as synchronize removals
from segmentsToDelete (when a segment is done dropping). This could cause a deadlock
if a segment-load throws an exception during loadLocalCache. loadLocalCache is run
by start() while it holds the lock, but then it spawns loading threads, and those
threads will try to acquire the "segmentsToDelete" lock if they want to drop a corrupt
segments.

I don't see any reason for these two locks to be the same lock, so I split them.
jihoonson pushed a commit that referenced this pull request May 9, 2018
…g on startup. (#5735) (#5755)

The "lock" object was used to synchronize start/stop as well as synchronize removals
from segmentsToDelete (when a segment is done dropping). This could cause a deadlock
if a segment-load throws an exception during loadLocalCache. loadLocalCache is run
by start() while it holds the lock, but then it spawns loading threads, and those
threads will try to acquire the "segmentsToDelete" lock if they want to drop a corrupt
segments.

I don't see any reason for these two locks to be the same lock, so I split them.
@jihoonson jihoonson added this to the 0.12.1 milestone Jul 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants