[bk-gc] avoid blocking call in gc-thread #1940
@eolivelli @sijie can you please also review this PR? It's related to #1937.
      LOG.warn("Failed to read ledger-metadata for {} from zk {}", bkLid, e.getMessage());
  }
- if (rc != BKException.Code.NoSuchLedgerExistsException) {
+ if (isTimeOut || rc != BKException.Code.NoSuchLedgerExistsException) {
The comment here doesn't seem right to me for the isTimeout case. Why not just move line 182 to line 176, in the catch BKException block?
Sure, I have fixed it. I also realized that I unfortunately made a mistake with the time unit of zkTimeout in #1937: the caller was passing milliseconds, but ZkUtils on the receiving side was expecting seconds. So I have also changed it to accept the zk-timeout in ms.
Force-pushed from 33ee1f7 to cc9d680.
run integration tests
run integration tests
fix zk-timeout time-unit; time out the operation if the get zk-children response does not arrive
run bookkeeper-server client tests
run integration tests
@sijie can we please merge this PR?
eolivelli left a comment
LGTM
Sure.
You are a committer, you can merge it, you have enough +1s
I will help if you have trouble
I am not a committer yet, so I won't be able to merge the PR.
I am really sorry @rdhabalia.
Thanks @eolivelli 👍
Motivation
Right now there are three issues that can block the gc thread forever, after which it cannot perform any further gc tasks. They mainly come from blocking zk operations made without a timeout.
bug-fixes:
1. GC (ScanAndCompareGarbageCollector) passes the timeout in milliseconds to LedgerManager, which treats it as seconds and converts it to milliseconds again, so a 30K ms timeout becomes a 30M ms timeout. So, fix the timeout unit during gc.
2. GC makes a blocking call to list the children of the ledger znode, and sometimes the zk callback never comes back, which blocks the gc thread forever. A timeout was recently added to the wait on the lock object, but it doesn't work because the wait sits inside a while loop: object.wait(timeout) completes without any exception and the gc thread keeps running in the while loop.
3. Add a zk-timeout when deleting ledgers in the bookie, otherwise that can also block the gc thread.
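The wait-in-a-loop problem described above can be illustrated with a minimal sketch. This is not the BookKeeper code: the class `TimedWaitSketch`, its `done` flag and its methods are hypothetical names used only for illustration. The point is that `Object.wait(timeout)` returns normally when the timeout expires, so the loop has to track a deadline itself or it simply waits again.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative only; all names here are hypothetical, not taken from BookKeeper. */
final class TimedWaitSketch {
    private final Object lock = new Object();
    private boolean done = false; // set by the (hypothetical) zk callback

    /** Callback side: mark the operation as finished and wake up the waiter. */
    void complete() {
        synchronized (lock) {
            done = true;
            lock.notifyAll();
        }
    }

    /**
     * Waits until complete() is called or timeoutMs elapses.
     * Returns true if the callback arrived, false on timeout or interruption.
     */
    boolean awaitDone(long timeoutMs) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (lock) {
            while (!done) {
                long remainingNs = deadline - System.nanoTime();
                if (remainingNs <= 0) {
                    return false; // deadline passed: leave the loop instead of waiting again
                }
                // wait(0) would wait forever, so round up to at least 1 ms
                long remainingMs = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNs));
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
            return true;
        }
    }
}
```

A plain `while (!done) { lock.wait(timeoutMs); }` would never exit if the callback is lost, which matches the behaviour described in the second issue. Checking the remaining time on every iteration also keeps the loop safe against spurious wakeups.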
Changes
add a timeout to the zk calls that bk-gc makes to verify deleted ledgers.
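As a rough sketch of what a bounded zk call with an explicit time unit could look like: the code below does not use the real ZooKeeper or BookKeeper APIs. `AsyncChildrenLookup` is a hypothetical stand-in for an asynchronous get-children call, and `misreadAsSeconds` only reproduces the arithmetic of the unit mix-up described in the motivation.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Illustrative only; all names here are hypothetical, not taken from BookKeeper. */
final class ZkTimeoutSketch {

    /** Hypothetical stand-in for an asynchronous zk get-children call. */
    interface AsyncChildrenLookup {
        void getChildren(String path, Consumer<List<String>> callback);
    }

    /** The unit mix-up: a millisecond value treated as seconds and converted again. */
    static long misreadAsSeconds(long timeoutMs) {
        return TimeUnit.SECONDS.toMillis(timeoutMs); // 30_000 -> 30_000_000
    }

    /**
     * Issues the lookup and waits at most timeoutMs milliseconds for the callback.
     * Returns Optional.empty() on timeout or interruption, so the caller is never
     * blocked forever by a lost callback.
     */
    static Optional<List<String>> getChildrenWithTimeout(
            AsyncChildrenLookup lookup, String path, long timeoutMs) {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<List<String>> result = new AtomicReference<>();
        lookup.getChildren(path, children -> {
            result.set(children);
            latch.countDown();
        });
        try {
            // The unit is spelled out at the call site, so ms cannot be read as seconds.
            if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                return Optional.empty();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Optional.empty();
        }
        return Optional.ofNullable(result.get());
    }
}
```

Passing the unit alongside the value, or putting it in the parameter name as `timeoutMs` does here, makes a caller/receiver mismatch easier to spot in review.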