Skip to content

Conversation

@rdhabalia
Copy link
Contributor

@rdhabalia rdhabalia commented Feb 6, 2019

Motivation

Right now, we have below 3 issues because of which gc thread gets blocked forever and it can't perform gc-task further. Below issues are mainly related to blocking call while doing zk-operation without timeout.

bug-fixes:

  1. right now, GC - ScanAndCompareGarbageCollector passes timeout in millisecond to LedgerManager but it
    takes it as second and again try to convert it in millis so, 30Kms timeout becomes 30M ms timeout. Sp, fix timeout unit during gc.

  2. Right now, GC makes blocking call to get list of children on ledger znode and sometime zk-call back doesn't comeback which blocks the gc-thread forever. However, recently we added the timeout on the object-waiting-lock which doesn't work because it's in while loop and object.wait(timeout) completes without any exception and GC threads keep running in while loop.

  3. add zk-timeout during delete ledgers in bookie else it can also block the GC thread.

Changes

add timeout while bk-gc makes zk-call to verify deleted ledgers.

@rdhabalia
Copy link
Contributor Author

@eolivelli @sijie can you also please review this PR. it's related to #1937

LOG.warn("Failed to read ledger-metadata for {} from zk {}", bkLid, e.getMessage());
}
if (rc != BKException.Code.NoSuchLedgerExistsException) {
if (isTimeOut || rc != BKException.Code.NoSuchLedgerExistsException) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment here doesn't seem to be right to me for isTimeout case. Why not just move line 182 to line 176 in catch BKException block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, i have fixed it. I also realized that unfortunately I have made a mistake in timeunit of zkTimeout in ##1937 where caller was passing msec but ZkUtils at receiver was expecting into seconds. So, I have also made the change to accept zk-timeout in ms.

@rdhabalia rdhabalia force-pushed the verify_gc branch 3 times, most recently from 33ee1f7 to cc9d680 Compare March 4, 2019 19:08
@rdhabalia rdhabalia changed the title [bk-gc] avoid blocking call while gc checking deleted ledger znode [bk-gc] avoid blocking call in gc-thread May 13, 2019
@rdhabalia
Copy link
Contributor Author

run integration tests
run bookkeeper-server client tests
run bookkeeper-server remaining tests

@rdhabalia
Copy link
Contributor Author

run integration tests

fix zktiomeout time-unit

timeout operation if get zk-children response not came
@rdhabalia
Copy link
Contributor Author

run bookkeeper-server client tests

@rdhabalia
Copy link
Contributor Author

run integration tests

@rdhabalia
Copy link
Contributor Author

@sijie can we please merge this PR.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Sure.
You are a committer, you can merge it, you have enough +1s
I will help if you have troubles

@rdhabalia
Copy link
Contributor Author

You are a committer, you can merge it, you have enough +1s

I am not committer yet so, I won't be able to merge the PR.

@eolivelli
Copy link
Contributor

I am really sorry @rdhabalia.
Merging now
Thank you very much for your contribution

@eolivelli eolivelli added this to the 4.10.0 milestone May 15, 2019
@eolivelli eolivelli merged commit f5ddc36 into apache:master May 15, 2019
@rdhabalia rdhabalia deleted the verify_gc branch May 15, 2019 06:36
@rdhabalia
Copy link
Contributor Author

Thanks @eolivelli 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants