@rdhabalia
Contributor

@rdhabalia rdhabalia commented Feb 6, 2019

Motivation

This addresses the stuck thread shown below, which blocks while performing GC in the bookie.

Thread 3363: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - java.lang.Object.wait() @bci=2, line=502 (Compiled frame)
 - org.apache.bookkeeper.util.ZkUtils.getChildrenInSingleNode(org.apache.zookeeper.ZooKeeper, java.lang.String) @bci=34, line=243 (Compiled frame)
 - org.apache.bookkeeper.meta.LongHierarchicalLedgerManager$LongHierarchicalLedgerRangeIterator.getChildrenAt(java.lang.String) @bci=8, line=165 (Compiled frame)
 - org.apache.bookkeeper.meta.LongHierarchicalLedgerManager$LongHierarchicalLedgerRangeIterator$LeafIterator.<init>(org.apache.bookkeeper.meta.LongHierarchicalLedgerManager$LongHierarchicalLedgerRangeIterator, java.lang.String) @bci=11, line=187 (Compiled frame)
 - org.apache.bookkeeper.meta.LongHierarchicalLedgerManager$LongHierarchicalLedgerRangeIterator$InnerIterator.advance() @bci=137, line=261 (Compiled frame)
 - org.apache.bookkeeper.meta.LongHierarchicalLedgerManager$LongHierarchicalLedgerRangeIterator$InnerIterator.next() @bci=28, line=281 (Compiled frame)
 - org.apache.bookkeeper.meta.LongHierarchicalLedgerManager$LongHierarchicalLedgerRangeIterator$InnerIterator.next() @bci=4, line=278 (Compiled frame)
 - org.apache.bookkeeper.meta.LongHierarchicalLedgerManager$LongHierarchicalLedgerRangeIterator$InnerIterator.next() @bci=4, line=278 (Compiled frame)
 - org.apache.bookkeeper.meta.LongHierarchicalLedgerManager$LongHierarchicalLedgerRangeIterator$InnerIterator.next() @bci=4, line=278 (Compiled frame)
 - org.apache.bookkeeper.meta.LongHierarchicalLedgerManager$LongHierarchicalLedgerRangeIterator.next() @bci=8, line=304 (Compiled frame)
 - org.apache.bookkeeper.meta.HierarchicalLedgerManager$HierarchicalLedgerRangeIterator.next() @bci=26, line=117 (Compiled frame)
 - org.apache.bookkeeper.bookie.ScanAndCompareGarbageCollector.gc(org.apache.bookkeeper.bookie.GarbageCollector$GarbageCleaner) @bci=195, line=168 (Compiled frame)
 - org.apache.bookkeeper.bookie.GarbageCollectorThread.doGcLedgers() @bci=8, line=393 (Compiled frame)
 - org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(boolean, boolean, boolean) @bci=39, line=355 (Compiled frame)
 - org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun() @bci=28, line=333 (Compiled frame)
 - org.apache.bookkeeper.common.util.SafeRunnable.run() @bci=1, line=36 (Compiled frame)
 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
 - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled frame)
 - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) @bci=1, line=180 (Compiled frame)
 - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() @bci=37, line=294 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
 - io.netty.util.concurrent.FastThreadLocalRunnable.run() @bci=4, line=30 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=748 (Interpreted frame)

Changes

Add a timeout to the zk operation so that the GC thread does not block indefinitely.

@jvrao
Contributor

jvrao commented Feb 6, 2019

@athanatos

@athanatos

Fwiw, we saw a similar hang which turned out to be caused by apache/zookeeper#787

I think this patch may be a good idea nevertheless. I'd like to see the timeout passed as a param, however.

 while (!ctx.done) {
-    ctx.wait();
+    try {
+        ctx.wait(TimeUnit.SECONDS.toMillis(OP_TIME_OUT_SEC));
Contributor

In this case, if no one calls ctx.done = true, the thread will just keep waiting in 2sec blocks, but it will never exit, right?

Contributor Author

Hmm, if no one sets ctx.done = true and calls ctx.notifyAll() at line#238, then ctx.wait(timeout) will throw InterruptedException and ctx.done will be set to true at line#249, which takes the thread out of the while loop. So it will not keep waiting but will exit, right?
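
For reference on the loop semantics discussed here: `Object.wait(long)` returns normally when the timeout elapses and throws `InterruptedException` only if the thread is interrupted, so a bounded wait has to track its own deadline. The following is a minimal sketch of that pattern, not the code in this patch; `BoundedWait`, `Ctx`, and `awaitDone` are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper illustrating a deadline-based wait; not the patch's code. */
final class BoundedWait {

    /** Minimal stand-in for the callback context discussed in this thread. */
    static final class Ctx {
        boolean done = false;
    }

    /**
     * Waits until ctx.done is set or timeoutMs has elapsed.
     * Returns true if done was observed, false if the wait timed out.
     */
    static boolean awaitDone(Ctx ctx, long timeoutMs) throws InterruptedException {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (ctx) {
            while (!ctx.done) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    // wait(long) does not report a timeout, so the loop must exit explicitly.
                    return false;
                }
                // Returns on notify, on timeout, or on a spurious wakeup.
                ctx.wait(remainingMs);
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // No one ever completes this context, so the wait gives up after ~100 ms.
        System.out.println("completed=" + awaitDone(new Ctx(), 100));

        // Here another thread completes the context well before the deadline.
        Ctx soon = new Ctx();
        Thread completer = new Thread(() -> {
            synchronized (soon) {
                soon.done = true;
                soon.notifyAll();
            }
        });
        completer.start();
        System.out.println("completed=" + awaitDone(soon, 5_000));
        completer.join();
    }
}
```

The caller can then decide what a `false` return means, for example by failing the operation with a timeout error instead of looping again.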

 try {
     zkActiveLedgers = ledgerListToSet(
-            ZkUtils.getChildrenInSingleNode(zk, ledgerRootPath), ledgerRootPath);
+            ZkUtils.getChildrenInSingleNode(zk, ledgerRootPath, ZkUtils.OP_TIME_OUT_SEC),
@athanatos athanatos Feb 6, 2019

What I mean is, it should probably be passed in from the gc thread. The gc caller should probably get its value from a config. Doing it here doesn't allow different users with different tolerances to provide different bounds. Also, passing in 0 should mean no limit.

Member

+1 to what @athanatos suggested. I would like to see a configurable setting.
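
A minimal sketch of the suggestion above, assuming the asynchronous result is exposed as a `CompletableFuture`: the caller supplies the bound, and 0 means no limit. `TimedZkCall` and `await` are hypothetical names, not BookKeeper APIs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: the timeout is a parameter chosen by the caller. */
final class TimedZkCall {

    /**
     * Blocks until the result is available.
     * timeoutMs == 0 waits without limit; a positive value bounds the wait
     * and raises TimeoutException when the bound is exceeded.
     */
    static <T> T await(CompletableFuture<T> result, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        if (timeoutMs < 0) {
            throw new IllegalArgumentException("timeoutMs must be >= 0");
        }
        if (timeoutMs == 0) {
            return result.get(); // unbounded wait
        }
        return result.get(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

With this shape, the GC caller would read the value from its configuration and pass it down, so different deployments can choose different bounds.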

@reddycharan
Contributor

@rdhabalia without fixing the real issue (taking forever to fetch ledgerznodes), how would this fix help? It can run into the same issue again, right?

@rdhabalia
Contributor Author

@reddycharan

without fixing the real issue (taking forever to fetch ledgerznodes), how would this fix help? It can run into the same issue again, right?

The real issue is that when zk does a zxid rollover, it restarts the leader and the zk-client doesn't receive callbacks for in-flight requests. However, new outgoing requests will not have the issue once the zk quorum comes back. So, with this fix, the blocked zk call will time out and the next attempt will succeed.
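
The behavior described above (a timed-out call is abandoned and the next GC run issues a fresh request) could look roughly like this. It is a sketch under assumptions: `GcPassSketch`, `scanOnce`, and the `Callable` standing in for the ledger listing are hypothetical, not BookKeeper classes.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of a GC pass that tolerates a timed-out metadata listing. */
final class GcPassSketch {

    /**
     * Runs one listing attempt.
     * Returns the active ledger ids, or an empty Optional if the listing timed out,
     * in which case the pass is skipped and the next scheduled run tries again.
     */
    static Optional<Set<Long>> scanOnce(Callable<Set<Long>> listActiveLedgers) {
        try {
            return Optional.of(listActiveLedgers.call());
        } catch (TimeoutException e) {
            // Give up on this pass only; nothing is deleted without a complete listing.
            return Optional.empty();
        } catch (Exception e) {
            throw new IllegalStateException("unexpected failure while listing ledgers", e);
        }
    }
}
```

Skipping the whole pass on timeout is the conservative choice here, since comparing local ledgers against a partial listing could mark live ledgers as garbage.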

@athanatos

athanatos commented Feb 7, 2019

@rdhabalia No, the patch I linked above addresses a problem with the zk client side cxid (which unlike the zxid increments with reads), not the zk server side zxid. Rolling the client side cxid is actually harmless and does not require a leader election. With that patch, this call won't hang in the first place. However, that may not be the only such bug and there are other ways for zk calls to hang, so this patch may nevertheless be useful for some auxiliary zk users like the gc scan.

}

/**
* Get zk-operation timeout in seconds used by GC while performing zk
Contributor

If this is only for GC we should provide a more meaningful name.
Can you change it please?

Contributor Author

I think this could be a generic timeout configuration, usable anywhere the bookie needs a blocking zk call. However, I think we could also use the existing zkTimeout, which can genuinely guarantee that the zk session is alive and the callback couldn't complete. Or let me know if you think we need a different name?

Contributor

We can avoid adding this new knob and use zkTimeout * N in your change; in fact, our problem is that we want to put an upper bound on the time spent waiting.

I think our ZooKeeper client retries and creates new 'sessions' in case of session expiration, so I don't know which value is most suitable for N. We can start with 2.

How does this sound to you?

Contributor Author

@eolivelli sure, that will also be fine. I have fixed it.

Contributor

@eolivelli eolivelli left a comment

Looks good.
Thank you

Contributor

@eolivelli eolivelli left a comment

You didn't update etcd driver and CI is failing.
Please fix

Member

@sijie sijie left a comment

can you add the settings to conf/bk_server.conf?

@eolivelli
Contributor

can you add the settings to conf/bk_server.conf?
@sijie there is no new setting any more

@sijie
Member

sijie commented Feb 8, 2019

@eolivelli sorry I didn't follow the discussion. I just checked the conversation. I am not sure if using 2 * zkTimeout is a good idea. I would prefer a separate setting called zkOpTimeout, so people can really configure op timeout. We can set the default zkOpTimeout as 2 * zkTimeout.
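
A sketch of how such a default could be resolved, assuming settings are read from a `java.util.Properties` object. `zkOpTimeout` is only the name proposed in this comment, and `GcZkTimeoutConfig` is a hypothetical helper.

```java
import java.util.Properties;

/** Hypothetical helper: an explicit op timeout wins, otherwise 2 * zkTimeout is used. */
final class GcZkTimeoutConfig {

    static final String ZK_TIMEOUT = "zkTimeout";       // zk session timeout, in ms
    static final String ZK_OP_TIMEOUT = "zkOpTimeout";  // proposed setting; an assumption here

    /** Returns the bound, in milliseconds, for a blocking zk operation. */
    static long zkOpTimeoutMs(Properties conf) {
        String session = conf.getProperty(ZK_TIMEOUT);
        if (session == null) {
            throw new IllegalArgumentException(ZK_TIMEOUT + " must be set");
        }
        long sessionMs = Long.parseLong(session.trim());

        String explicit = conf.getProperty(ZK_OP_TIMEOUT);
        return explicit != null ? Long.parseLong(explicit.trim()) : 2 * sessionMs;
    }
}
```

Deriving the default from `zkTimeout` keeps existing deployments unchanged while still letting operators set a tighter or looser bound.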

@eolivelli
Contributor

@eolivelli sorry I didn't follow the discussion. I just checked the conversation. I am not sure if using 2 * zkTimeout is a good idea. I would prefer a separate setting called zkOpTimeout, so people can really configure op timeout. We can set the default zkOpTimeout as 2 * zkTimeout.

Ok @sijie
The initial idea was to have a tunable parameter, and @rdhabalia introduced a generic timeout that is used only in GC. So I proposed giving that parameter a name that refers explicitly to GC.
Then we ended up dropping the parameter.

A new parameter works for me as well but I think it should be stated clearly that it is only for GC operations

@sijie
Member

sijie commented Feb 8, 2019

A new parameter works for me as well but I think it should be stated clearly that it is only for GC operations

if it is only for GC operations, I am fine with current approach.

@rdhabalia
Contributor Author

@sijie

if it is only for GC operations, I am fine with current approach.

Yes, in a previous commit I added a separate config, but then we realized that, for now, it is only needed for GC, and GC should expect a response before the zk session timeout expires. So zkTimeout can be used to set this timeout for now, and we moved forward with this change.

@rdhabalia
Contributor Author

run integration tests

@rdhabalia
Contributor Author

run integration tests

@rdhabalia
Contributor Author

@eolivelli can we merge this PR?

Contributor

@eolivelli eolivelli left a comment

Sure. Go ahead
Sorry, I forgot to 'approve'

@eolivelli
Contributor

Merging now

7 participants