KAFKA-14208: Should not wake-up with non-blocking coordinator discovery by guozhangwang · Pull Request #12611 · apache/kafka

guozhangwang · 2022-09-09T00:16:01Z

Today we may try to discover coordinator in both blocking (e.g. in poll) and non-blocking (e.g. in commitAsync) way. For the latter we would poll the underlying network client with timeout 0, and in this case we should not trigger wakeup since these are non-blocking calls and hence should not throw wake-ups.

In this PR I'm trying to fix it in a least intrusive way (a more general fix should be, potentially, to have two versions of ensureCoordinatorReady), since in our threading refactoring, the ensureCoordinatorReady function would not be called by the calling thread any more and only triggered by the background thread, and hence we would have a much simpler manner to ensure that non-blocking functions never throw wake-ups.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

guozhangwang · 2022-09-09T00:16:20Z

Call @philipnee @showuon for reviews.

hachikuji · 2022-09-09T23:37:47Z

+
+            // if we do not want to block on discovering coordinator at all,
+            // then we should not try to poll in a loop, and should not throw wake-up exception either
+            if (timer.timeoutMs() == 0L) {


I wonder if this breaks the API in a different way. If the user calls wakeup(), would we expect the next blocking call to trigger a wakeup even if it were called with a timeout of zero? I wonder if it would be better to overload ensureCoordinatorReady with an additional flag?

would we expect the next blocking call to trigger a wakeup even if it were called with a timeout of zero?

Yes, I think as long as it's a blocking call, the wakeup exception should be thrown, even if it's zero timeout. I've checked existing javadoc in KafkaConsumer, all the methods, which declared wakeupException will be thrown, will still throw wakeupExceptions after this change. So, I think this PR change (1) won't break API, and (2) fixes the issue for commitAsync, which makes sense to me.

I wonder if it would be better to overload ensureCoordinatorReady with an additional flag?

Yes, as @guozhangwang mentioned in PR description:

In this PR I'm trying to fix it in a least intrusive way (a more general fix should be, potentially, to have two versions of ensureCoordinatorReady)

I think this fix is safer. But I don't have strong opinion about it, just want to raise the release timing issue here.

Yeah, I feel it's a bit slippery to leave the logic in place with just a TODO somewhere to go back and fix it. Too many times, the follow-up never happens. Is it really that much additional effort to add an overload?

protected synchronized boolean ensureCoordinatorReady(final Timer timer) { return ensureCoordinatorReady(timer, true); } private synchronized boolean ensureCoordinatorReady(final Timer timer, boolean checkWakeup) { ...

showuon

The change makes sense to me. Thanks.

showuon · 2022-09-12T01:41:27Z

+
+            // if we do not want to block on discovering coordinator at all,
+            // then we should not try to poll in a loop, and should not throw wake-up exception either
+            if (timer.timeoutMs() == 0L) {


would we expect the next blocking call to trigger a wakeup even if it were called with a timeout of zero?

Yes, I think as long as it's a blocking call, the wakeup exception should be thrown, even if it's zero timeout. I've checked existing javadoc in KafkaConsumer, all the methods, which declared wakeupException will be thrown, will still throw wakeupExceptions after this change. So, I think this PR change (1) won't break API, and (2) fixes the issue for commitAsync, which makes sense to me.

I wonder if it would be better to overload ensureCoordinatorReady with an additional flag?

Yes, as @guozhangwang mentioned in PR description:

In this PR I'm trying to fix it in a least intrusive way (a more general fix should be, potentially, to have two versions of ensureCoordinatorReady)

I think this fix is safer. But I don't have strong opinion about it, just want to raise the release timing issue here.

showuon · 2022-09-12T01:47:58Z

+        try {
+            coordinator.joinGroupIfNeeded(mockTime.timer(0));
+            fail("Should have woken up from joinGroupIfNeeded()");
+        } catch (WakeupException ignored) {


nit:

assertThrows(WakeupException.class, () -> coordinator.joinGroupIfNeeded(mockTime.timer(0)), "Should have woken up from joinGroupIfNeeded()")

showuon · 2022-09-12T01:48:36Z

+        try {
+            coordinator.ensureCoordinatorReady(mockTime.timer(1));
+            fail("Should have woken up from ensureCoordinatorReady()");
+        } catch (WakeupException ignored) {
+        }


nit: can be replaced with assertThrows

hachikuji · 2022-09-12T20:11:41Z

@showuon Since Guozhang is out sick, I raised #12626 which addresses your (and my) comments. Please take a look. I'll leave this open for now.

hachikuji · 2022-09-13T04:21:20Z

I'll go ahead and close this since we're going to merge #12626.

…et commits (#12626) Asynchronous offset commits may throw an unexpected WakeupException following #11631 and #12244. This patch fixes the problem by passing through a flag to ensureCoordinatorReady to indicate whether wakeups should be disabled. This is used to disable wakeups in the context of asynchronous offset commits. All other uses leave wakeups enabled. Note: this patch builds on top of #12611. Co-Authored-By: Guozhang Wang wangguoz@gmail.com Reviewers: Luke Chen <showuon@gmail.com>

should not wake-up with non-blocking coordinator discovery

2d0848b

jsancio added 3.3 consumer labels Sep 9, 2022

hachikuji reviewed Sep 9, 2022

View reviewed changes

showuon approved these changes Sep 12, 2022

View reviewed changes

hachikuji mentioned this pull request Sep 12, 2022

KAFKA-14208; Do not raise wakeup in consumer during asynchronous offset commits #12626

Merged

3 tasks

hachikuji closed this Sep 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

KAFKA-14208: Should not wake-up with non-blocking coordinator discovery#12611

KAFKA-14208: Should not wake-up with non-blocking coordinator discovery#12611
guozhangwang wants to merge 1 commit intoapache:trunkfrom
guozhangwang:K14208-do-not-wakeup-if-timed-out

guozhangwang commented Sep 9, 2022

Uh oh!

guozhangwang commented Sep 9, 2022

Uh oh!

hachikuji Sep 9, 2022

Uh oh!

showuon Sep 12, 2022

Uh oh!

hachikuji Sep 12, 2022 •

edited

Loading

Uh oh!

showuon left a comment

Uh oh!

showuon Sep 12, 2022

Uh oh!

showuon Sep 12, 2022

Uh oh!

showuon Sep 12, 2022

Uh oh!

hachikuji commented Sep 12, 2022

Uh oh!

hachikuji commented Sep 13, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

guozhangwang commented Sep 9, 2022

Committer Checklist (excluded from commit message)

Uh oh!

guozhangwang commented Sep 9, 2022

Uh oh!

hachikuji Sep 9, 2022

Choose a reason for hiding this comment

Uh oh!

showuon Sep 12, 2022

Choose a reason for hiding this comment

Uh oh!

hachikuji Sep 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

showuon left a comment

Choose a reason for hiding this comment

Uh oh!

showuon Sep 12, 2022

Choose a reason for hiding this comment

Uh oh!

showuon Sep 12, 2022

Choose a reason for hiding this comment

Uh oh!

showuon Sep 12, 2022

Choose a reason for hiding this comment

Uh oh!

hachikuji commented Sep 12, 2022

Uh oh!

hachikuji commented Sep 13, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

hachikuji Sep 12, 2022 •

edited

Loading