HOTFIX: only try to clear discover-coordinator future upon commit by guozhangwang · Pull Request #12244 · apache/kafka

guozhangwang · 2022-06-02T18:22:26Z

This is another way of fixing KAFKA-13563 other than #11631.

Instead of letting the consumer always try to discover the coordinator in poll under either mode (subscribe / assign), we defer clearing the discover-coordinator future until an async commit. More specifically, under manual assign mode there are only three places where we need the coordinator:

* commitAsync (whether triggered by the consumer itself or by the caller): this is what we want to fix.
* commitSync, where we already try to re-discover the coordinator.
* committed (whether triggered by the consumer itself based on the reset policy or by the caller), where we already try to re-discover the coordinator.

The benefit is that a manual-assign consumer that never triggers any of the above three operations never discovers the coordinator at all. The original fix in #11631 would let the consumer discover the coordinator even when none of these operations are required.
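A minimal sketch of the behaviour described above, written as a hypothetical, heavily simplified Java model rather than Kafka's actual ConsumerCoordinator. The class `ManualAssignCoordinatorModel` and all of its methods are invented for illustration; the model only records which calls would send a coordinator lookup under manual assignment: `poll()` sends none, while `commitAsync()`, `commitSync()` and `committed()` each look up the coordinator while it is unknown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the manual-assignment behaviour described above.
// It is NOT Kafka's ConsumerCoordinator; it only records which calls would
// send a coordinator lookup.
final class ManualAssignCoordinatorModel {
    private boolean coordinatorKnown = false;
    // Names of the calls that triggered a (modelled) coordinator lookup.
    private final List<String> lookups = new ArrayList<>();

    // poll() under manual assignment: no coordinator discovery on its own.
    void poll() {
        // intentionally empty: nothing here needs the coordinator
    }

    // The call this change defers discovery to.
    void commitAsync() {
        lookUpCoordinatorIfUnknown("commitAsync");
    }

    // These two already re-discovered the coordinator before the change.
    void commitSync() {
        lookUpCoordinatorIfUnknown("commitSync");
    }

    void committed() {
        lookUpCoordinatorIfUnknown("committed");
    }

    // Simulates a successful lookup response arriving.
    void markCoordinatorKnown() {
        coordinatorKnown = true;
    }

    List<String> lookups() {
        return lookups;
    }

    private void lookUpCoordinatorIfUnknown(String caller) {
        if (!coordinatorKnown) {
            lookups.add(caller);
        }
    }
}
```

In this model a consumer that only ever calls `poll()` records no lookups, which mirrors the advantage the description claims over #11631.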

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

guozhangwang · 2022-06-02T18:22:44Z

@showuon @ijuma

dajac

@guozhangwang Thanks for the PR. Overall, the approach looks good to me. I left a few comments, mainly regarding the test cases.

dajac · 2022-06-03T14:23:57Z

            // awaitMetadataUpdate() in ensureCoordinatorReady initiates new connections with configured backoff and avoids the busy loop.
-            if (coordinatorUnknownAndUnready(timer)) {
-                return false;
+            if (metadata.updateRequested() && !client.hasReadyNodes(timer.currentTimeMs())) {


nit: I think we need to remove "if coordinator is unknown, make sure we lookup one and" from the above comment (its first sentence).

dajac · 2022-06-03T14:45:59Z

+    public void testCommitAsyncWithUserAssignedType() {
+        subscriptions.assignFromUser(Collections.singleton(t1p));
+        // should mark coordinator unknown after COORDINATOR_NOT_AVAILABLE error
+        client.prepareResponse(groupCoordinatorResponse(node, Errors.COORDINATOR_NOT_AVAILABLE));


Do we really need this? I thought that the whole point was to ensure that no requests are sent out when the manual mode is used.

dajac · 2022-06-03T14:46:21Z

+        client.prepareResponse(groupCoordinatorResponse(node, Errors.COORDINATOR_NOT_AVAILABLE));
+        // set timeout to 0 because we don't want to retry after the error
+        coordinator.poll(time.timer(0));
+        assertTrue(coordinator.coordinatorUnknown());


Should we also assert that there are no in-flight requests after calling poll?

dajac · 2022-06-03T14:47:59Z

+    @Test
+    public void testAutoCommitAsyncWithUserAssignedType() {
+        try (ConsumerCoordinator coordinator = buildCoordinator(rebalanceConfig, new Metrics(), assignors, true, subscriptions)
+        ) {


nit: I would bring back the closing parenthesis and the opening curly brace on the previous line.

dajac · 2022-06-03T14:48:11Z

+        ) {
+            subscriptions.assignFromUser(Collections.singleton(t1p));
+            // should mark coordinator unknown after COORDINATOR_NOT_AVAILABLE error
+            client.prepareResponse(groupCoordinatorResponse(node, Errors.COORDINATOR_NOT_AVAILABLE));


Same question as before.


guozhangwang

@dajac I addressed your comments and modified the tests. Along the way I also found a minor issue that my first commit did not fix (thanks to your suggestion on the unit tests!).

guozhangwang · 2022-06-03T21:27:56Z

+                client.awaitMetadataUpdate(timer);
            }
+
+            // if there is pending coordinator requests, ensure they have a chance to be transmitted.


This is a major change made while addressing @dajac's comment: previously, under manual assignment, the coordinator.poll call would not call networkClient.poll. That means that if coordinator discovery did not complete within commitAsync (note that we call networkClient.poll twice in that function, so the discovery may well complete there), there would be no other place polling the network client to complete the pending requests.
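The problem described above can be illustrated with a hypothetical Java model (not Kafka code). `FakeNetworkClient` and `ManualAssignPollModel` are invented names, and unlike a real client the fake completes every queued request in a single poll. The model contrasts the two behaviours: if the manual-assignment branch of poll never drives the network client, a coordinator lookup left pending by commitAsync stays stuck; if poll drives the client whenever requests are pending, the lookup can complete.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for a network client: requests sit in a queue
// until poll() is called, i.e. nothing completes unless someone drives it.
final class FakeNetworkClient {
    private final Queue<String> pending = new ArrayDeque<>();
    private int completed = 0;

    void send(String request) {
        pending.add(request);
    }

    boolean hasPendingRequests() {
        return !pending.isEmpty();
    }

    // Completes every queued request in one pass (a simplification).
    void poll() {
        while (!pending.isEmpty()) {
            pending.remove();
            completed++;
        }
    }

    int completedCount() {
        return completed;
    }
}

// Hypothetical model of the manual-assignment branch of coordinator.poll().
final class ManualAssignPollModel {
    private final FakeNetworkClient client;
    // true models the behaviour after the change, false the behaviour before.
    private final boolean pollClientWhenPending;

    ManualAssignPollModel(FakeNetworkClient client, boolean pollClientWhenPending) {
        this.client = client;
        this.pollClientWhenPending = pollClientWhenPending;
    }

    // Models a commitAsync whose coordinator lookup did not finish inside
    // the call, leaving the lookup request pending.
    void commitAsyncLeavingLookupPending() {
        client.send("FindCoordinator");
    }

    void poll() {
        // After the change: pending coordinator requests get a chance
        // to be transmitted and completed.
        if (pollClientWhenPending && client.hasPendingRequests()) {
            client.poll();
        }
        // Before the change nothing here drives the client, so the
        // pending lookup stays stuck.
    }
}
```

The design point the model captures is that someone must keep driving the client; piggybacking on the regular poll loop avoids relying on the next commit call to do it.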

showuon

LGTM! Thanks for fixing it in a better way. Yes, I agree with this change: we will only look up the coordinator when necessary, which is a good improvement! I left a minor comment. Thank you.

showuon · 2022-06-04T06:58:43Z

+
+        // should try to find coordinator since we are commit async
+        coordinator.commitOffsetsAsync(singletonMap(t1p, new OffsetAndMetadata(100L)), (offsets, exception) -> {
+            throw new AssertionError("Commit should not get responses");


nit: use fail instead, and we might need to log the callback parameters for troubleshooting.

fail("Commit should not get responses, but got offsets: " + offsets + ", and exception: " + exception);

dajac

LGTM, thanks.


guozhangwang · 2022-06-06T18:07:37Z

Cherry-picking to 3.2

…2244) This is another way of fixing KAFKA-13563 other than #11631. Instead of letting the consumer to always try to discover coordinator in pool with either mode (subscribe / assign), we defer the clearance of discover future upon committing async only. More specifically, under manual assign mode, there are only three places where we need the coordinator: * commitAsync (both by the consumer itself or triggered by caller), this is where we want to fix. * commitSync, which we already try to re-discovery coordinator. * committed (both by the consumer itself based on reset policy, or triggered by caller), which we already try to re-discovery coordinator. The benefits are that for manual assign mode that does not try to trigger any of the above three, then we never would be discovering coordinator. The original fix in #11631 would let the consumer to discover coordinator even if none of the above operations are required. Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>

…2244) (#12259) This is a cherrypick commit of 3.1. Another way of fixing KAFKA-13563 other than #11631. Instead of letting the consumer to always try to discover coordinator in pool with either mode (subscribe / assign), we defer the clearance of discover future upon committing async only. More specifically, under manual assign mode, there are only three places where we need the coordinator: commitAsync (both by the consumer itself or triggered by caller), this is where we want to fix. commitSync, which we already try to re-discovery coordinator. committed (both by the consumer itself based on reset policy, or triggered by caller), which we already try to re-discovery coordinator. The benefits are that for manual assign mode that does not try to trigger any of the above three, then we never would be discovering coordinator. The original fix in #11631 would let the consumer to discover coordinator even if none of the above operations are required. Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>

…et commits (#12626) Asynchronous offset commits may throw an unexpected WakeupException following #11631 and #12244. This patch fixes the problem by passing through a flag to ensureCoordinatorReady to indicate whether wakeups should be disabled. This is used to disable wakeups in the context of asynchronous offset commits. All other uses leave wakeups enabled. Note: this patch builds on top of #12611. Co-Authored-By: Guozhang Wang wangguoz@gmail.com Reviewers: Luke Chen <showuon@gmail.com>
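The follow-up fix summarized in this commit message passes a flag down to ensureCoordinatorReady so that async commits do not raise a wakeup. A hypothetical Java model of that idea might look like the following. The names `WakeupAwareCoordinatorModel` and `ModelWakeupException` are invented, and the real client uses Kafka's own WakeupException and a different call chain. Async commits leave a pending wakeup untouched, while other callers still raise it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical exception standing in for Kafka's WakeupException.
final class ModelWakeupException extends RuntimeException {
    ModelWakeupException() {
        super("woken up");
    }
}

// Hypothetical model: a flag passed to ensureCoordinatorReady decides
// whether a pending wakeup may be raised in that context.
final class WakeupAwareCoordinatorModel {
    private final AtomicBoolean wakeupPending = new AtomicBoolean(false);

    // In a real consumer this is typically called from another thread.
    void wakeup() {
        wakeupPending.set(true);
    }

    boolean wakeupPending() {
        return wakeupPending.get();
    }

    private void ensureCoordinatorReady(boolean disableWakeup) {
        // Only consume and raise the wakeup when wakeups are enabled.
        if (!disableWakeup && wakeupPending.compareAndSet(true, false)) {
            throw new ModelWakeupException();
        }
        // ... the coordinator lookup would happen here ...
    }

    void commitAsync() {
        ensureCoordinatorReady(true);  // async commits never raise a wakeup
    }

    void commitSync() {
        ensureCoordinatorReady(false); // other callers keep wakeups enabled
    }
}
```

Because the flag is only consumed when wakeups are enabled, the wakeup in this model is deferred to the next call that allows it rather than being lost.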

quick change

871573a

guozhangwang added 2 commits June 2, 2022 13:34

unit tests

df48a52

add unit tests

3c707bf

guozhangwang changed the title ~~[DO NOT MERGE] HOTFIX: only try to clear discover-coordinator future upon commit~~ HOTFIX: only try to clear discover-coordinator future upon commit Jun 2, 2022

dajac reviewed Jun 3, 2022

guozhangwang added 2 commits June 3, 2022 14:03

Merge branch 'trunk' of https://github.com/apache/kafka into KHOTFIX-…

ee6d74c

…try-to-discovery-upon-commit

github comments

0642a2c

guozhangwang commented Jun 3, 2022

showuon approved these changes Jun 4, 2022

dajac approved these changes Jun 6, 2022

guozhangwang added 2 commits June 6, 2022 11:02

Merge branch 'trunk' of https://github.com/apache/kafka into KHOTFIX-…

7c0d049

…try-to-discovery-upon-commit

github comments

52c1427

guozhangwang merged commit 2047fc3 into apache:trunk Jun 6, 2022

hachikuji mentioned this pull request Sep 12, 2022

KAFKA-14208; Do not raise wakeup in consumer during asynchronous offset commits #12626

Merged
