KAFKA-13062: Make DeleteConsumerGroupsHandler unmap for COORDINATOR_NOT_AVAILABLE error by showuon · Pull Request #11021 · apache/kafka

showuon · 2021-07-12T08:31:06Z

Make DeleteConsumerGroupsHandler unmap for COORDINATOR_NOT_AVAILABLE error

Old handleResponse logic:

void handleResponse(AbstractResponse abstractResponse) {
    final DeleteGroupsResponse response = (DeleteGroupsResponse) abstractResponse;

    // If coordinator changed since we fetched it, retry
    if (ConsumerGroupOperationContext.hasCoordinatorMoved(response)) {
        Call call = getDeleteConsumerGroupsCall(context);
        rescheduleFindCoordinatorTask(context, () -> call, this);
        return;
    }

    final Errors groupError = response.get(context.groupId());
    if (handleGroupRequestError(groupError, context.future()))
        return;

    context.future().complete(null);
}

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

showuon · 2021-07-12T08:31:22Z

@dajac , please take a look. Thanks.

dajac · 2021-07-13T13:57:00Z

        }
-        return new ApiResult<>(completed, failed, unmapped);
+
+        if (groupsToUnmap.isEmpty() && groupsToRetry.isEmpty()) {


It seems incorrect to do this here. We were able to do so in the others because they expect only one group at a time. This one is different: the driver will retry any group that is neither completed nor failed. It seems to me that we could keep the existing code, no?

You are right! Updated.
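To illustrate the point about the driver, here is a minimal sketch of the bookkeeping described in this thread: any group that the handler reports as neither completed, failed, nor unmapped is simply sent again. The class `RetrySketch` and the method `keysToRetry` are hypothetical names used for illustration only. They are not part of Kafka's admin client, and the real result type carries more detail than plain sets.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the driver's bookkeeping; not Kafka's actual classes.
final class RetrySketch {

    // Returns the groups the driver would send again to the same coordinator:
    // everything requested that the handler did not report as completed, failed, or unmapped.
    static Set<String> keysToRetry(List<String> requested,
                                   Set<String> completed,
                                   Set<String> failed,
                                   Set<String> unmapped) {
        Set<String> retry = new LinkedHashSet<>(requested);
        retry.removeAll(completed);
        retry.removeAll(failed);
        retry.removeAll(unmapped); // unmapped groups go back to coordinator lookup instead
        return retry;
    }

    public static void main(String[] args) {
        Set<String> retry = keysToRetry(
            List.of("g1", "g2", "g3", "g4"),
            Set.of("g1"),   // completed
            Set.of("g2"),   // failed
            Set.of("g3"));  // unmapped
        System.out.println("retry against the same coordinator: " + retry);
    }
}
```

Under this model the handler needs no special branch for the "nothing to unmap, nothing to retry" case, which is why the existing return statement could stay.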

dajac · 2021-07-13T13:58:14Z

+            case INVALID_GROUP_ID:
+            case NON_EMPTY_GROUP:
+            case GROUP_ID_NOT_FOUND:
+                log.error("Received non retriable failure for group {} in `{}` response", groupId,


I would also make the log messages uniform and use debug level everywhere except for unexpected errors.

dajac · 2021-07-13T13:58:27Z

    }

-}
+}


nit: Could we revert this?

dajac

@showuon Thanks for the update. I left a few minor comments and one question.

In the end, the handling of COORDINATOR_NOT_AVAILABLE is the only main difference in this PR. Should we reflect this in the title perhaps?

dajac · 2021-07-14T07:54:16Z

+            case INVALID_GROUP_ID:
+            case NON_EMPTY_GROUP:
+            case GROUP_ID_NOT_FOUND:
+                log.debug("`DeleteConsumerGroups` request for group id {} failed due to error {}", groupId, error);


nit: We should use groupId.idValue here and in the others.

dajac · 2021-07-14T07:55:10Z

            case COORDINATOR_LOAD_IN_PROGRESS:
-            case COORDINATOR_NOT_AVAILABLE:
+                // If the coordinator is in the middle of loading, then we just need to retry
+                log.debug("`DeleteConsumerGroups` request for group {} failed because the coordinator " +


nit: group -> group id?

dajac · 2021-07-14T07:55:29Z

-                unmapped.add(groupId);
+                // If the coordinator is unavailable or there was a coordinator change, then we unmap
+                // the key so that we retry the `FindCoordinator` request
+                log.debug("`DeleteConsumerGroups` request for group {} returned error {}. " +


nit: group -> group id?

dajac · 2021-07-14T08:02:35Z

-            final DeletableGroupResultCollection errorResponse1 = new DeletableGroupResultCollection();
-            errorResponse1.add(new DeletableGroupResult()
-                                   .setGroupId("groupId")
-                                   .setErrorCode(Errors.COORDINATOR_NOT_AVAILABLE.code())
-            );
-            env.kafkaClient().prepareResponse(new DeleteGroupsResponse(
-                new DeleteGroupsResponseData()
-                    .setResults(errorResponse1)));


Why are we moving this to later?

This section tests that "retriable" errors are retried. Before this change, COORDINATOR_NOT_AVAILABLE was treated as a retriable error. After this PR, it is treated as an unmapped error, so the case moved later in the test, where we verify that on receiving this error we look up the coordinator again and then resend the request.
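The classification discussed in this review can be sketched as a small switch. This is a simplified illustration, not the handler's actual code: `Error` is a hypothetical stand-in for a subset of Kafka's `Errors` enum, `Action` and `classify` are invented names, the inclusion of `NOT_COORDINATOR` as the "coordinator change" case is an assumption, and treating any other error as a failure is likewise an assumption.

```java
// Hypothetical, simplified model; these are not the actual Kafka classes.
final class DeleteGroupErrorSketch {

    // Stand-in for a subset of the error codes mentioned in this PR.
    enum Error {
        NONE,
        INVALID_GROUP_ID,
        NON_EMPTY_GROUP,
        GROUP_ID_NOT_FOUND,
        COORDINATOR_LOAD_IN_PROGRESS,
        COORDINATOR_NOT_AVAILABLE,
        NOT_COORDINATOR, // assumption: represents "there was a coordinator change"
        UNKNOWN_SERVER_ERROR
    }

    enum Action { COMPLETE, FAIL, RETRY, UNMAP }

    static Action classify(Error error) {
        switch (error) {
            case NONE:
                return Action.COMPLETE;
            case INVALID_GROUP_ID:
            case NON_EMPTY_GROUP:
            case GROUP_ID_NOT_FOUND:
                // Non-retriable: fail the group.
                return Action.FAIL;
            case COORDINATOR_LOAD_IN_PROGRESS:
                // Coordinator is still loading: retry against the same coordinator.
                return Action.RETRY;
            case COORDINATOR_NOT_AVAILABLE:
            case NOT_COORDINATOR:
                // After this PR: unmap the group so the coordinator is looked up again.
                return Action.UNMAP;
            default:
                // Assumption: unexpected errors fail the group.
                return Action.FAIL;
        }
    }

    public static void main(String[] args) {
        for (Error e : Error.values()) {
            System.out.println(e + " -> " + classify(e));
        }
    }
}
```

The only behavioral change described in the PR corresponds to moving `COORDINATOR_NOT_AVAILABLE` from the retry branch to the unmap branch.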

dajac

LGTM

dajac · 2021-07-15T12:39:43Z

Failures are not related:

Build / JDK 16 and Scala 2.13 / shouldBeAbleToQueryFilterState – org.apache.kafka.streams.integration.QueryableStateIntegrationTest (43s)
Build / JDK 11 and Scala 2.13 / remoteCloseWithoutBufferedReceives() – kafka.network.SocketServerTest (<1s)
Build / JDK 11 and Scala 2.13 / shouldWorkWithUncleanShutdownWipeOutStateStore[exactly_once_v2] – org.apache.kafka.streams.integration.EOSUncleanShutdownIntegrationTest (36s)
Build / JDK 8 and Scala 2.12 / shouldAddCurrentLeaderEpochToMessagesAsTheyAreWrittenToLeader() – kafka.server.epoch.LeaderEpochIntegrationTest (36s)
Build / JDK 8 and Scala 2.12 / shouldInnerJoinMultiPartitionQueryable – org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest

…OT_AVAILABLE error (#11021) This patch improves the error handling in `DeleteConsumerGroupsHandler` and ensures that `COORDINATOR_NOT_AVAILABLE` is unmapped in order to look up the coordinator again. Reviewers: David Jacot <djacot@confluent.io>

dajac · 2021-07-15T12:41:12Z

Merged to trunk and to 3.0. cc @kkonstantine

KAFKA-13062: refactor DeleteConsumerGroupsHandler and tests

3836257

showuon mentioned this pull request Jul 13, 2021

KAFKA-13033: COORDINATOR_NOT_AVAILABLE should be unmapped #10973

Closed

3 tasks

dajac reviewed Jul 13, 2021

KAFKA-13062: refactor code

6b5fc65

showuon force-pushed the KAFKA-13062 branch from c605483 to 6b5fc65 Compare July 14, 2021 07:27

Merge branch 'trunk' of https://github.com/apache/kafka into KAFKA-13062

587e3cc

dajac reviewed Jul 14, 2021

showuon changed the title ~~KAFKA-13062: refactor DeleteConsumerGroupsHandler and tests~~ KAFKA-13062: Make DeleteConsumerGroupsHandler unmap for COORDINATOR_NOT_AVAILABLE error Jul 14, 2021

KAFKA-13062: address comments to refactor

54a364e

dajac approved these changes Jul 15, 2021

dajac merged commit f7cf4a4 into apache:trunk Jul 15, 2021

                   }
-              }
+              }

                
                    No newline at end of file
