Skip to content

KAFKA-16860; [1/2] Introduce group.version feature flag#16120

Merged
dajac merged 2 commits intoapache:trunkfrom
dajac:KAFKA-16860-1
May 31, 2024
Merged

KAFKA-16860; [1/2] Introduce group.version feature flag#16120
dajac merged 2 commits intoapache:trunkfrom
dajac:KAFKA-16860-1

Conversation

@dajac
Copy link
Copy Markdown
Member

@dajac dajac commented May 29, 2024

This patch introduces the group.version feature flag with one version:

  1. Version 1 enables the new consumer group rebalance protocol (KIP-848).

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@dajac dajac added the KIP-848 The Next Generation of the Consumer Rebalance Protocol label May 29, 2024
@dajac dajac force-pushed the KAFKA-16860-1 branch 4 times, most recently from 6a6e280 to b4be0e4 Compare May 29, 2024 11:44
Comment thread core/src/main/scala/kafka/server/KafkaApis.scala Outdated
Comment thread core/src/test/java/kafka/test/annotation/ClusterFeature.java
Comment thread server-common/src/main/java/org/apache/kafka/server/common/GroupVersion.java Outdated
Comment thread core/src/test/scala/unit/kafka/server/KafkaApisTest.scala
@jolshan
Copy link
Copy Markdown
Member

jolshan commented May 29, 2024

Changes LGTM so far. We will need to rebase once we get the changes and I will give my final approval then. 👍

@dajac dajac force-pushed the KAFKA-16860-1 branch 2 times, most recently from afd0ebc to 5695787 Compare May 30, 2024 08:34
@dajac
Copy link
Copy Markdown
Member Author

dajac commented May 30, 2024

@jolshan I rebased it on top of #16130 and I fixed all the tests that I missed.

@dajac
Copy link
Copy Markdown
Member Author

dajac commented May 30, 2024

Apparently, I have to fix a few others.

@dajac
Copy link
Copy Markdown
Member Author

dajac commented May 30, 2024

Ok. I have fixed all the tests, hopefully.

Comment thread server-common/src/main/java/org/apache/kafka/server/common/GroupVersion.java Outdated
@dajac dajac marked this pull request as ready for review May 30, 2024 19:33
dajac added a commit that referenced this pull request May 30, 2024
…TransactionsTest (#16139)

While working on #16120, I noticed that the transaction verification feature is disabled in `TransactionsTest` when the new group coordinator is enabled. We did this initially because the feature was not available in the new group coordinator but we fixed it a long time ago. We can enable it now.

Reviewers: Justine Olshan <jolshan@confluent.io>
@dajac
Copy link
Copy Markdown
Member Author

dajac commented May 30, 2024

The build does not seem to start… I am not sure why.

supportedGroupProtocols.add(CLASSIC);

// KafkaConfig#isNewGroupCoordinatorEnabled check both NEW_GROUP_COORDINATOR_ENABLE_CONFIG and GROUP_COORDINATOR_REBALANCE_PROTOCOLS_CONFIG
if (serverProperties.getOrDefault(NEW_GROUP_COORDINATOR_ENABLE_CONFIG, "").equals("true") ||
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we plan to remove this config value from GroupCoordinatorConfig? I see it was removed from a lot of files, but there are still a few where it is used.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly confused because we don't check for the config in kafka apis anymore.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will eventually remove it for sure. This (private) config is still the one use to determine whether the new group coordinator should be enabled or not. The trick to know is that it is automatically set to true when consumer is specified as a protocol. We made this to simplify the early access and the preview.

The handling was a bit messy before this patch as the new protocol was enabled whenever the new coordinator ran. It is better now.

Copy link
Copy Markdown
Member

@jolshan jolshan May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm -- so when you say the new group coordinator -- you mean the code (which handles both protocols) and not the protocol itself.

Do I have this right?

  1. If the protocol is consumer -> new coordinator
  2. If the config is enabled -> new coordinator
  3. If protocol is not consumer && config is not enabled -> old coordinator

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Let me run through the tests one more time with this understanding.

expectedFeatures.put(feature.featureName(), VersionRange.of(
0,
feature.defaultValue(MetadataVersion.LATEST_PRODUCTION)
feature.defaultValue(MetadataVersion.latestTesting())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm was this just a bug in the test...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

Copy link
Copy Markdown
Member

@jolshan jolshan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@jolshan
Copy link
Copy Markdown
Member

jolshan commented May 31, 2024

Failures look like usual suspects, but can we confirm the 4.0 ones?

@dajac
Copy link
Copy Markdown
Member Author

dajac commented May 31, 2024

Will do when I get back to my computer. They indeed look suspicious.

@dajac
Copy link
Copy Markdown
Member Author

dajac commented May 31, 2024

It looks like testDescribeQuorumReplicationSuccessfu also fails for other PRs: https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16154/1/tests. testTopicPartitionsArgWithInternalIncluded seems to be new.

@dajac
Copy link
Copy Markdown
Member Author

dajac commented May 31, 2024

testTopicPartitionsArgWithInternalIncluded does not fail locally. It is also a flaky test.

@dajac
Copy link
Copy Markdown
Member Author

dajac commented May 31, 2024

I confirmed that the failed tests are not related. Merging to trunk and to 3.8.

@dajac dajac merged commit ba61ff0 into apache:trunk May 31, 2024
@dajac dajac deleted the KAFKA-16860-1 branch May 31, 2024 19:48
dajac added a commit that referenced this pull request May 31, 2024
This patch updates the system tests to correctly enable the new consumer protocol/coordinator in the tests requiring them.

I went with the simplest approach for now. Long term, I think that we should refactor the tests to better handle features and non-production features.

I got a successful run of the consumer system tests with this patch combined with #16120: https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1717155071--dajac--KAFKA-16860-2--29028ae0dd/2024-05-31--001./2024-05-31--001./report.html.

Reviewers: Justine Olshan <jolshan@confluent.io>
dajac added a commit that referenced this pull request May 31, 2024
This patch introduces the `group.version` feature flag with one version:
1) Version 1 enables the new consumer group rebalance protocol (KIP-848).

Reviewers: Justine Olshan <jolshan@confluent.io>
dajac added a commit that referenced this pull request May 31, 2024
This patch updates the system tests to correctly enable the new consumer protocol/coordinator in the tests requiring them.

I went with the simplest approach for now. Long term, I think that we should refactor the tests to better handle features and non-production features.

I got a successful run of the consumer system tests with this patch combined with #16120: https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1717155071--dajac--KAFKA-16860-2--29028ae0dd/2024-05-31--001./2024-05-31--001./report.html.

Reviewers: Justine Olshan <jolshan@confluent.io>
chiacyu pushed a commit to chiacyu/kafka that referenced this pull request Jun 1, 2024
…TransactionsTest (apache#16139)

While working on apache#16120, I noticed that the transaction verification feature is disabled in `TransactionsTest` when the new group coordinator is enabled. We did this initially because the feature was not available in the new group coordinator but we fixed it a long time ago. We can enable it now.

Reviewers: Justine Olshan <jolshan@confluent.io>
IBP_3_8_IV0(20, "3.8", "IV0", true),

// Introduce version 1 of the GroupVersion feature (KIP-848).
IBP_4_0_IVO(21, "4.0", "IV0", false);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dajac : It seems there is a typo. IBP_4_0_IVO should be IBP_4_0_IV0. The last zero was mis-typed as an O.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch. Let me fix this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#16181 @junrao Could you please review it?

wernerdv pushed a commit to wernerdv/kafka that referenced this pull request Jun 3, 2024
…TransactionsTest (apache#16139)

While working on apache#16120, I noticed that the transaction verification feature is disabled in `TransactionsTest` when the new group coordinator is enabled. We did this initially because the feature was not available in the new group coordinator but we fixed it a long time ago. We can enable it now.

Reviewers: Justine Olshan <jolshan@confluent.io>
wernerdv pushed a commit to wernerdv/kafka that referenced this pull request Jun 3, 2024
This patch introduces the `group.version` feature flag with one version:
1) Version 1 enables the new consumer group rebalance protocol (KIP-848).

Reviewers: Justine Olshan <jolshan@confluent.io>
wernerdv pushed a commit to wernerdv/kafka that referenced this pull request Jun 3, 2024
This patch updates the system tests to correctly enable the new consumer protocol/coordinator in the tests requiring them.

I went with the simplest approach for now. Long term, I think that we should refactor the tests to better handle features and non-production features.

I got a successful run of the consumer system tests with this patch combined with apache#16120: https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1717155071--dajac--KAFKA-16860-2--29028ae0dd/2024-05-31--001./2024-05-31--001./report.html.

Reviewers: Justine Olshan <jolshan@confluent.io>
TaiJuWu pushed a commit to TaiJuWu/kafka that referenced this pull request Jun 8, 2024
…TransactionsTest (apache#16139)

While working on apache#16120, I noticed that the transaction verification feature is disabled in `TransactionsTest` when the new group coordinator is enabled. We did this initially because the feature was not available in the new group coordinator but we fixed it a long time ago. We can enable it now.

Reviewers: Justine Olshan <jolshan@confluent.io>
TaiJuWu pushed a commit to TaiJuWu/kafka that referenced this pull request Jun 8, 2024
This patch introduces the `group.version` feature flag with one version:
1) Version 1 enables the new consumer group rebalance protocol (KIP-848).

Reviewers: Justine Olshan <jolshan@confluent.io>
TaiJuWu pushed a commit to TaiJuWu/kafka that referenced this pull request Jun 8, 2024
This patch updates the system tests to correctly enable the new consumer protocol/coordinator in the tests requiring them.

I went with the simplest approach for now. Long term, I think that we should refactor the tests to better handle features and non-production features.

I got a successful run of the consumer system tests with this patch combined with apache#16120: https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1717155071--dajac--KAFKA-16860-2--29028ae0dd/2024-05-31--001./2024-05-31--001./report.html.

Reviewers: Justine Olshan <jolshan@confluent.io>
gongxuanzhang pushed a commit to gongxuanzhang/kafka that referenced this pull request Jun 12, 2024
…TransactionsTest (apache#16139)

While working on apache#16120, I noticed that the transaction verification feature is disabled in `TransactionsTest` when the new group coordinator is enabled. We did this initially because the feature was not available in the new group coordinator but we fixed it a long time ago. We can enable it now.

Reviewers: Justine Olshan <jolshan@confluent.io>
gongxuanzhang pushed a commit to gongxuanzhang/kafka that referenced this pull request Jun 12, 2024
This patch introduces the `group.version` feature flag with one version:
1) Version 1 enables the new consumer group rebalance protocol (KIP-848).

Reviewers: Justine Olshan <jolshan@confluent.io>
gongxuanzhang pushed a commit to gongxuanzhang/kafka that referenced this pull request Jun 12, 2024
This patch updates the system tests to correctly enable the new consumer protocol/coordinator in the tests requiring them.

I went with the simplest approach for now. Long term, I think that we should refactor the tests to better handle features and non-production features.

I got a successful run of the consumer system tests with this patch combined with apache#16120: https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1717155071--dajac--KAFKA-16860-2--29028ae0dd/2024-05-31--001./2024-05-31--001./report.html.

Reviewers: Justine Olshan <jolshan@confluent.io>
jolshan added a commit to jolshan/kafka that referenced this pull request Jun 27, 2024
jolshan added a commit to jolshan/kafka that referenced this pull request Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

KIP-848 The Next Generation of the Consumer Rebalance Protocol

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants