KAFKA-18185: remove internal.leave.group.on.close config #19400

Merged
bbejeck merged 87 commits into apache:trunk from frankvicky:KAFKA-16758-follow-up
Oct 3, 2025

@frankvicky
Contributor

@frankvicky frankvicky commented Apr 7, 2025

JIRA: KAFKA-18185

This is a follow-up to #17614. The patch removes the
internal.leave.group.on.close config.

Reviewers: Sophie Blee-Goldman <ableegoldman@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>, Bill Bejeck <bbejeck@apache.org>

@github-actions github-actions Bot added the triage (PRs from the community), streams, core (Kafka Broker), consumer, connect, clients, and small (Small PRs) labels Apr 7, 2025
@frankvicky frankvicky marked this pull request as draft April 7, 2025 09:53
@frankvicky frankvicky marked this pull request as ready for review April 8, 2025 07:45
@github-actions github-actions Bot removed the small (Small PRs) label Apr 8, 2025
@frankvicky frankvicky changed the title from "KAFKA-16758: remove internal.leave.group.on.close config" to "KAFKA-18185: remove internal.leave.group.on.close config" Apr 8, 2025
@frankvicky frankvicky marked this pull request as draft April 8, 2025 15:49
@frankvicky frankvicky marked this pull request as ready for review April 9, 2025 04:21
@frankvicky frankvicky changed the title from "KAFKA-18185: remove internal.leave.group.on.close config" to "[WIP] KAFKA-18185: remove internal.leave.group.on.close config" Apr 9, 2025
@github-actions github-actions Bot added the tools label Apr 9, 2025
@lucasbru
Member

lucasbru commented Aug 27, 2025

@mjsax Not sure - the default for the consumer was to leave the group for dynamic members, so I think the example is correct? We just need to make sure that Kafka Streams will use REMAIN_IN_GROUP by default, which I suppose this PR is doing.

But I'm also confused. In KIP-1092, I see this:

         * {@code LEAVE_GROUP} means the consumer will leave the group.
         * {@code REMAIN_IN_GROUP} means the consumer will remain in the group.
         * {@code DEFAULT} applies the default behavior, which may depend on whether the consumer is static or dynamic.

And further down, in the test plan:

leaveGroup(DEFAULT)
For dynamic members: Verify that the consumer leaves the group upon close, triggering a rebalance.
For static members: Verify that the consumer remains in the group, and the group remains stable.

This seems clear. But in the new consumer code, a different behavior seems to be documented, see https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerMembershipManager.java#L431-L436

 - Default operation: both static and dynamic consumers will send a leave heartbeat
 - Leave operation: both static and dynamic consumers will send a leave heartbeat
 - Remain in group: only static consumers will send a leave heartbeat, while dynamic members will not

This seems to be completely different from the KIP. I did not check the implementation in detail though.
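
As a reading aid, the KIP-1092 wording quoted above can be restated as a small decision function. This is only a sketch of the documented semantics, not the client code: the nested enum is a local stand-in for the one named in the KIP, and CloseBehaviorSketch, leavesGroupOnClose, and the isStaticMember flag are hypothetical names introduced here.

```java
final class CloseBehaviorSketch {
    // Local stand-in for the enum quoted from KIP-1092; not the real client class.
    enum GroupMembershipOperation { LEAVE_GROUP, REMAIN_IN_GROUP, DEFAULT }

    /**
     * Returns true if, per the quoted KIP text and test plan, the consumer
     * leaves the group on close (which would trigger a rebalance).
     * isStaticMember is a hypothetical flag for "static vs. dynamic member".
     */
    static boolean leavesGroupOnClose(GroupMembershipOperation op, boolean isStaticMember) {
        switch (op) {
            case LEAVE_GROUP:
                return true;
            case REMAIN_IN_GROUP:
                return false;
            default:
                // DEFAULT: dynamic members leave, static members remain.
                return !isStaticMember;
        }
    }
}
```

Under this reading, only DEFAULT depends on the membership type; the other two operations are unconditional from the caller's point of view.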

Contributor Author

@frankvicky frankvicky left a comment

Hi @lucasbru
This patch has been pending for a long time, so I need some time to recall.
The difference you mentioned is probably just an inaccurate code comment, since the REMAIN_IN_GROUP path has an early return.
I will take a look.

@mjsax
Member

mjsax commented Aug 27, 2025

Oh sorry. I mixed up this work with the corresponding KS work... https://cwiki.apache.org/confluence/display/KAFKA/KIP-1153%3A+Refactor+Kafka+Streams+CloseOptions+to+Fluent+API+Style

@frankvicky
Contributor Author

@lucasbru: I have reviewed #17614. I suppose the difference you mentioned is due to #17614 (comment).

@lucasbru
Member

@frankvicky: This makes sense, then. So remaining in group means sending the leave epoch in the static case. Then it is just the code comment that is misleading and could probably be updated to state what you said in the PR comment that you referenced.
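
One way to picture "sending the leave epoch in the static case" is the sketch below. It is a reading of this thread, not the ConsumerMembershipManager code. The thread itself only establishes two cases: REMAIN_IN_GROUP returns early for dynamic members, and static members still send a leave heartbeat. Everything else is an assumption: the epoch values -1 and -2 follow the leave epochs described in KIP-848 as I understand them, and the LEAVE_GROUP and DEFAULT branches are guesses. LeaveHeartbeatSketch and leaveEpochOnClose are hypothetical names.

```java
import java.util.OptionalInt;

final class LeaveHeartbeatSketch {
    // Local stand-in for the KIP-1092 enum; not the real client class.
    enum GroupMembershipOperation { LEAVE_GROUP, REMAIN_IN_GROUP, DEFAULT }

    // Assumed values, following the leave epochs described in KIP-848.
    static final int LEAVE_GROUP_MEMBER_EPOCH = -1;        // member leaves the group
    static final int LEAVE_GROUP_STATIC_MEMBER_EPOCH = -2; // static member steps away temporarily

    /** Epoch for the closing heartbeat, or empty if no heartbeat is sent. */
    static OptionalInt leaveEpochOnClose(GroupMembershipOperation op, boolean isStaticMember) {
        if (!isStaticMember) {
            // Dynamic member: REMAIN_IN_GROUP returns early without a heartbeat.
            return op == GroupMembershipOperation.REMAIN_IN_GROUP
                ? OptionalInt.empty()
                : OptionalInt.of(LEAVE_GROUP_MEMBER_EPOCH);
        }
        // Static member: a heartbeat is always sent. Assumption: only an
        // explicit LEAVE_GROUP uses -1; DEFAULT and REMAIN_IN_GROUP use -2.
        return op == GroupMembershipOperation.LEAVE_GROUP
            ? OptionalInt.of(LEAVE_GROUP_MEMBER_EPOCH)
            : OptionalInt.of(LEAVE_GROUP_STATIC_MEMBER_EPOCH);
    }
}
```

If this reading holds, the code comment and the KIP describe the same behavior from two angles: the KIP says whether the member stays in the group, while the code comment says whether a heartbeat goes out.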

Will implement these changes also for KIP-1071 then? While static membership isn't fully implemented yet for KIP-1071, we should still pull through those changes.

@mjsax
Member

mjsax commented Aug 29, 2025

High-level comment: Is it wise to complete this PR before #19955? -- It seems we should only remove the internal config after we have the public API change for KS merged. Otherwise, even if it is not publicly user-facing, we are losing the ability to change KS's behavior on KafkaStreams#close(). -- I know that we have some users who actually don't like how KS behaves and use this internal config to change the behavior. If we merge this PR before we complete KIP-1153, we would introduce a "regression" for these users.

@mjsax
Member

mjsax commented Aug 29, 2025

Just talked to @bbejeck about this, and he will prioritize KIP-1153 to make sure we get it merged.

@lucasbru
Member

lucasbru commented Sep 1, 2025

@frankvicky What I actually meant to ask in my previous comment: Are you planning to implement these changes also for KIP-1071? While static membership isn't fully implemented yet for KIP-1071, we should still pull through those changes.

@frankvicky
Contributor Author

@lucasbru: Sorry that I missed responding to your question.
IMO, we should implement these changes also for KIP-1071, but I'm not sure we can do it at this moment.
I'm investigating the root cause of the failed test JoinWithIncompleteMetadataIntegrationTest#testShouldAutoShutdownOnJoinWithIncompleteMetadata, and it appears that the root cause is that the new protocol doesn't trigger rebalance when the topic is missing.
Would you happen to have any insight?

@lucasbru
Member

lucasbru commented Sep 1, 2025

@frankvicky It looks like that integration test was broken by #20284 . I can see that it passes on CI only because it is first run with the old protocol, and then run with the new protocol, and in the new protocol it shuts down only because it tries to reuse the group ID from the first run. At this time, the member from the first run is still in the group, so we get an error that the group is of the incorrect protocol type.

I created https://issues.apache.org/jira/browse/KAFKA-19660 for that.

This indeed shines a light on your (this) PR though: it seems to change the default for Kafka Streams. It leaves the group when streams.close() is called, while this wasn't the case before.

Member

@bbejeck bbejeck left a comment

Thanks @frankvicky for the PR! I just have one minor comment, but overall this looks good. Since this PR has been around for a while, can you rebase? NM, I see you did that just 4 days ago.

Apologies for taking so long for the review

public void shutdown(final boolean leaveGroup) {
    log.info("Informed to shut down");
    final State oldState = setState(State.PENDING_SHUTDOWN);
    if (leaveGroup) {
Member

Couldn't this be simplified to leaveGroupRequested.set(leaveGroup)?

Contributor Author

Nice catch!
I have amended it.
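
For readers without the diff at hand, the suggestion can be sketched roughly as follows. The body of the original if branch is not shown in the quoted hunk, so this assumes it only set the flag to true. The field name leaveGroupRequested comes from the review comment; its AtomicBoolean type, its initial value, and the class and method names below are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class ShutdownFlagSketch {
    // Name taken from the review comment; type and initial value are assumptions.
    final AtomicBoolean leaveGroupRequested = new AtomicBoolean(false);

    // Shape of the original branch, assuming its elided body only set the flag.
    void shutdownWithBranch(final boolean leaveGroup) {
        if (leaveGroup) {
            leaveGroupRequested.set(true);
        }
    }

    // Suggested form: assign the argument directly.
    void shutdownSimplified(final boolean leaveGroup) {
        leaveGroupRequested.set(leaveGroup);
    }
}
```

The two forms behave the same when the flag starts out false. They differ only if the flag is already true when the method is called with false: the branch leaves it true, while the direct assignment resets it to false.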

@bbejeck
Member

bbejeck commented Oct 3, 2025

@frankvicky I'm going to make a pass regarding this comment from @lucasbru - I want to confirm if we are changing the default behavior, which is something I don't think we want to do
NM - after looking at the code more, I realize this is exactly what we want by default.

Member

@bbejeck bbejeck left a comment

Thanks for the PR @frankvicky LGTM

@bbejeck bbejeck merged commit 68f1da8 into apache:trunk Oct 3, 2025
24 checks passed
@bbejeck
Member

bbejeck commented Oct 3, 2025

Merged #19400 into trunk

eduwercamacaro pushed a commit to littlehorse-enterprises/kafka that referenced this pull request Nov 12, 2025
shashankhs11 pushed a commit to shashankhs11/kafka that referenced this pull request Dec 15, 2025