
KAFKA-8069: Setting expireTimestamp to None if it is the default value after loading v1 offset records from __consumer_offsets#3

Merged
hzxa21 merged 2 commits into linkedin:2.0-li from hzxa21:KAFKA-8069
Mar 8, 2019

Conversation


@hzxa21 hzxa21 commented Mar 8, 2019

After the 2.1 release, if the broker hasn't been upgraded to the latest inter-broker protocol version, the committed offsets stored in the __consumer_offsets topic will get cleaned up much earlier than they should be when the offsets are loaded back from the __consumer_offsets topic in GroupCoordinator, which happens during leadership transition or after a broker bounce. This patch fixes the bug by setting expireTimestamp to None if it is the default value after loading v1 offset records from __consumer_offsets.

Details for the bug can be found in https://issues.apache.org/jira/browse/KAFKA-8069
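The essence of the fix can be sketched as follows. This is an illustrative sketch, not the actual patch: the names `OffsetRecordSketch`, `DefaultTimestamp`, and `readV1ExpireTimestamp` are hypothetical, standing in for the handling of v1 offset records when they are loaded back in the group coordinator.

```scala
// Hypothetical sketch of the KAFKA-8069 fix. Names here are illustrative;
// they are not Kafka's actual API.
object OffsetRecordSketch {
  // v1 offset records persist a sentinel value when the client did not set
  // an explicit expiration timestamp (-1, mirroring the upstream default).
  val DefaultTimestamp: Long = -1L

  // Before the fix, the sentinel leaked through as if it were a real
  // timestamp, so a reloaded offset looked like it had expired long ago.
  // After the fix, the sentinel maps back to None, letting the broker fall
  // back to its normal retention-based expiration.
  def readV1ExpireTimestamp(stored: Long): Option[Long] =
    if (stored == DefaultTimestamp) None else Some(stored)
}
```

With this mapping, an offset record committed without an explicit expiration round-trips through __consumer_offsets as `None` instead of a bogus timestamp in the past.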

…e after loading v1 offset records from __consumer_offsets

After the 2.1 release, if the broker hasn't been upgraded to the latest inter-broker protocol version,
the committed offsets stored in the __consumer_offsets topic will get cleaned up much earlier than they
should be when the offsets are loaded back from the __consumer_offsets topic in GroupCoordinator, which
happens during leadership transition or after a broker bounce. This patch fixes the bug by setting
expireTimestamp to None if it is the default value after loading v1 offset records from __consumer_offsets
@hzxa21 hzxa21 requested a review from jonlee2 March 8, 2019 05:51
@hzxa21 hzxa21 changed the title KAFKA-8069: Setting expireTimestamp to None if it is the default valu… KAFKA-8069: Setting expireTimestamp to None if it is the default value after loading v1 offset records from __consumer_offsets Mar 8, 2019
@hzxa21 hzxa21 requested review from jjkoshy, onurkaraman and xiowu0 March 8, 2019 05:52

@xiowu0 xiowu0 left a comment


Read through the upstream ticket. Fix LGTM.

Comment thread core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala
Collaborator

@jonlee2 jonlee2 left a comment


Thanks for the fix!

@hzxa21 hzxa21 merged commit 8d1fddd into linkedin:2.0-li Mar 8, 2019
kidkun pushed a commit to kidkun/kafka that referenced this pull request Dec 14, 2019
…pache#7305)

A partition log is initialized in the following steps:

1. Fetch the log config from ZK
2. Call LogManager.getOrCreateLog, which creates the Log object, then
3. Register the Log object

Step 3 enables the configuration update thread to deliver configuration
updates to the log, but any update that arrives between steps 1 and 3
is missed. This breaks the following use case:

1. Create a topic with the default configuration, and immediately after that
2. Update the configuration of the topic

There is a race condition here, and in some cases the update made in the
second step will get dropped.

This change fixes it by tracking updates that arrive between steps 1 and 3.
Once a Partition is done initializing its log, it checks whether it has
missed any update. If so, the configuration is read from ZK again.

Added unit tests to make sure a dirty configuration is refreshed. Tested
on a local cluster to make sure that topic configuration and updates are
handled correctly.

Reviewers: Jason Gustafson <jason@confluent.io>
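The dirty-flag approach described in this commit message can be sketched as below. This is a minimal illustration, not Kafka's actual Partition/LogManager code: `PartitionSketch`, `markConfigDirty`, and the `fetchConfigFromZk` callback are hypothetical names.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Illustrative sketch of tracking config updates that race with log
// initialization. Names are hypothetical, not Kafka's actual API.
final class PartitionSketch(fetchConfigFromZk: () => Map[String, String]) {
  // Set by the configuration-update thread if an update arrives while the
  // log is still being created and registered (between steps 1 and 3).
  private val configDirty = new AtomicBoolean(false)
  @volatile private var config: Map[String, String] = Map.empty

  def markConfigDirty(): Unit = configDirty.set(true)

  def initializeLog(): Unit = {
    config = fetchConfigFromZk()        // step 1: fetch log config from ZK
    // steps 2-3: create and register the Log object here ...
    if (configDirty.getAndSet(false))   // did we miss an update meanwhile?
      config = fetchConfigFromZk()      // if so, re-read the config from ZK
  }

  def currentConfig: Map[String, String] = config
}
```

A usage sketch: if `markConfigDirty()` fires during initialization, the partition re-reads the config after registering the log, so the update made "immediately after" topic creation is no longer dropped.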
gitlw pushed a commit to gitlw/kafka that referenced this pull request May 4, 2020
