Skip to content

KAFKA-17018: update MetadataVersion for the Kafka release 3.9#16841

Merged
cmccabe merged 1 commit intoapache:trunkfrom
cmccabe:KAFKA-17018
Aug 12, 2024
Merged

KAFKA-17018: update MetadataVersion for the Kafka release 3.9#16841
cmccabe merged 1 commit intoapache:trunkfrom
cmccabe:KAFKA-17018

Conversation

@cmccabe
Copy link
Copy Markdown
Contributor

@cmccabe cmccabe commented Aug 8, 2024

  • Mark 3.9-IV0 as stable. Metadata version 3.9-IV0 should return Fetch version 17.

  • Move ELR to 4.0-IV0. Remove 3.9-IV1 since it's no longer needed.

  • Create a new 4.0-IV1 MV for KIP-848.

- Mark 3.9-IV0 as stable. Metadata version 3.9-IV0 should return Fetch version 17.

- Move ELR to 4.0-IV0. Remove 3.9-IV1 since it's no longer needed.

- Create a new 4.0-IV1 MV for KIP-848.
Copy link
Copy Markdown
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmccabe : Thanks for the PR. Left a few comments.

* IT CANNOT BE CHANGED.</strong>
*/
public static final MetadataVersion LATEST_PRODUCTION = IBP_3_8_IV0;
public static final MetadataVersion LATEST_PRODUCTION = IBP_3_9_IV0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, ListOffsetRequest has latestVersionUnstable set to true. So, it's kind of weird to have a production MV depending on a unstable PRC.

@clolov @showuon @chia7712 : Is KIP-1005 done? Could we set latestVersionUnstable to false for ListOffsetRequest in 3.9?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is KIP-1005 done? Could we set latestVersionUnstable to false for ListOffsetRequest in 3.9?

KIP-1005 is completed and backport to 3.9, so yes latestVersionUnstable should be false now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junrao : yes, KIP-1005 will ship in AK 3.9.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chia7712 do you plan to update

Copy link
Copy Markdown
Member

@chia7712 chia7712 Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you plan to update

I assumed @cmccabe will update it in this PR 😄

Will file a jira and PR for trunk and 3.9 later

Copy link
Copy Markdown
Member

@jolshan jolshan Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes -- sorry I was imprecise. If the api version is advertised by a MV that is stable it sould also be stable.

I didn't realize ListOffset was being used in the replication path. I guess this is just for tiered storage.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, by default, v9 of ListOffset is not supported with latestVersionUnstable set to true. So, I still don't understand why some existing tests don't fail. We probably need to understand the testing gap. If we have sufficient test coverage, we probably don't need additional protocols.

yes, you are right. we should have a test for replica path. I just write a IT which can produce following error when LIST_OFFSET has unstable version:

[2024-08-14 03:27:13,119] ERROR Closing socket for 127.0.0.1:36885-127.0.0.1:48854-3 because of error (kafka.network.Processor:76)
org.apache.kafka.common.errors.InvalidRequestException: Received request api key LIST_OFFSETS with version 9 which is not enabled

Also, it seems we create IT with unstable APIs always [0] and I guess that is why we don't see failed tests.

Summary

  1. broker-to-broker: the LIST_OFFSET version is based on MV [1], and we use IT to protect us from updating MV with an unstable version (https://issues.apache.org/jira/browse/KAFKA-17336)
  2. new-client-to-old-broker: the builder of LIST_OFFSET will check the timestamp and its matched version (https://issues.apache.org/jira/browse/KAFKA-17331)
  3. old-client-to-new-broker: the broker will check the timestamp and its matched version (https://issues.apache.org/jira/browse/KAFKA-17331)

I didn't realize ListOffset was being used in the replication path. I guess this is just for tiered storage.

@jolshan not really, the request is used to truncate in the replication path. I have attached a sample code in https://issues.apache.org/jira/browse/KAFKA-17336. we will complete it tomorrow.

[0]

props.put(ServerConfigs.UNSTABLE_FEATURE_VERSIONS_ENABLE_CONFIG, "true")

[1]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. https://issues.apache.org/jira/browse/KAFKA-17336 should be good to not release with unstable versions that break api usage.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it seems we create IT with unstable APIs always [0] and I guess that is why we don't see failed tests.

Thanks for the investigation, @chia7712. Maybe we just need to disable unstable API/MV in a few existing integration tests.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we just need to disable unstable API/MV in a few existing integration tests.

yep, we are looking for the similar tests

// TEST_2 released right before MV 3.9-IVO was released, and it depends on this metadata version
TEST_2(2, MetadataVersion.IBP_3_9_IV0, Collections.singletonMap(MetadataVersion.FEATURE_NAME, MetadataVersion.IBP_3_9_IV0.featureLevel()));
// TEST_2 released right before MV 4.0-IVO was released, and it depends on this metadata version
TEST_2(2, MetadataVersion.IBP_4_0_IV0, Collections.singletonMap(MetadataVersion.FEATURE_NAME, MetadataVersion.IBP_4_0_IV0.featureLevel()));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jolshan : What MV should TEST_2 depend on? Do we need to keep changing it with new MVs? It would be useful to add a comment to document this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry I haven't gotten a chance to update this. I was hoping to have one of the test versions be the latest unstable version, but I believe I updated the tests for them to work either way now. Let me double check.

If so we can just keep at 3.9.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a similar test here for real features, so if we think this is sufficient, we can remove the test I linked above. https://github.com/apache/kafka/blame/98cdf9717049e87ba34bb5161276577fcb8bd1c4/server-common/src/test/java/org/apache/kafka/server/common/FeaturesTest.java#L51

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Justine. Could we just use MetadataVersion.latestTesting() to avoid having to keep updating this with new MVs?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LatestTesting doesn't quite work since it is not guaranteed that latestTesting is unstable right?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the only case when LatestTesting is stable is that there is no unstable MV, right?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right -- in that case this test will fail. https://github.com/apache/kafka/blame/98cdf9717049e87ba34bb5161276577fcb8bd1c4/server-common/src/test/java/org/apache/kafka/server/common/FeaturesTest.java#L137

My understanding is that we don't create a new MV when we mark one as stable, so for example, if all the MVs were marked stable, the test would fail.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I could write the test to check if the latest MV is stable, and if so, skip the check.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably just document that the last MV should always be unstable. I don't think it's ever been stable since "unstable" was created (we don't seem to stop developing features...)

Should we do this in a follow-on so that we can unblock 3.9?

*/
@ParameterizedTest
@ValueSource(strings = {"3.4-IV0", "3.5-IV2", "3.6-IV0", "3.7-IV2", "3.9-IV1"})
@ValueSource(strings = {"3.4-IV0", "3.5-IV2", "3.6-IV0", "3.7-IV2", "4.0-IV0"})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a followup comment in #16347 (comment). Do we need to add 4.0-IV0 for testNoLeaderEpochBumpOnEmptyTargetIsr?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would not make sense to add 4.0-IV0 to testNoLeaderEpochBumpOnEmptyTargetIsr, since in 4.0 an empty target ISR actually WILL trigger a leader epoch bump.

The reason is because once KIP-966 is implemented, empty ISRs will actually be possible, whereas they are not prior to KIP-966

Copy link
Copy Markdown
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmccabe : Thanks for the explanation. LGTM assuming we will have a followup PR to address #16841 (comment).

@chia7712
Copy link
Copy Markdown
Member

Thanks for the explanation. LGTM assuming we will have a followup PR to address #16841 (comment).

will file PR later (https://issues.apache.org/jira/browse/KAFKA-17319)

@cmccabe cmccabe merged commit 132e097 into apache:trunk Aug 12, 2024
@cmccabe cmccabe deleted the KAFKA-17018 branch August 12, 2024 23:30
cmccabe added a commit that referenced this pull request Aug 12, 2024
- Mark 3.9-IV0 as stable. Metadata version 3.9-IV0 should return Fetch version 17.

- Move ELR to 4.0-IV0. Remove 3.9-IV1 since it's no longer needed.

- Create a new 4.0-IV1 MV for KIP-848.

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Justine Olshan <jolshan@confluent.io>
jolshan added a commit that referenced this pull request Aug 17, 2024
…#16901)

After some discussion on: #16841 (comment)

We decided it is best for test version to always map to MetadataVersion.latestTesting. We should always have one unstable MV (either because there is a feature being worked on OR when we mark the latest version stable we create a new one).

Reviewers: Jun Rao <junrao@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants