KAFKA-17705: Add Transactions V2 system tests and mark as production ready #18132
jolshan merged 7 commits into apache:trunk
|
@jolshan #17881 adds a "triage" label to PRs from non-committers. It turns out this also affects committers whose membership visibility in the ASF GitHub org is not public. I added instructions for setting your membership visibility to public: https://github.com/apache/kafka/blob/trunk/.github/workflows/README.md#pr-triage |
| MetadataVersion.latestProduction().featureLevel())); | ||
| for (Feature feature : Feature.PRODUCTION_FEATURES) { | ||
| short maxVersion = enableUnstable ? feature.latestTesting() : feature.latestProduction(); | ||
| short maxVersion = enableUnstable ? feature.latestTesting() : feature.defaultLevel(MetadataVersion.LATEST_PRODUCTION); |
@junrao @dongnuo123 I noticed we didn't change the defaults here on the previous PR. I have done so here. A test was failing since the production version for transaction version is now not the same as the default based on the latest production MV.
@jolshan: I'm not sure I understand this change. The result of defaultFeatureMap is used for Controller/Broker registration, so it seems we should pass in the max supported version of each feature instead of the default version, right? In fact, defaultFeatureMap should be renamed to something like supportedFeatureMap.
A test was failing since the production version for transaction version is now not the same as the default based on the latest production MV.
Hmm, I thought that with #17886, it's ok for the latest production version for TV to be different from the default. It just needs to be larger.
Yes. My understanding from #17886 was that we want a separate production vs default value.
I thought these methods were also meant to create the default features, not the max supported ones. It's my bad if I misunderstood that. I will take another look and if that is the case, fix the test.
Updated with the changes to the name and the test
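The distinction discussed in this thread, between the level a feature defaults to and the highest level a node supports at registration, can be sketched in Python as follows. This is an illustration only: `FeatureLevels`, `supported_feature_map`, and every numeric level are hypothetical and do not mirror Kafka's Java `Feature` class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureLevels:
    """Hypothetical stand-in for one feature's version levels."""
    name: str
    default_level: int      # level chosen by default for the latest production MV
    latest_production: int  # highest level considered production ready
    latest_testing: int     # highest level available when unstable versions are enabled

    def __post_init__(self):
        # Assumed reading of the review comment: the latest production level
        # may differ from the default, but must not be lower than it.
        if self.latest_production < self.default_level:
            raise ValueError(f"{self.name}: latest production level is below the default")


def supported_feature_map(features, enable_unstable):
    """Return the max supported level per feature, as used for registration."""
    return {
        f.name: f.latest_testing if enable_unstable else f.latest_production
        for f in features
    }


# Illustrative values only.
tv = FeatureLevels("transaction.version", default_level=1,
                   latest_production=2, latest_testing=3)
print(supported_feature_map([tv], enable_unstable=False))
```

With these made-up levels the map reports the production level rather than the default, which is the behavior the rename to supportedFeatureMap is meant to convey.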
| latestFinalizedFeaturesEpoch = info.finalizedFeaturesEpoch; | ||
| Short transactionVersion = info.finalizedFeatures.get("transaction.version"); | ||
| isTransactionV2Enabled = transactionVersion != null && transactionVersion >= 2; | ||
| log.debug("Updating isTV2 enabled to {} at with FinalizedFeaturesEpoch {}", isTransactionV2Enabled, latestFinalizedFeaturesEpoch); |
at with FinalizedFeaturesEpoch => with FinalizedFeaturesEpoch
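For readers following along in the Python system tests, the null-safe check in the Java snippet above can be mirrored roughly like this. The function name is hypothetical; only the feature key `transaction.version` and the threshold of 2 come from the diff.

```python
def is_transaction_v2_enabled(finalized_features):
    """Return True only when transaction.version is finalized at level 2 or higher.

    `finalized_features` is assumed to be a dict of feature name -> level.
    A missing entry is treated as "not enabled", matching the null check
    in the Java code.
    """
    level = finalized_features.get("transaction.version")
    return level is not None and level >= 2
```

A cluster that never finalized the feature therefore reports TV2 as disabled instead of raising an error.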
|
I am seeing some test failures in the system tests, so I will investigate. There are some issues with the old tests: https://confluent-open-source-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/trunk/2024-12-11--001.600822cf-da88-4744-b834-795c565d98d5--1733957556--jolshan--kafka-17705--e69893c32c/report.html (some could already be failing on trunk) |
|
I also pushed a change and started a new test-specific run. |
|
Thanks Jun. Taking a look. |
|
Here's the new run of just the changed tests: https://confluent-open-source-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/trunk/2024-12-12--001.65845de4-1d44-4e7d-8d23-ac15c9440cb2--1733986841--jolshan--kafka-17705--92dfdf6028/report.html It is still a bit flaky even with the timeout increased; I will look into that. I also need to see whether the consumer failures are unique to this PR or were already in trunk when I branched. |
|
The unit test failure does not seem related. |
|
#18036 explains the main divergence from the trunk failures. I do see some issues with fencing in the changes to the tests, so I will continue to investigate. |
|
System tests uncovered a bug! Will fix that and come back here :) https://issues.apache.org/jira/browse/KAFKA-18227 |
|
I've merged #18176, so I will update this now and rerun tests. |
|
@jolshan : Thanks for rerunning the tests. Is the build scan failure related to this PR? |
|
@junrao nope -- I confirmed with David Arthur that the build scan issue for quarantined tests is unrelated. Here are the latest test results (without the log4j change that seems to be causing issues for our test running infra) |
…ready (#18132) Added transaction version 2 to some of the system tests. Also marking TV2 as production ready. Also fixes the defaultVersion test. Reviewers: Jun Rao <jun@confluent.io>
| cmd += " --standalone" | ||
| self.standalone_controller_bootstrapped = True | ||
| if self.use_transactions_v2: | ||
| cmd += " --feature transaction.version=2" |
We missed setting transaction.version for isolated KRaft, so the cluster ends up using v0. Since this nullifies use_transactions_v2=True in the suite, we will submit a patch later.
Hmm, is there a separate place to see this? That is unfortunate.
I observed this issue while running tests/kafkatest/tests/core/transactions_test.py.
With the settings "isolated_kraft" and "use_transactions_v2=true", TV2 was not enabled on the cluster.
Do we know how this maps to the kafka.py file? Ie, why this line of code doesn't apply for isolated kraft?
see #21164
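One way to picture the problem reported in this thread: if the `--feature` flag is only appended on one code path, other quorum modes format their storage without it. The sketch below is not the actual kafka.py code or the patch in #21164. `build_format_cmd` and its parameters are hypothetical, and the base command is a placeholder; only the `--standalone` and `--feature transaction.version=2` strings come from the diff above.

```python
def build_format_cmd(base_cmd, bootstrap_standalone, use_transactions_v2):
    """Append storage-format options to a base command string.

    The transaction.version flag is handled independently of the
    standalone branch, so it applies to every quorum mode
    (an assumption about the desired behavior, not the real fix).
    """
    cmd = base_cmd
    if bootstrap_standalone:
        cmd += " --standalone"
    if use_transactions_v2:
        cmd += " --feature transaction.version=2"
    return cmd


# Placeholder base command; a real format invocation takes further arguments.
print(build_format_cmd("kafka-storage.sh format", False, True))
```

Keeping the two conditions at the same indentation level makes it harder for the feature flag to be skipped when the standalone bootstrap branch is not taken.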
Added transaction version 2 to some of the system tests. Also marking TV2 as production ready.
Will share the results of the tests when I get them.