KAFKA-8164: Improve test passing rate by rerunning flaky tests #8019
ijuma merged 8 commits into apache:trunk
|
@ijuma I've created the new PR about flaky tests. Currently it allows 3 retries per test and fails the build after 50 test failures. Both numbers are a bit arbitrary (especially the 50 :) ), but if you know better values for them, I'd be happy to change them. Furthermore I ran into a few cases where the Not sure if this is a big problem but I wanted to let you know. The old custom flaky retry didn't handle these either. |
    retry {
        maxRetries = 3
        maxFailures = 50
I would go with something conservative: 1 retry and max 5 failures. We want to fix flaky tests, but if something just fails once, it's not worth failing the build given Jenkins flakiness.
We can adjust after we see how it works in practice, but this would be a good starting point IMO.
Ok. I figured we can also make these configurable. Updated the PR.
    userMaxForks = project.hasProperty('maxParallelForks') ? maxParallelForks.toInteger() : null

    userFlakyRetries = project.hasProperty('flakyRetries') ? flakyRetries.toInteger() : 1
    userMaxTestFailures = project.hasProperty('maxTestFailures') ? maxTestFailures.toInteger() : 5
How about `maxTestRetries` and `maxTestRetryFailures`? Also, we need to update the README to mention these.
|
Updated the common configs section |
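A minimal sketch of how the two settings discussed in this thread could be wired into the Gradle test-retry plugin, using the names suggested above (`maxTestRetries`, `maxTestRetryFailures`) and the conservative defaults of 1 and 5. The `plugins` block, the single-project layout, and the pinned plugin version are assumptions for illustration; the actual change in this PR may be structured differently.

```groovy
// build.gradle: illustrative sketch, not the exact code from this PR
plugins {
    id 'java'
    // Assumption: applied via the plugins DSL; the version here is only an example
    id 'org.gradle.test-retry' version '1.1.0'
}

ext {
    // Read -PmaxTestRetries / -PmaxTestRetryFailures, falling back to the proposed defaults
    userMaxTestRetries = project.hasProperty('maxTestRetries') ? maxTestRetries.toInteger() : 1
    userMaxTestRetryFailures = project.hasProperty('maxTestRetryFailures') ? maxTestRetryFailures.toInteger() : 5
}

test {
    retry {
        // Maximum number of times a single failed test is retried
        maxRetries = userMaxTestRetries
        // Once this many tests have failed, retrying is disabled for the run
        maxFailures = userMaxTestRetryFailures
    }
}
```

With something like this in place, `./gradlew test -PmaxTestRetries=2` would override the retry count for a single run without editing the build file.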
|
ok to test |
    spotbugs: "3.1.12",
    spotbugsPlugin: "3.0.0",
    spotlessPlugin: "3.27.1",
    testRetryPlugin: "1.0.2",
How about using latest version (1.1.0)?
I also noticed it was released shortly after this PR was submitted. Updated.
|
ok to test |
|
@viktorsomogyi I made some minor changes, please check that you're happy with them. |
|
ok to test |
|
retest this please |
|
Thinking about this some more and after some discussion, I think we should probably default to 0 retries and tweak the PR builder to pass 1 and 5. Will update the PR if @viktorsomogyi doesn't beat me to it. |
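Under that proposal the build would not retry anything unless explicitly asked to. A sketch of just the changed defaults, assuming the property names suggested earlier in the review. The default of 0 for the failure limit and the command in the trailing comment are assumptions; the exact Jenkins invocation is not shown in this thread.

```groovy
// build.gradle fragment: illustrative only
ext {
    // Default to 0 retries: the test-retry plugin does not retry when maxRetries is 0
    userMaxTestRetries = project.hasProperty('maxTestRetries') ? maxTestRetries.toInteger() : 0
    // Assumption: also 0 by default; it has no effect while retries are disabled
    userMaxTestRetryFailures = project.hasProperty('maxTestRetryFailures') ? maxTestRetryFailures.toInteger() : 0
}

// A PR builder could then opt in explicitly, for example:
//   ./gradlew test -PmaxTestRetries=1 -PmaxTestRetryFailures=5
```

Keeping the default at 0 means local and branch builds still surface every flaky failure, while only jobs that pass the properties tolerate a single retry.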
|
@ijuma I was about 10 minutes behind you: I waited for the local build to finish and only then saw that you had already pushed. :) |
|
retest this please |
|
Updated Jenkins PR jobs to be:
Branch builds have not been changed. |
|
One flaky failure with retries (2.12 job):
Three flaky failures without retries (2.13 job):
I'll go ahead and merge this so that we can evaluate how well it works in practice. |
|
Thanks for the contribution! |
|
@ijuma thank you for reviewing this change! |
    ./gradlew clients:test --tests RequestResponseTest

    ### Specifying test retries ###
    By default, each failed test is retried once up to a maximum of five retries per test run. Tests are retried at the end of the test task. Adjust these parameters in the following way:
Maybe we should update this line to say that tests are not retried by default.