MINOR Fix some test-catalog issues #18272
This run https://github.com/apache/kafka/actions/runs/12383841902?pr=18241 had a quarantinedTest task time out after 30 minutes. We can see that the thread dumps get archived, but neither the test tasks nor the overall build fail. We only noticed this because the build scan publish step failed, since there was no build scan data for one of the test tasks. @chia7712 WDYT about failing the overall build if the quarantined tests time out?
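The useful signal in this situation is the exit status. GNU `timeout` exits with 124 when it has to kill the command it wraps, so a hung task shows up as a distinct exit code rather than an ordinary failure. As a minimal sketch, assuming a hypothetical `run_with_timeout` helper that follows the same convention (this is not Kafka's actual CI script), the detection side looks like this:

```python
import subprocess
import sys

# Exit status GNU coreutils `timeout` uses when the time limit is hit.
TIMEOUT_EXIT_CODE = 124


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return its exit code, or 124 if it exceeded timeout_seconds.

    Hypothetical helper mirroring the `timeout <duration> ./gradlew ...` convention.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising TimeoutExpired.
        return TIMEOUT_EXIT_CODE


if __name__ == "__main__":
    fast = run_with_timeout([sys.executable, "-c", "pass"], timeout_seconds=10)
    slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout_seconds=0.5)
    print(fast, slow)  # 0 124
```

Detecting the 124 is only half of it: unless something downstream acts on that code, the timeout is recorded but nothing fails, which matches what happened in the run above.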
I also increased the quarantinedTest timeout to 180m temporarily. Since the test-catalog uses a 7-day look-back, we will have issues for the next week, with lots of tests being run as quarantined.
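To see why a bad catalog lingers for about a week, here is a small model. It assumes (hypothetically; the real consumption logic may differ) that each run reads the newest catalog snapshot that is at least 7 days old, and that any test missing from that snapshot is treated as new and run as quarantined. The names `pick_catalog_snapshot` and `split_tests` are illustrative only.

```python
from datetime import datetime, timedelta

# Assumption: a run reads the newest catalog snapshot that is at least 7 days old.
LOOK_BACK = timedelta(days=7)


def pick_catalog_snapshot(snapshots, now):
    """Return the test names in the newest snapshot taken at or before now - LOOK_BACK.

    snapshots is a list of (timestamp, set_of_test_names) pairs; this is a
    hypothetical model, not the real test-catalog format.
    """
    eligible = [snap for snap in snapshots if snap[0] <= now - LOOK_BACK]
    if not eligible:
        return set()
    return max(eligible, key=lambda snap: snap[0])[1]


def split_tests(all_tests, catalog):
    """Tests missing from the catalog are treated as new and run as quarantined."""
    regular = sorted(t for t in all_tests if t in catalog)
    quarantined = sorted(t for t in all_tests if t not in catalog)
    return regular, quarantined


start = datetime(2024, 12, 1)
snapshots = [
    (start, {"A", "B", "C"}),                      # healthy catalog
    (start + timedelta(days=1), {"A"}),            # catalog accidentally overwritten
    (start + timedelta(days=3), {"A", "B", "C"}),  # fixed catalog
]

if __name__ == "__main__":
    # Day 9: the fixed snapshot is only 6 days old, so the bad one is still used.
    print(split_tests({"A", "B", "C"}, pick_catalog_snapshot(snapshots, start + timedelta(days=9))))
    # (['A'], ['B', 'C'])
    # Day 10: the fixed snapshot has aged past the look-back and takes over.
    print(split_tests({"A", "B", "C"}, pick_catalog_snapshot(snapshots, start + timedelta(days=10))))
    # (['A', 'B', 'C'], [])
```

Under this model, a fix to the catalog only becomes visible once the fixed snapshot is itself 7 days old, which is the window the larger quarantinedTest timeout has to cover.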
matrix:
  java: [ 23, 17 ]  # If we change these, make sure to adjust ci-complete.yml
outputs:
  timed-out: ${{ (steps.junit-test.outputs.gradle-exitcode == '124' || steps.junit-quarantined-test.outputs.gradle-exitcode == '124') }}
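The `timed-out` output boils down to "did either Gradle step exit with `timeout`'s 124?". A Python mirror of that expression, as a sketch: GitHub Actions step outputs are strings, hence the comparison against `'124'`, and modelling a step that never ran as an empty string is an assumption made here for illustration.

```python
# GNU `timeout` exits with 124 when it has to stop the command it wraps.
TIMEOUT_EXIT_CODE = "124"


def job_timed_out(junit_test_exitcode, quarantined_test_exitcode):
    """Mirror of the `timed-out` job output expression.

    Step outputs are strings, so the comparison is against '124'.
    A step that never ran is modelled here as an empty string.
    """
    return TIMEOUT_EXIT_CODE in (junit_test_exitcode, quarantined_test_exitcode)
```

Note that an ordinary test failure (a nonzero code other than 124) does not set `timed-out`; only a timeout in either task does.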
It seems line #221 could use this output?
Hm, maybe not. I think job outputs aren't evaluated until the job is complete.
I'm actually not sure there were hanging tests; that remains to be seen. Because the test catalog was essentially being overwritten, a lot of tests were being run as quarantined. This caused the quarantinedTest task to hit the 30m timeout, which in turn caused the job to produce an incorrect test catalog (since not every test ran), and the cycle repeated. After a run with the extended quarantinedTest timeout, we can see whether there is actually a hanging test.
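The cycle described above can be reproduced with a toy model. Everything here is a simplifying assumption, not Kafka's real logic: tests missing from the catalog run as quarantined, the quarantined task times out once it holds more tests than a hypothetical `quarantine_budget` (a stand-in for the 30m versus 180m limit), a timed-out task contributes no results, and the 7-day look-back delay is ignored.

```python
def next_catalog(catalog, all_tests, quarantine_budget):
    """Simulate one CI run and return the catalog it would publish.

    Hypothetical assumptions:
    - tests missing from the catalog run in the quarantined task;
    - that task times out when it has more tests than quarantine_budget;
    - a timed-out quarantined task contributes no results.
    """
    regular = all_tests & catalog
    quarantined = all_tests - catalog
    timed_out = len(quarantined) > quarantine_budget
    # The published catalog contains only tests that reported results.
    return regular if timed_out else regular | quarantined


def simulate(catalog, all_tests, quarantine_budget, runs):
    """Return the catalog size before and after each of `runs` consecutive runs."""
    history = [len(catalog)]
    for _ in range(runs):
        catalog = next_catalog(catalog, all_tests, quarantine_budget)
        history.append(len(catalog))
    return history


ALL_TESTS = {f"test{i}" for i in range(10)}
OVERWRITTEN = {"test0", "test1"}  # a catalog that lost most of its entries

if __name__ == "__main__":
    print(simulate(OVERWRITTEN, ALL_TESTS, quarantine_budget=3, runs=3))   # [2, 2, 2, 2]
    print(simulate(OVERWRITTEN, ALL_TESTS, quarantine_budget=20, runs=3))  # [2, 10, 10, 10]
```

With the small budget the catalog never recovers. With the larger one, a single complete run restores it, which is the reasoning behind temporarily extending the quarantinedTest timeout.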
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Don't update the test-catalog when there was a build timeout. Also, only update the catalog from trunk (and not other base branches like 4.0).
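The two rules for publishing the catalog can be summarized as a single guard. This is a minimal sketch: `should_update_catalog` and its parameters are hypothetical names, and `timed_out` corresponds to the job-level `timed-out` output.

```python
def should_update_catalog(timed_out, branch):
    """Decide whether a CI run may publish a new test catalog.

    Hypothetical guard: skip on any build timeout, since an incomplete run
    would produce an incomplete catalog, and publish only from trunk, not
    from release branches such as 4.0.
    """
    return (not timed_out) and branch == "trunk"
```

Checking both conditions in one place means a timed-out trunk build and a healthy 4.0 build are rejected for the same reason: neither reflects the complete test set on trunk.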
Also, we have seen some instances of the quarantinedTest task timing out due to a deadlock or other shutdown issue. We should fail the overall build in these cases so that these types of problems don't go unseen.
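A sketch of the resulting build policy, assuming (as the quarantine mechanism implies, but not confirmed here) that regular test failures fail the build, quarantined test failures are tolerated, and a quarantined timeout is now treated as fatal. `build_outcome` is a hypothetical name.

```python
# Exit status GNU `timeout` uses when it kills the wrapped Gradle command.
TIMEOUT_EXIT_CODE = 124


def build_outcome(junit_test_exitcode, quarantined_test_exitcode):
    """Return 'failure' or 'success' for the overall build (hypothetical policy)."""
    if junit_test_exitcode != 0:
        # Regular tests: any failure or timeout fails the build.
        return "failure"
    if quarantined_test_exitcode == TIMEOUT_EXIT_CODE:
        # Quarantined tests: a timeout (e.g. deadlock on shutdown) is no longer ignored.
        return "failure"
    # Quarantined test failures alone are still tolerated.
    return "success"
```

The asymmetry is deliberate: flaky quarantined tests should not block merges, but a quarantined task that never finishes also prevents a trustworthy test catalog from being produced, so it needs to be visible.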