MINOR Fix some test-catalog issues #18272

Merged
mumrah merged 4 commits into apache:trunk from mumrah:minor-fail-on-quarantined-timeout
Dec 20, 2024

@mumrah (Member) commented Dec 19, 2024

Don't update the test-catalog when there was a build timeout. Also, only update the catalog from trunk (and not other base branches like 4.0).
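A minimal sketch of how such a gate could be expressed as a job under `jobs:`, assuming the workflow runs on push (so `github.ref` names the branch) and that the test job, here called `test`, exposes a `timed-out` output. The `update-test-catalog` job and its script path are hypothetical, not the actual Kafka workflow.

```yaml
# Hypothetical job: refresh the test catalog only from trunk builds that
# did not hit the Gradle timeout. Job and script names are assumptions.
update-test-catalog:
  needs: test
  runs-on: ubuntu-latest
  # Job outputs are strings, so compare against the string 'true'.
  if: ${{ github.ref == 'refs/heads/trunk' && needs.test.outputs.timed-out != 'true' }}
  steps:
    - uses: actions/checkout@v4
    - name: Update test catalog
      run: ./.github/scripts/update-test-catalog.sh  # hypothetical script
```

Skipping the job entirely (rather than writing a partial catalog) keeps the previous catalog in place when a run is incomplete or comes from a non-trunk base branch such as 4.0.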

Also, we have seen some instances of the quarantinedTest task timing out due to a deadlock or other shutdown issue. We should fail the overall build in these cases so that these problems don't go unnoticed.
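A minimal sketch of a final step in the test job that fails the build on a quarantined-test timeout. It reuses the `junit-quarantined-test` step id and `gradle-exitcode` output from the workflow diff in this PR, and assumes the test step itself swallows the Gradle exit status (so it never fails on its own). Exit status 124 is what GNU `timeout` returns when it kills the wrapped command.

```yaml
# Hypothetical last step of the test job. always() ensures the check runs
# even if an earlier step failed; exit 1 marks the job, and the build, failed.
- name: Fail on quarantined test timeout
  if: ${{ always() && steps.junit-quarantined-test.outputs.gradle-exitcode == '124' }}
  run: |
    echo "quarantinedTest timed out; see the archived thread dumps"
    exit 1
```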

github-actions bot added the build (Gradle build or GitHub Actions) and small (Small PRs) labels Dec 19, 2024
@mumrah (Member, Author) commented Dec 19, 2024

In this run, https://github.com/apache/kafka/actions/runs/12383841902?pr=18241, a quarantinedTest task timed out after 30 minutes. We can see that the thread dumps get archived, but neither the test tasks nor the overall build fail.

We only noticed this because the build scan publish step failed, since there was no build scan data for one of the test tasks.

@chia7712 WDYT about failing the overall build if the quarantined tests time out?

@mumrah mumrah requested a review from chia7712 December 19, 2024 16:09
@mumrah mumrah changed the title MINOR Fail build on quarntinedTest timeout MINOR Fix some test-catalog issues Dec 19, 2024
@mumrah (Member, Author) commented Dec 20, 2024

I also temporarily increased the quarantined test timeout to 180m. Since the test-catalog uses a 7-day look-back, we will have issues for the next week, with many tests being run as quarantined.
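A minimal sketch of a quarantined-test step that enforces the longer limit with GNU `timeout` and records the exit status in a `gradle-exitcode` step output, the output name used in this PR's workflow diff. The `TIMEOUT_MINUTES` variable and the exact Gradle invocation are assumptions for illustration; the real step likely passes more arguments.

```yaml
# Hypothetical step. GitHub Actions runs bash with -e by default, so set +e
# lets the script record a non-zero exit status instead of aborting.
- name: Run quarantined tests
  id: junit-quarantined-test
  env:
    TIMEOUT_MINUTES: 180  # temporarily raised from 30
  run: |
    set +e
    timeout "${TIMEOUT_MINUTES}m" ./gradlew quarantinedTest
    exitcode=$?
    # 124 means timeout killed Gradle after TIMEOUT_MINUTES.
    echo "gradle-exitcode=$exitcode" >> "$GITHUB_OUTPUT"
```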

@chia7712 (Member) left a comment

@mumrah LGTM. BTW, have you created the JIRA for those hanging tests?

matrix:
java: [ 23, 17 ] # If we change these, make sure to adjust ci-complete.yml
outputs:
timed-out: ${{ (steps.junit-test.outputs.gradle-exitcode == '124' || steps.junit-quarantined-test.outputs.gradle-exitcode == '124') }}
@chia7712 (Member):

It seems line #221 could use this output?

@mumrah (Member, Author):

Hm, maybe not. I think job outputs aren't evaluated until the job is complete.
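A minimal sketch of the distinction, based on GitHub Actions' contexts: steps inside a job read `steps.<id>.outputs` directly, while a job's `outputs` map is consumed by dependent jobs through `needs.<job>.outputs`. The `report` job and the placeholder command standing in for the real Gradle run are hypothetical.

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    outputs:
      # Exported to other jobs once this job finishes.
      timed-out: ${{ steps.junit-test.outputs.gradle-exitcode == '124' }}
    steps:
      - name: Run tests
        id: junit-test
        # Placeholder standing in for the real Gradle invocation.
        run: echo "gradle-exitcode=124" >> "$GITHUB_OUTPUT"
      # Within the same job, read the step output, not the job output.
      - name: Check timeout in the same job
        if: ${{ steps.junit-test.outputs.gradle-exitcode == '124' }}
        run: echo "junit-test timed out"
  report:
    needs: test
    runs-on: ubuntu-latest
    steps:
      # Downstream jobs read the job output through the needs context.
      - run: echo "timed-out=${{ needs.test.outputs.timed-out }}"
```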

@mumrah (Member, Author) commented Dec 20, 2024

> hanging tests

I'm actually not sure there are hanging tests; that remains to be seen. Because the test catalog was essentially being overwritten, many tests were being run as quarantined. This caused the quarantinedTest task to hit the 30m timeout, which in turn caused the job to produce an incomplete test catalog (since not every test ran), and the cycle repeated.

After a run with the extended quarantinedTest timeout, we can see whether there is actually a hanging test.

@mumrah mumrah merged commit af5d6c2 into apache:trunk Dec 20, 2024
ijuma added a commit to ijuma/kafka that referenced this pull request Dec 20, 2024
…e-old-protocol-versions

* apache-github/trunk:
  KAFKA-18312: Added entityType: topicName to SubscribedTopicNames in ShareGroupHeartbeatRequest.json (apache#18285)
  HOTFIX: fix incompatible types: Optional<TimestampAndOffset> cannot be converted to Option<TimestampAndOffset> (apache#18284)
  MINOR Fix some test-catalog issues (apache#18272)
  KAFKA-18180: Move OffsetResultHolder to storage module (apache#18100)
  KAFKA-18301; Make coordinator records first class citizen (apache#18261)
  KAFKA-18262 Remove DefaultPartitioner and UniformStickyPartitioner (apache#18204)
  KAFKA-18296 Remove deprecated KafkaBasedLog constructor (apache#18257)
  KAFKA-12829: Remove old Processor and ProcessorSupplier interfaces (apache#18238)
  KAFKA-18292 Remove deprecated methods of UpdateFeaturesOptions (apache#18245)
  KAFKA-12829: Remove deprecated Topology#addProcessor of old Processor API (apache#18154)
  KAFKA-18035, KAFKA-18306, KAFKA-18092: Address TransactionsTest flaky tests (apache#18264)
  MINOR: change the default linger time in the new coordinator (apache#18274)
  KAFKA-18305: validate controller.listener.names is not in inter.broker.listener.name for kcontrollers (apache#18222)
  KAFKA-18207: Serde for handling transaction records (apache#18136)
  KAFKA-13722: Refactor Kafka Streams store interfaces (apache#18243)
  KAFKA-17131: Refactor TimeDefinitions (apache#18241)
  MINOR: Fix MessageFormatters (apache#18266)
  Mark flaky tests for Dec 18, 2024 (apache#18263)
tedyu pushed a commit to tedyu/kafka that referenced this pull request Jan 6, 2025
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

2 participants