Fix flakiness in compaction ITs #17877

Merged
cryptoe merged 1 commit into apache:master from kfaraz:fix_taskslots_api
Apr 4, 2025

kfaraz commented Apr 4, 2025

Description

#17834 made some ITs flaky by introducing a race condition in the /taskslots Coordinator API.
Since the Overlord now exposes APIs to update compaction configs, the Coordinator may hold
a stale version of the config when it updates the task slots. The Coordinator refreshes this
config from the metadata store only once every minute. So if the config has just been updated
on the Overlord and the /taskslots API is then called, the call may revert the change made
through the Overlord API.

This issue does not exist with the other compaction config APIs.

Changes

  • Use latest config in the /taskslots API
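
The race and the fix can be illustrated with a minimal, self-contained Java sketch. This is not Druid's actual code: `Config`, `MetadataStore`, `Coordinator`, the field `otherSetting`, and both `updateTaskSlots...` methods are hypothetical names chosen only for illustration, and the periodic refresh is modeled as a snapshot taken once at construction.

```java
import java.util.concurrent.atomic.AtomicReference;

public class TaskSlotsRaceSketch {

    /** Hypothetical, simplified compaction config; not Druid's actual class. */
    static final class Config {
        final int maxTaskSlots;
        final String otherSetting;

        Config(int maxTaskSlots, String otherSetting) {
            this.maxTaskSlots = maxTaskSlots;
            this.otherSetting = otherSetting;
        }
    }

    /** Hypothetical stand-in for the metadata store, the source of truth. */
    static final class MetadataStore {
        private final AtomicReference<Config> current;

        MetadataStore(Config initial) {
            this.current = new AtomicReference<>(initial);
        }

        Config read() {
            return current.get();
        }

        void write(Config config) {
            current.set(config);
        }
    }

    /** Hypothetical Coordinator whose snapshot is refreshed only periodically. */
    static final class Coordinator {
        private final MetadataStore store;
        private final Config cachedSnapshot;

        Coordinator(MetadataStore store) {
            this.store = store;
            this.cachedSnapshot = store.read(); // state as of the last refresh
        }

        // Racy variant: rebuilds the config from the possibly stale snapshot.
        void updateTaskSlotsFromCache(int slots) {
            store.write(new Config(slots, cachedSnapshot.otherSetting));
        }

        // Variant in the spirit of this PR: re-read the latest config first.
        void updateTaskSlotsFromLatest(int slots) {
            Config latest = store.read();
            store.write(new Config(slots, latest.otherSetting));
        }
    }

    /** Order of events: Coordinator snapshot, Overlord update, task slots update. */
    static Config runScenario(boolean useLatest) {
        MetadataStore store = new MetadataStore(new Config(10, "initial"));
        Coordinator coordinator = new Coordinator(store);

        // The Overlord API changes another setting after the Coordinator's last refresh.
        store.write(new Config(store.read().maxTaskSlots, "updated"));

        if (useLatest) {
            coordinator.updateTaskSlotsFromLatest(5);
        } else {
            coordinator.updateTaskSlotsFromCache(5);
        }
        return store.read();
    }

    public static void main(String[] args) {
        Config stale = runScenario(false);
        Config fresh = runScenario(true);
        System.out.println("from cache:  " + stale.maxTaskSlots + ", " + stale.otherSetting);
        System.out.println("from latest: " + fresh.maxTaskSlots + ", " + fresh.otherSetting);
    }
}
```

In this sketch the cache-based variant ends with `otherSetting` back at `initial`, which silently reverts the Overlord's change, while the variant that reads the latest config keeps `updated`. In both variants the task slots value itself is simply whatever the last caller wrote.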

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

cryptoe merged commit 11c2ad1 into apache:master Apr 4, 2025
75 checks passed
cryptoe commented Apr 4, 2025

While the patch fixes the flakiness, there is still a race in the task slots update logic, right? That is, if we set the compaction config task slots via the Overlord API, the Coordinator API can still override them, no?

cryptoe added this to the 33.0.0 milestone Apr 4, 2025
cryptoe pushed a commit to cryptoe/druid that referenced this pull request Apr 4, 2025
kfaraz deleted the fix_taskslots_api branch April 4, 2025 15:04
kfaraz commented Apr 4, 2025

Thanks for the review, @cryptoe !

kfaraz commented Apr 4, 2025

> While the patch fixes the flakiness, there is still a race in the task slots update logic, right? That is, if we set the compaction config task slots via the Overlord API, the Coordinator API can still override them, no?

Yes, but that is no different from the race condition between two users hitting the same API.

kgyrtkirk pushed a commit that referenced this pull request Apr 5, 2025