Get task location should be stored on the lifecycle object #14649

Merged
suneet-s merged 7 commits into apache:master from georgew5656:getTaskLocationCalls
Jul 25, 2023

@georgew5656
Contributor

Description

This is a performance optimization for the KubernetesTaskRunner. With larger numbers of tasks (especially streaming tasks), there are a lot of getTaskLocation calls on the task runner (e.g. supervisors make this call periodically). Currently the K8s task runner has to find the pod by querying Kubernetes and read the pod IP on every call, even though the IP doesn't change unless the pod dies (in which case the task would have already failed). This PR stores the task location on the lifecycle object so it can be reused.

I tested this with on the order of 200 supervisor tasks, and without this change the Druid console starts getting really slow. After this change I am able to go up to at least 1000 tasks without an issue on the overlord (I haven't tried more than this).
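A minimal sketch of the caching idea described above, not the code in this PR. `PodLocation`, `CachingPeonLifecycle`, `prepareForRun`, and the `Supplier` standing in for the Kubernetes pod query are all hypothetical names chosen for illustration; the real change lives in `KubernetesPeonLifecycle` and may differ in detail.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a task's network location (not Druid's TaskLocation class). */
final class PodLocation {
    final String host;
    final int port;
    final boolean tlsEnabled;

    PodLocation(String host, int port, boolean tlsEnabled) {
        this.host = host;
        this.port = port;
        this.tlsEnabled = tlsEnabled;
    }
}

/** Hypothetical lifecycle object that remembers the location after the first lookup. */
final class CachingPeonLifecycle {
    // Stands in for the expensive "query Kubernetes for the pod" call (an assumption).
    private final Supplier<PodLocation> podLookup;
    // Null until a lookup returns a location; cleared again before each run.
    private PodLocation cachedLocation;

    CachingPeonLifecycle(Supplier<PodLocation> podLookup) {
        this.podLookup = podLookup;
    }

    /** Clears any stale location so a new run starts with a fresh lookup. */
    synchronized void prepareForRun() {
        cachedLocation = null;
    }

    /** Returns the cached location, calling the lookup only when nothing is cached. */
    synchronized PodLocation getTaskLocation() {
        if (cachedLocation == null) {
            cachedLocation = podLookup.get();
        }
        return cachedLocation;
    }
}

final class CachedLocationDemo {
    public static void main(String[] args) {
        int[] lookups = {0};
        CachingPeonLifecycle lifecycle = new CachingPeonLifecycle(() -> {
            lookups[0]++;
            return new PodLocation("10.0.0.5", 8100, false);
        });
        // Simulate many periodic getTaskLocation calls, e.g. from supervisors.
        for (int i = 0; i < 1000; i++) {
            lifecycle.getTaskLocation();
        }
        System.out.println("lookups = " + lookups[0]); // prints "lookups = 1"
    }
}
```

The sketch trades freshness for fewer API calls, which matches the reasoning that the pod IP only changes if the pod dies. Clearing the cached value in `prepareForRun` mirrors the "Null out before running" commit in spirit only.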

Release note

Performance optimization for k8s task runner.

Key changed/added classes in this PR
  • KubernetesPeonLifecycle

I considered adding a watcher on the pod instead to be safe, but I think that would also cause issues. We can revisit that in the future if we decide we need it.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Contributor

@suneet-s left a comment

Very nice!

@suneet-s suneet-s merged commit f742bb7 into apache:master Jul 25, 2023
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
FrankChen021 pushed a commit that referenced this pull request Feb 3, 2025
* Fix issue with long data source names

* Use the regular library

* Save location and tls enabled

* Null out before running

* add another comment
GabrielCWT pushed a commit to GabrielCWT/druid that referenced this pull request Sep 9, 2025