
Make the HTTP client that lives beneath fabric8 KubernetesClient configurable#18540

Merged
gianm merged 20 commits into apache:master from capistrant:k8s-client-plugable
Oct 15, 2025


@capistrant
Contributor

@capistrant capistrant commented Sep 16, 2025

Description

Allow an operator to select which HTTP client their instance of the fabric8 KubernetesClient uses.

HTTP Client Selector

druid.indexer.runner.k8sAndWorker.http.httpClientType

Valid values: vertx (default), okhttp, jdk
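As an illustration of how the selector property could be read, here is a minimal sketch that parses it from a plain `java.util.Properties` object and falls back to `vertx` when it is unset. The `HttpClientType` enum and its `fromProperties` method are hypothetical names for this example, not the classes the PR adds, and Druid's real config binding works differently.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical enum for illustration only; not the class used in the PR.
enum HttpClientType {
  VERTX,
  OKHTTP,
  JDK;

  static final String PROPERTY = "druid.indexer.runner.k8sAndWorker.http.httpClientType";

  /** Reads the selector property, falling back to vertx when it is unset. */
  static HttpClientType fromProperties(Properties props) {
    String raw = props.getProperty(PROPERTY, "vertx").trim();
    try {
      // Accept values case-insensitively, e.g. "okhttp" or "OKHTTP".
      return valueOf(raw.toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException(
          "Unknown " + PROPERTY + " value [" + raw + "]; expected vertx, okhttp or jdk", e);
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(fromProperties(props)); // VERTX
    props.setProperty(PROPERTY, "okhttp");
    System.out.println(fromProperties(props)); // OKHTTP
  }
}
```

Failing fast on an unknown value (for example `jetty`, which is not supported) surfaces a typo at startup instead of silently using the default.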

Why no Jetty?

  • There is currently no Jetty HTTP client support because of Jetty dependency version clashes. This could be revisited in the future.
vertx

Extra configuration keeps the same key names and defaults as before, but the prefix changes to druid.indexer.runner.k8sAndWorker.http.vertx
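To make the per-client prefix concrete, here is a small sketch that collects every setting under `druid.indexer.runner.k8sAndWorker.http.<type>.` and strips the prefix. The `PrefixedHttpConfig` helper is hypothetical and does not reproduce Druid's actual config binding. The key `exampleKey` is a placeholder, not a real vert.x setting name; `okhttp.maxWorkerThreads` comes from the list further down.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper illustrating per-client prefixes; not Druid's actual config binding.
class PrefixedHttpConfig {
  static final String HTTP_PREFIX = "druid.indexer.runner.k8sAndWorker.http";

  /** Returns entries under "<HTTP_PREFIX>.<clientType>." with that prefix stripped off. */
  static Map<String, String> forClient(Properties props, String clientType) {
    String prefix = HTTP_PREFIX + "." + clientType + ".";
    Map<String, String> scoped = new TreeMap<>();
    for (String key : props.stringPropertyNames()) {
      if (key.startsWith(prefix)) {
        scoped.put(key.substring(prefix.length()), props.getProperty(key));
      }
    }
    return scoped;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // "exampleKey" is a placeholder, not a real vert.x setting name.
    props.setProperty(HTTP_PREFIX + ".vertx.exampleKey", "40");
    props.setProperty(HTTP_PREFIX + ".okhttp.maxWorkerThreads", "50");
    System.out.println(forClient(props, "vertx"));  // {exampleKey=40}
    System.out.println(forClient(props, "okhttp")); // {maxWorkerThreads=50}
  }
}
```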

okhttp

This is the client we used before #17913. Its main drawback is the default unbounded thread pool underpinning the client, which was a big reason for switching to vert.x in the first place (in addition to vert.x being the default for fabric8 KubernetesClient in newer versions).

Extra configuration:

  • druid.indexer.runner.k8sAndWorker.http.okhttp.useCustomDispatcherExecutor
    • Whether to override the default unbounded dispatcher with a custom one.
    • default = true
  • druid.indexer.runner.k8sAndWorker.http.okhttp.maxWorkerThreads
    • The upper bound on the thread pool size if you use the custom dispatcher.
    • default = 50 (fixed-size pool)
  • druid.indexer.runner.k8sAndWorker.http.okhttp.coreWorkerThreads
    • The lower bound on the thread pool size if you use the custom dispatcher.
    • default = 50
  • druid.indexer.runner.k8sAndWorker.http.okhttp.workerThreadKeepAliveTime
    • How long idle threads stay alive in the thread pool if you use the custom dispatcher.
    • default = 60 (seconds)
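The difference between OkHttp's default dispatcher pool and a bounded custom one can be sketched with the JDK's `ThreadPoolExecutor` alone. `unboundedDefault()` mirrors the shape of OkHttp's default dispatcher executor as of OkHttp 3.x/4.x (core 0, max `Integer.MAX_VALUE`, 60 s keep-alive, `SynchronousQueue`). `custom(...)` is a hypothetical mapping of the settings above, not the PR's code. In a real setup the resulting `ExecutorService` would be passed to OkHttp's `Dispatcher(ExecutorService)` constructor. That wiring is left out so the sketch runs without OkHttp on the classpath.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class OkHttpDispatcherPools {
  /** Same shape as OkHttp's default dispatcher executor: no core threads and no upper bound. */
  static ThreadPoolExecutor unboundedDefault() {
    return new ThreadPoolExecutor(
        0, Integer.MAX_VALUE, 60, TimeUnit.SECONDS, new SynchronousQueue<>());
  }

  /** Hypothetical bounded pool built from the okhttp.* settings listed above. */
  static ThreadPoolExecutor custom(int coreThreads, int maxThreads, long keepAliveSeconds) {
    if (coreThreads < 1 || maxThreads < coreThreads) {
      throw new IllegalArgumentException("need 1 <= coreWorkerThreads <= maxWorkerThreads");
    }
    // The queue is unbounded, so the pool never grows beyond coreThreads; with core == max
    // (the defaults) this is a fixed-size pool and excess tasks wait in the queue.
    return new ThreadPoolExecutor(
        coreThreads, maxThreads, keepAliveSeconds, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
  }

  public static void main(String[] args) {
    System.out.println(unboundedDefault().getMaximumPoolSize()); // 2147483647
    ThreadPoolExecutor bounded = custom(50, 50, 60);
    System.out.println(bounded.getCorePoolSize() + "/" + bounded.getMaximumPoolSize()); // 50/50
  }
}
```

With a `SynchronousQueue` every task that finds no idle thread gets a new one. This is why many concurrent tasks can translate into many threads with the default.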
jdk

This uses the HTTP client built into the JDK (java.net.http.HttpClient). It is not recommended if the runtime is Java 11.

Extra configuration:

  • druid.indexer.runner.k8sAndWorker.http.jdk.maxWorkerThreads
    • The upper bound on the thread pool size.
    • default = 50 (fixed-size pool)
  • druid.indexer.runner.k8sAndWorker.http.jdk.coreWorkerThreads
    • The lower bound on the thread pool size.
    • default = 20
  • druid.indexer.runner.k8sAndWorker.http.jdk.workerThreadKeepAliveTime
    • How long idle threads stay alive in the thread pool.
    • default = 50 (seconds)
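At the JDK level, a bounded executor can be handed to `java.net.http.HttpClient` through `HttpClient.Builder.executor(...)` (Java 11+). The sketch below shows that, assuming the core/max/keep-alive settings above map onto a `ThreadPoolExecutor`. How the PR actually passes them through fabric8's JDK client factory is not shown on this page. The `queueCapacity` parameter and the class name are hypothetical additions.

```java
import java.net.http.HttpClient;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class JdkHttpClientPool {
  /**
   * Hypothetical executor built from the jdk.* settings. ThreadPoolExecutor only adds
   * threads beyond coreThreads when its queue is full, so a bounded queue is needed for
   * maxThreads to matter; with an unbounded queue the pool would stay at coreThreads.
   */
  static ThreadPoolExecutor executor(
      int coreThreads, int maxThreads, long keepAliveSeconds, int queueCapacity) {
    return new ThreadPoolExecutor(
        coreThreads, maxThreads, keepAliveSeconds, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(queueCapacity),
        runnable -> {
          Thread t = new Thread(runnable, "k8s-jdk-httpclient");
          t.setDaemon(true); // do not keep the JVM alive just for idle pool threads
          return t;
        });
  }

  /** Plugs the executor into the JDK's java.net.http.HttpClient. */
  static HttpClient client(ThreadPoolExecutor executor) {
    return HttpClient.newBuilder().executor(executor).build();
  }

  public static void main(String[] args) {
    ThreadPoolExecutor pool = executor(20, 50, 50, 100);
    HttpClient client = client(pool);
    System.out.println(client.executor().isPresent()); // true
    pool.shutdown();
  }
}
```

Once both the queue and `maxThreads` are exhausted, the executor's default `AbortPolicy` rejects further tasks. Picking a rejection behavior is a design choice this sketch leaves open.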

Release note

Adds new experimental configuration that lets users of the kubernetes-overlord-extensions choose which HTTP client library the fabric8 KubernetesClient uses under the hood to communicate with the Kubernetes API server of the cluster where tasks run. The default remains the same client and configuration that Druid 34 uses. The additional options now available are okhttp and the native JDK HttpClient. The default client and configuration should suffice for most use cases.


Key changed/added classes in this PR
  • KubernetesOverlordModule
  • DruidKubernetesHttpClientFactory and each implementation of it.
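One way a module could pick a factory implementation from the configured type is sketched below as a registry keyed by client type. `HttpClientFactorySketch`, `ClientType` and `FactorySelector` are hypothetical stand-ins. The real `DruidKubernetesHttpClientFactory` API is not shown on this page. In fabric8 6.x the chosen factory would presumably end up in `KubernetesClientBuilder.withHttpClientFactory(...)`. That step is omitted so the sketch runs without fabric8.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a per-client factory; the PR's real interface may differ.
interface HttpClientFactorySketch {
  String typeName();
}

enum ClientType { VERTX, OKHTTP, JDK }

class FactorySelector {
  private final Map<ClientType, Supplier<HttpClientFactorySketch>> factories =
      new EnumMap<>(ClientType.class);

  /** Registers a factory supplier for one client type; returns this for chaining. */
  FactorySelector register(ClientType type, Supplier<HttpClientFactorySketch> supplier) {
    factories.put(type, supplier);
    return this;
  }

  /** Creates the factory for the configured type, failing fast if none is registered. */
  HttpClientFactorySketch select(ClientType type) {
    Supplier<HttpClientFactorySketch> supplier = factories.get(type);
    if (supplier == null) {
      throw new IllegalStateException("No HTTP client factory registered for " + type);
    }
    return supplier.get();
  }

  public static void main(String[] args) {
    FactorySelector selector = new FactorySelector()
        .register(ClientType.VERTX, () -> () -> "vertx")
        .register(ClientType.OKHTTP, () -> () -> "okhttp");
    System.out.println(selector.select(ClientType.OKHTTP).typeName()); // okhttp
  }
}
```

Using suppliers means only the selected client's factory is ever instantiated. That matters if, as the review below suggests, some client modules might eventually stop being shipped.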

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

```java
private static final String RUNNERSTRATEGY_PROPERTIES_FORMAT_STRING = K8SANDWORKER_PROPERTIES_PREFIX
    + ".runnerStrategy.%s";
private static final String HTTPCLIENT_PROPERITES_PREFIX = K8SANDWORKER_PROPERTIES_PREFIX + ".http";
private static final String HTTPCLIENT_TYPE_PROPERTY = K8SANDWORKER_PROPERTIES_PREFIX + ".http.httpClientType";
```
Contributor


You haven't added documentation for this parameter, which I think is good because I think it's a parameter that we may remove "soon". I suggest noting that in a comment here. Its purpose is to help cluster operators evaluate different http clients with fabric8, and we're adding it because we've noticed issues with two of the major ones:

  • the old default (okhttp) can lead to excessive numbers of threads if you have a lot of tasks running
  • the new default (vert.x) has been observed to generate spurious API call failures, which leads to seemingly-random task failures

Ideally, through some evaluation we will determine which one is best to use and how best to configure it, and at that point we could remove the property. It will help us slim down the distribution since it won't need to ship all 3 fabric8 client modules.

Contributor Author


I may or may not have been working on a commit to add docs after I realized I forgot 😆 but we can leave it undocumented, I guess. A good compromise might be putting it in the extension docs under an experimental label, with a disclaimer that this is temporary while the community determines the best path forward for a generic config that works well for everyone. Then all users of the extension could help find the best path forward if they so choose.

Contributor


That'd also be OK, I just don't want to create a reason to worry about removing it in the future. If the doc page says that it may be removed in a future release then that works.

@FrankChen021
Member

Why is the jdk client not recommended for Java 11?

Another point is that the type name jdk is ambiguous: people may think it's the HTTP client shipped in the sun.net.www.http package.

@capistrant
Contributor Author

> why is the jdk not recommend for java 11?
>
> Another point is that the type name jdk is ambiguous, people may think it's the http client shipped in the sun.net.www.http package.

https://github.com/fabric8io/kubernetes-client/tree/main/httpclient-jdk: I was just going by their README, which mentions Java 16+ having fixes that some use cases need. I did not evaluate this deeply since Druid is pending dropping Java 11 support due to the Jetty upgrade anyway.

@FrankChen021
Member

> > why is the jdk not recommend for java 11?
> > Another point is that the type name jdk is ambiguous, people may think it's the http client shipped in the sun.net.www.http package.
>
> https://github.com/fabric8io/kubernetes-client/tree/main/httpclient-jdk just reading from their README they mentioned java16+ having fixes that some use cases need. Did not evaluate deeply since Druid is pending drop of java11 support due to jetty upgrade anyways.

Thanks for the info.
From the link, I notice that the problem is related to WebSockets, not normal HTTP requests, so I don't think it applies to our case here:

> It will not work with websocket requests containing queries with encoded characters until the fix openjdk/jdk@c07ce7e which is available in Java 16 - https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8245245


| Property | Possible Values | Description | Default | Required |
|----------|-----------------|-------------|---------|----------|
|`druid.indexer.runner.k8sAndWorker.http.httpClientType`|`String` (e.g., `okhttp`, `vertx`, `javaStandardHttp`)|Specifies the HTTP client library that the fabric8 KubernetesClient uses to communicate with the Kubernetes API server.|`vertx`|No|
Contributor


Should the default be okhttp?
cc @gianm

Contributor


It should be whatever we believe works best for the average user.

Contributor Author


I do not think we are ready to say vert.x has enough problems to remove it as the default. If we still want this option to pick a client in Druid 35 so the community can work to identify the best long-term approach, I'm on board with that. But as of now, I think leaving the default as-is is best for Druid 35.

@cecemei cecemei added this to the 35.0.0 milestone Oct 7, 2025
@capistrant capistrant requested a review from cryptoe October 7, 2025 23:21

:::

The extension uses [fabric8 KubernetesClient](https://github.com/fabric8io/kubernetes-client) to communicate with the Kubernetes API server. This client creates an
Contributor


Since this is an area where there are problems and our understanding of them is evolving, I think it would be better to move most of this info to a GitHub issue and link it here. Name the GitHub issue something like "Determine best Kubernetes API client" and put the known issues and their mitigations there. That way, as our understanding evolves, old versions' documentation doesn't grow too stale. The only info we need here is a brief preamble, pointer to the GitHub issue, and the table with the descriptions of the configuration options.

Contributor Author


Makes sense. I have created #18629 for tracking and discussing the issue, and removed most of the docs content outside of the configs.

@gianm
Contributor

gianm commented Oct 15, 2025

TY @capistrant!

@gianm gianm merged commit ccb386e into apache:master Oct 15, 2025
130 of 132 checks passed
cecemei pushed a commit to cecemei/druid that referenced this pull request Oct 21, 2025
…igurable (apache#18540)

* Make the httpclient backing fabric8 KubernetesClient pluggable

* fix checkstyle

* fix licenses

* Cleanup prefixes for pluggable http client config

* Default okhttp and native jdk http client threadpools to static 20 threads

* experimental docs for http client config

* Cleanup docs and make native jdk client name more specific

* Fix unit tests

* fix dependency analyzer

* Make okhttp use the custom executor by default and bump its thread count

Also enforce proper setting of max threads for okhttp

* make native jdk http client configs more robust

* fix checkstyle

* Flip to okhttp as underlying http client for fabric8

* Revert "Flip to okhttp as underlying http client for fabric8"

This reverts commit 8b40ab7.

* Turn off custom dispatcher for okhttp

* slim down docs for this experimental stuff and point to github issue
cecemei pushed a commit that referenced this pull request Oct 21, 2025
…igurable (#18540)

RonShub added a commit to singular-labs/druid that referenced this pull request Feb 4, 2026
This commit adds non-blocking I/O support for Kubernetes API calls by
backporting the Vertx HTTP client from Druid 35. This addresses thread
pool exhaustion issues when running many concurrent tasks.

Changes:
- Add kubernetes-httpclient-vertx dependency (v6.7.2)
- Add HttpClientType enum (VERTX default, OKHTTP fallback)
- Add DruidKubernetesHttpClientFactory interface
- Add DruidKubernetesVertxHttpClientConfig for thread pool configuration
- Add DruidKubernetesVertxHttpClientFactory for Vertx instance management
- Modify DruidKubernetesClient to accept custom HTTP client factory
- Modify KubernetesTaskRunnerConfig with httpClientType and vertxHttpClientConfig
- Modify KubernetesOverlordModule to select HTTP client based on config
- Fix buildJob() method to accept taskType parameter (pre-existing bug)
- Add comprehensive logging for debugging and verification

Configuration:
- Vertx is enabled by default (no config needed)
- Fallback: druid.indexer.runner.httpClientType=OKHTTP
- Optional tuning: druid.indexer.runner.vertxHttpClientConfig.*

See VERTX_HTTP_CLIENT_BACKPORT.md for full documentation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
RonShub added a commit to singular-labs/druid that referenced this pull request Feb 4, 2026
Verified implementation against actual upstream code (commit cabada6):
- Use VertxOptions.DEFAULT_* constants instead of hardcoded values
- Align createVertxInstance() method with upstream exactly
- Add TYPE_NAME constant to factory
- Remove unnecessary toString() from config
- Remove conditional eventLoopPoolSize check (always set like upstream)

Keep our additions (not in upstream, but useful):
- Logging for debugging and verification
- close() method for clean Vertx shutdown

Updated documentation with upstream comparison section.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>