Make the HTTP client that lives beneath fabric8 KubernetesClient configurable#18540
Make the HTTP client that lives beneath fabric8 KubernetesClient configurable#18540gianm merged 20 commits intoapache:masterfrom
Conversation
| private static final String RUNNERSTRATEGY_PROPERTIES_FORMAT_STRING = K8SANDWORKER_PROPERTIES_PREFIX | ||
| + ".runnerStrategy.%s"; | ||
| private static final String HTTPCLIENT_PROPERITES_PREFIX = K8SANDWORKER_PROPERTIES_PREFIX + ".http"; | ||
| private static final String HTTPCLIENT_TYPE_PROPERTY = K8SANDWORKER_PROPERTIES_PREFIX + ".http.httpClientType"; |
There was a problem hiding this comment.
You haven't added documentation for this parameter, which I think is good because I think it's a parameter that we may remove "soon". I suggest noting that in a comment here. Its purpose is to help cluster operators evaluate different http clients with fabric8, and we're adding it because we've noticed issues with two of the major ones:
- the old default (okhttp) can lead to excessive numbers of threads if you have a lot of tasks running
- the new default (vert.x) has been observed to generate spurious API call failures, which leads to seemingly-random task failures
Ideally, through some evaluation we will determine which one is best to use and how best to configure it, and at that point we could remove the property. It will help us slim down the distribution since it won't need to ship all 3 fabric8 client modules.
There was a problem hiding this comment.
I may or may not have been working on a commit to add docs after I realized I forgot 😆 but we can leave un-documented I guess. But maybe putting it in the extensions doc under an experimental label plus disclaimer that this is temporary while the community determines best path forward for a generic config that works well for all is good compromise. Then all users of the extension could help find the best path forward if they so choose.
There was a problem hiding this comment.
That'd also be OK, I just don't want to create a reason to worry about removing it in the future. If the doc page says that it may be removed in a future release then that works.
|
why is the jdk not recommend for java 11? Another point is that the type name |
https://github.com/fabric8io/kubernetes-client/tree/main/httpclient-jdk just reading from their README they mentioned java16+ having fixes that some use cases need. Did not evaluate deeply since Druid is pending drop of java11 support due to jetty upgrade anyways. |
Thanks for the info.
|
Also enforce proper setting of max threads for okhttp
|
|
||
| |Property| Possible Values |Description| Default |required| | ||
| |--------|-----------------|-----------|---------|--------| | ||
| |`druid.indexer.runner.k8sAndWorker.http.httpClientType`|`String` (e.g., `okhttp`, `vertx`, `javaStandardHttp`)|Specifies the HTTP client library to be used by the worker task runner for communication with worker nodes.|`vertx`|No| |
There was a problem hiding this comment.
It should be whatever we believe works best for the average user.
There was a problem hiding this comment.
I do not think we are ready to say vert.x has enough problems to remove it from being default. If we still want this option to pick a client in druid35 so the community can work to identify the best long term approach, I'm on board with that. But as of now, I think leaving the default as is is best for druid 35.
This reverts commit 8b40ab7.
|
|
||
| ::: | ||
|
|
||
| The extension uses [fabric8 KubernetesClient](https://github.com/fabric8io/kubernetes-client) to communicate with the Kubernetes API server. This client creates an |
There was a problem hiding this comment.
Since this is area where there are problems and our understanding of them is evolving, I think it would be better to move most of this info to a GitHub issue and link it here. Name the GitHub issue something like "Determine best Kubernetes API client" and put the known issues and their mitigations there. That way, as our understanding evolves, old versions' documentation doesn't grow too stale. The only info we need here is a brief preamble, pointer to the GitHub issue, and the table with the descriptions of the configuration options.
There was a problem hiding this comment.
makes sense, have created #18629 for the tracking and discussion of the issue and removed most of the docs content outside of the configs
|
TY @capistrant! |
…igurable (apache#18540) * Make the httpclient backing fabric8 KubernetesClient pluggable * fix checkstyle * fix licenses * Cleanup prefixes for pluggable http client config * Default okhttp and native jdk http client threadpools to static 20 threads * experimental docs for http client config * Cleanup docs and make native jdk client name more specific * Fix unit tests * fix dependency analyzer * Make okhttp use the custom executor by default and bump its thread count Also enforce proper setting of max threads for okhttp * make native jdk http client configs more robust * fix checkstyle * Flip to okhttp as underlying http client for fabric8 * Revert "Flip to okhttp as underlying http client for fabric8" This reverts commit 8b40ab7. * Turn off custom dispatcher for okhttp * slim down docs for this experimental stuff and point to github issue
…igurable (#18540) * Make the httpclient backing fabric8 KubernetesClient pluggable * fix checkstyle * fix licenses * Cleanup prefixes for pluggable http client config * Default okhttp and native jdk http client threadpools to static 20 threads * experimental docs for http client config * Cleanup docs and make native jdk client name more specific * Fix unit tests * fix dependency analyzer * Make okhttp use the custom executor by default and bump its thread count Also enforce proper setting of max threads for okhttp * make native jdk http client configs more robust * fix checkstyle * Flip to okhttp as underlying http client for fabric8 * Revert "Flip to okhttp as underlying http client for fabric8" This reverts commit 8b40ab7. * Turn off custom dispatcher for okhttp * slim down docs for this experimental stuff and point to github issue
This commit adds non-blocking I/O support for Kubernetes API calls by backporting the Vertx HTTP client from Druid 35. This addresses thread pool exhaustion issues when running many concurrent tasks. Changes: - Add kubernetes-httpclient-vertx dependency (v6.7.2) - Add HttpClientType enum (VERTX default, OKHTTP fallback) - Add DruidKubernetesHttpClientFactory interface - Add DruidKubernetesVertxHttpClientConfig for thread pool configuration - Add DruidKubernetesVertxHttpClientFactory for Vertx instance management - Modify DruidKubernetesClient to accept custom HTTP client factory - Modify KubernetesTaskRunnerConfig with httpClientType and vertxHttpClientConfig - Modify KubernetesOverlordModule to select HTTP client based on config - Fix buildJob() method to accept taskType parameter (pre-existing bug) - Add comprehensive logging for debugging and verification Configuration: - Vertx is enabled by default (no config needed) - Fallback: druid.indexer.runner.httpClientType=OKHTTP - Optional tuning: druid.indexer.runner.vertxHttpClientConfig.* See VERTX_HTTP_CLIENT_BACKPORT.md for full documentation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Verified implementation against actual upstream code (commit cabada6): - Use VertxOptions.DEFAULT_* constants instead of hardcoded values - Align createVertxInstance() method with upstream exactly - Add TYPE_NAME constant to factory - Remove unnecessary toString() from config - Remove conditional eventLoopPoolSize check (always set like upstream) Keep our additions (not in upstream, but useful): - Logging for debugging and verification - close() method for clean Vertx shutdown Updated documentation with upstream comparison section. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Description
Allow an operator to select which HTTP client they want to use for their instance of fabric8
KubernetesClientHttp Client Selector
druid.indexer.runner.k8sAndWorker.http.httpClientTypeValid values:
vertx(default),okhttp,jdkvertx
Extra configuration stays the same as it was in terms of key names and defaults, but the prefix changes to
druid.indexer.runner.k8sAndWorker.http.vertxokhttp
This is the old client that we used before #17913. Drawbacks are the default unbounded thread pool underpinning the client. This was a big reason behind making the change to vert.x in the first place (in addition to vert.x being the new default for fabric8 KubernetesClient in newer versions).
Extra configuration:
druid.indexer.runner.k8sAndWorker.http.okhttp.useCustomDispatcherExecutortruedruid.indexer.runner.k8sAndWorker.http.okhttp.maxWorkerThreads50(fixed size pool)druid.indexer.runner.k8sAndWorker.http.okhttp.coreWorkerThreads50druid.indexer.runner.k8sAndWorker.http.okhttp.workerThreadKeepAliveTime60(seconds)jdk
This uses the native http client in Java. Not recommended if runtime is java11.
Extra configuration:
druid.indexer.runner.k8sAndWorker.http.jdk.maxWorkerThreads50(fixed size pool)druid.indexer.runner.k8sAndWorker.http.jdk.coreWorkerThreads20druid.indexer.runner.k8sAndWorker.http.jdk.workerThreadKeepAliveTime50(seconds)Release note
Adds new experimental configuration to allow users of the
kubernetes-overlord-extensionsto configure what HTTP client library Fabric8 KubernetesClient uses under the hood to communicate with the k8s server tasks run on. The default remains the same client and config that druid 34 uses. Additional options that are now available areokhttpand a native JDKHttpClient. The default client and config should suffice for most use cases.Key changed/added classes in this PR
KubernetesOverlordModuleDruidKubernetesHttpClientFactoryand each implementation of it.This PR has: