
Create a new KubernetesPeonClient that uses fabric8 informers to reduce load on an underlying k8s API #18599

Merged
capistrant merged 66 commits into apache:master from capistrant:k8s-overlord-api-redux on Dec 8, 2025
Conversation

@capistrant (Contributor) commented Oct 6, 2025

Description

KubernetesPeonClient Extension

CachingKubernetesPeonClient (experimental)

Replaces direct read-only k8s API calls with fabric8 SharedInformers, plus a new event-notifier system that centralizes all state watching for pods and jobs, greatly reducing the API call rate to k8s from the indexing service peon lifecycle.

The goal is to make the k8s task runner much more efficient in its use of the k8s control plane API. The existing client performs many per-pod/job actions that hit the Kubernetes API, which can put undue stress on the k8s cluster in Druid deployments with high task counts or churn.
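
To make the approach concrete, here is a minimal sketch (not this PR's actual code) of how a fabric8 SharedIndexInformer replaces per-pod GET/LIST/WATCH traffic with one shared watch and a local cache; the namespace and resync period shown are assumptions:

```java
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.informers.ResourceEventHandler;
import io.fabric8.kubernetes.client.informers.SharedIndexInformer;

public class PodInformerSketch
{
  public static void main(String[] args) throws InterruptedException
  {
    try (KubernetesClient client = new KubernetesClientBuilder().build()) {
      // One shared LIST+WATCH and an in-memory cache for all pods in the
      // namespace, instead of per-pod GET/LIST/WATCH calls.
      SharedIndexInformer<Pod> podInformer = client.pods()
          .inNamespace("druid") // assumed namespace
          .inform(new ResourceEventHandler<Pod>()
          {
            @Override
            public void onAdd(Pod pod)
            {
              // notify whoever is waiting for this pod to appear
            }

            @Override
            public void onUpdate(Pod oldPod, Pod newPod)
            {
              // e.g. detect phase transitions such as Pending -> Running
            }

            @Override
            public void onDelete(Pod pod, boolean deletedFinalStateUnknown)
            {
              // e.g. complete futures waiting on pod deletion
            }
          }, 30_000L); // assumed resync period in millis

      // Reads are served from the local cache, not the k8s API server.
      // (The cache fills asynchronously after the initial LIST.)
      Thread.sleep(2_000);
      System.out.println("Pods in cache: " + podInformer.getIndexer().list().size());
    }
  }
}
```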

Diagram

This diagram is now out of date; I will try to update it.

[Architecture diagram image]

Metrics

Using a test cluster with a synthetic workload over a fixed time window, I measured the following approximate reductions in API traffic from Druid to the k8s control plane:

  • GET requests: down 35%
  • LIST requests: down 88%
  • Watches: down ~100% (99.82% was the actual measurement)

Release note

Adds an experimental implementation of KubernetesPeonClient that uses fabric8 SharedInformers to cache k8s metadata, greatly reducing API traffic between the Overlord and the k8s control plane. This feature is experimental and opt-in via configuration.


Key changed/added classes in this PR
  • AbstractKubernetesPeonClient
  • DirectKubernetesPeonClient
  • CachingKubernetesPeonClient
  • DruidKubernetesClient
  • KubernetesResourceEventNotifier
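
Since KubernetesResourceEventNotifier is new in this PR, here is a hypothetical sketch of the fan-out pattern its name suggests (the class shape, method names, and listener types below are assumptions, not the PR's code): a single shared informer feeds events to per-job listeners, so individual peon lifecycles wait on callbacks instead of opening their own WATCH connections.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

import io.fabric8.kubernetes.api.model.batch.v1.Job;

/**
 * Hypothetical sketch: a single shared informer calls notifyJobEvent() for
 * every job event, and the notifier fans events out to per-job listeners.
 */
public class EventNotifierSketch
{
  private final Map<String, List<Consumer<Job>>> jobListeners = new ConcurrentHashMap<>();

  public void registerJobListener(String jobName, Consumer<Job> listener)
  {
    jobListeners.computeIfAbsent(jobName, k -> new CopyOnWriteArrayList<>()).add(listener);
  }

  public void unregisterJobListeners(String jobName)
  {
    jobListeners.remove(jobName);
  }

  /** Invoked by the informer's event handler for every job ADD/UPDATE/DELETE. */
  public void notifyJobEvent(Job job)
  {
    List<Consumer<Job>> listeners = jobListeners.get(job.getMetadata().getName());
    if (listeners != null) {
      listeners.forEach(listener -> listener.accept(job));
    }
  }
}
```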

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@capistrant capistrant marked this pull request as draft October 6, 2025 00:07
@capistrant capistrant changed the title [WIP] Create a new KubernetesPeonClient that uses fabric8 informers to reduce load on an underlying k8s API Create a new KubernetesPeonClient that uses fabric8 informers to reduce load on an underlying k8s API Oct 8, 2025
@capistrant capistrant marked this pull request as ready for review October 8, 2025 19:13
@capistrant (Contributor, Author) commented:

@kfaraz thank you for the detailed review. I pushed up the changes I have made so far. I am still trying to solve the issue of leaving dangling futures in exceptional cases, and will hopefully have a proposal for fixing that in the next 24 hours. Thanks again; I appreciate all your insight on how to improve this.

private final KubernetesClient client;

public TestKubernetesClient(KubernetesClient client)
public TestKubernetesClient(KubernetesClient client, String namespace)

Check notice — Code scanning / CodeQL: Useless parameter (Note, test). The parameter 'namespace' is never used.
@kfaraz (Contributor) left a comment:

Thanks for the update, @capistrant !
The changes look good.

The blocking comments are only these:

  • #18599 (comment) (is DruidKubernetesCachingClient.stop() called upon loss of leadership? Should we also have a start() which is called on becoming leader?)
  • #18599 (comment) (when do we call podInformer.run() in DruidKubernetesCachingClient?)

The other comments are all minor suggestions.

*
* @param <T> The Kubernetes resource type (e.g., Pod, Job)
*/
public static class InformerEventHandler<T> implements ResourceEventHandler<T>
Review comment (Contributor):

Nit: this class may be in a file of its own.
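
If it were moved to its own file, the handler might look like the following sketch; the ADD and UPDATE event types and the BiConsumer constructor are assumptions extrapolated from the DELETE case visible in the nearby diff context.

```java
import java.util.function.BiConsumer;

import io.fabric8.kubernetes.client.informers.ResourceEventHandler;

/**
 * Bridges fabric8 informer callbacks to a single event consumer.
 *
 * @param <T> The Kubernetes resource type (e.g., Pod, Job)
 */
public class InformerEventHandler<T> implements ResourceEventHandler<T>
{
  // ADD and UPDATE are assumed values; only DELETE appears in the diff context.
  public enum InformerEventType { ADD, UPDATE, DELETE }

  private final BiConsumer<T, InformerEventType> eventConsumer;

  public InformerEventHandler(BiConsumer<T, InformerEventType> eventConsumer)
  {
    this.eventConsumer = eventConsumer;
  }

  @Override
  public void onAdd(T resource)
  {
    eventConsumer.accept(resource, InformerEventType.ADD);
  }

  @Override
  public void onUpdate(T oldResource, T newResource)
  {
    eventConsumer.accept(newResource, InformerEventType.UPDATE);
  }

  @Override
  public void onDelete(T resource, boolean deletedFinalStateUnknown)
  {
    eventConsumer.accept(resource, InformerEventType.DELETE);
  }
}
```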

eventConsumer.accept(resource, InformerEventType.DELETE);
}
}
private static final EmittingLogger log = new EmittingLogger(DruidKubernetesCachingClient.class);
Review comment (Contributor):

Nit: please add a newline before this.

Comment on lines +90 to +92
protected final SharedIndexInformer<Pod> podInformer;
protected final SharedIndexInformer<Job> jobInformer;
protected final KubernetesResourceEventNotifier eventNotifier;
Review comment (Contributor):

Do these need to be protected or can they be private too?

this.jobInformer = setupJobInformer(namespace);
}

public void stop()
Review comment (Contributor):

Please add a short javadoc. Should this be called upon loss of leadership too or only service termination?

Should there also be an equivalent start() that is invoked on becoming leader?


public <T> T readPodCache(SharedInformerCacheReadRequestExecutor<T, Pod> executor)
{
if (podInformer == null) {
Review comment (Contributor):

podInformer and jobInformer will never be null.
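
For illustration, a simplified cache read without the null check might look like the following; the shape of SharedInformerCacheReadRequestExecutor here is an assumption based on its name and usage, not the PR's actual definition.

```java
import java.util.List;

import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.informers.SharedIndexInformer;
import io.fabric8.kubernetes.client.informers.cache.Indexer;

public class CacheReadSketch
{
  // Assumed shape of the executor: a function over the informer's local index.
  @FunctionalInterface
  public interface SharedInformerCacheReadRequestExecutor<T, R>
  {
    T execute(Indexer<R> indexer);
  }

  private final SharedIndexInformer<Pod> podInformer;

  public CacheReadSketch(SharedIndexInformer<Pod> podInformer)
  {
    this.podInformer = podInformer;
  }

  public <T> T readPodCache(SharedInformerCacheReadRequestExecutor<T, Pod> executor)
  {
    // No null check needed: the informer is created in the constructor
    // and lives for the lifetime of this client.
    return executor.execute(podInformer.getIndexer());
  }

  public List<Pod> listCachedPods()
  {
    // Served entirely from the local cache; no k8s API call is made.
    return readPodCache(Indexer::list);
  }
}
```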

}
catch (Exception e) {
-  log.error(e, "Error watching logs from task: %s", taskId);
+  log.error(e, "Error watching logs from task: %s, pod: %s", taskId, podName);
Review comment (Contributor):

Suggested change:
-  log.error(e, "Error watching logs from task: %s, pod: %s", taskId, podName);
+  log.error(e, "Error watching logs from task[%s], pod[%s].", taskId, podName);

Comment on lines +213 to +217
* Get an InputStream for the logs of the peon pod associated with the given taskId.
* <p>
* Any issues creating the InputStream will be logged and an absent Optional will be returned.
* </p>
*
Review comment (Contributor):

Nit: I think this part can be omitted since the same info is already captured in @return tag too.

* Any issues creating the InputStream will be logged and an absent Optional will be returned.
* </p>
*
* @return an Optional containing the {@link InputStream} if the pod exists and logs could be streamed, or absent otherwise
Review comment (Contributor):

Suggested change:
-  * @return an Optional containing the {@link InputStream} if the pod exists and logs could be streamed, or absent otherwise
+  * @return an Optional containing the {@link InputStream} for the logs of the pod, if it exists and logs could be streamed, or absent otherwise.

return this;
}

public Builder withEnablePeonClientCache(boolean enableKubernetesClientCaching)
Review comment (Contributor):

Please align the method name and arg name with the config i.e. useK8sSharedInformers.

Comment on lines +75 to +77
boolean isUseK8sSharedInformers();

Period getK8sSharedInformerResyncPeriod();
Review comment (Contributor):

Maybe add 1-line javadocs for these.
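
A possible version with the suggested one-line javadocs (the enclosing interface name and the joda-time Period import are assumptions):

```java
import org.joda.time.Period;

public interface KubernetesTaskRunnerConfigSketch
{
  /**
   * Whether the experimental informer-backed (caching) peon client is enabled.
   */
  boolean isUseK8sSharedInformers();

  /**
   * How often the shared informers fully resync their local caches with the k8s API.
   */
  Period getK8sSharedInformerResyncPeriod();
}
```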

@capistrant (Contributor, Author) commented:

> Thanks for the update, @capistrant! The changes look good. The blocking comments are only these: whether DruidKubernetesCachingClient.stop() is called upon loss of leadership (and whether there should be a start() called on becoming leader), and when podInformer.run() is called. The other comments are all minor suggestions.

Thanks @kfaraz for another review round. I think these two questions are actually intertwined.

As everything works now, the informers start automatically when DruidKubernetesCachingClient is created. This happens as part of Overlord startup, regardless of leadership. The start() calls on the informers in the tests were benign, and I removed them.

That leads to your other question on when to call stop(). Right now we only call stop() as part of the JVM lifecycle, not on loss of leadership, because as currently constructed the informers are started once and live for the lifetime of the JVM, so they are active regardless of whether an Overlord is the leader.

I think the key question is whether we want the informer/caching client tied to the JVM lifecycle, as it is now, or to leadership state. The latter would reduce the resources required by the standby in exchange for lengthening failover time. (I'm not sure exactly how much failover would be extended; it would scale with the pod count in k8s, but I don't have a good estimate for a cluster with 500 pods, 5,000 pods, 50,000 pods, etc.)

@kfaraz (Contributor) commented Dec 8, 2025:

Thanks for the clarification, @capistrant !

I think it is better to have the cache pre-warmed on the follower Overlord, so that it is able to quickly take over in case of a leader failure. So let's keep the lifecycle of the cache tied to the Overlord service itself.
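
A minimal sketch of the agreed lifecycle, assuming Druid's @LifecycleStop annotation is the wiring mechanism (the PR's actual mechanism may differ): the informers start at construction so a follower's cache stays pre-warmed, and stop only at service shutdown.

```java
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.batch.v1.Job;
import io.fabric8.kubernetes.client.informers.SharedIndexInformer;
import org.apache.druid.java.util.common.lifecycle.LifecycleStop;

public class CachingClientLifecycleSketch
{
  private final SharedIndexInformer<Pod> podInformer;
  private final SharedIndexInformer<Job> jobInformer;

  public CachingClientLifecycleSketch(
      SharedIndexInformer<Pod> podInformer,
      SharedIndexInformer<Job> jobInformer
  )
  {
    // Started at construction, i.e. Overlord startup, regardless of
    // leadership, so a follower's cache is pre-warmed for fast failover.
    this.podInformer = podInformer;
    this.jobInformer = jobInformer;
  }

  @LifecycleStop
  public void stop()
  {
    // Tied to the service lifecycle, not leadership: called only when the
    // Overlord JVM shuts down.
    podInformer.stop();
    jobInformer.stop();
  }
}
```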

@kfaraz (Contributor) left a review:

Major changes look good. There are minor non-blocking suggestions, which may be addressed here or in a follow-up.

Thanks for adding the feature, @capistrant !

@capistrant capistrant merged commit 69505a3 into apache:master Dec 8, 2025
99 of 100 checks passed
@kgyrtkirk kgyrtkirk added this to the 36.0.0 milestone Jan 19, 2026