Motivation
Druid has provisions for writing an extension that handles "node discovery" and "leader election". Such an extension, coupled with "HTTP"-based "segment management" and "remote task runner", could obviate the need to operate a ZooKeeper cluster for Druid.
However, no such extension exists yet, so the above is only theory.
The proposal is to write an extension based on the APIs exposed by the Kubernetes API server (which is itself backed by etcd). Those APIs have sufficient primitives for "node discovery" and "leader election".
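One primitive the Kubernetes API server offers for this is optimistic concurrency: an update is rejected if the object's resourceVersion has changed since it was read. Below is a minimal sketch of how leader election could be built on that idea. It assumes an in-memory store in place of the API server, and every class name, method name, and lease duration in it is hypothetical rather than taken from Druid or from a Kubernetes client library.

```java
import java.util.concurrent.atomic.AtomicReference;

public class LeaseElectionSketch {

    /** Immutable lock record; 'version' plays the role of a Kubernetes resourceVersion. */
    static final class LockRecord {
        final String holder;          // null means nobody holds the lock
        final long renewedAtMillis;
        final long version;

        LockRecord(String holder, long renewedAtMillis, long version) {
            this.holder = holder;
            this.renewedAtMillis = renewedAtMillis;
            this.version = version;
        }
    }

    /** In-memory stand-in (assumption) for an object stored on the API server. */
    static final class InMemoryLockStore {
        private final AtomicReference<LockRecord> current =
            new AtomicReference<>(new LockRecord(null, 0L, 0L));

        LockRecord read() {
            return current.get();
        }

        /** Succeeds only if the record is still the one the caller read. */
        boolean replaceIfUnchanged(LockRecord expected, String holder, long nowMillis) {
            LockRecord next = new LockRecord(holder, nowMillis, expected.version + 1);
            return current.compareAndSet(expected, next);
        }
    }

    /** A node that wants to become leader, e.g. a Coordinator or Overlord. */
    static final class Candidate {
        private final String id;
        private final InMemoryLockStore store;
        private final long leaseMillis;

        Candidate(String id, InMemoryLockStore store, long leaseMillis) {
            this.id = id;
            this.store = store;
            this.leaseMillis = leaseMillis;
        }

        /** Returns true if this candidate is the leader after the call. */
        boolean tryAcquireOrRenew(long nowMillis) {
            LockRecord seen = store.read();
            boolean free = seen.holder == null;
            boolean mine = id.equals(seen.holder);
            boolean expired = nowMillis - seen.renewedAtMillis > leaseMillis;
            if (!(free || mine || expired)) {
                return false; // another candidate holds a live lease
            }
            // Lose the race cleanly if someone else wrote in between.
            return store.replaceIfUnchanged(seen, id, nowMillis);
        }
    }

    public static void main(String[] args) {
        InMemoryLockStore store = new InMemoryLockStore();
        Candidate a = new Candidate("coordinator-a", store, 10_000L);
        Candidate b = new Candidate("coordinator-b", store, 10_000L);

        System.out.println(a.tryAcquireOrRenew(1_000L));  // true: lock was free
        System.out.println(b.tryAcquireOrRenew(2_000L));  // false: a's lease is live
        System.out.println(a.tryAcquireOrRenew(5_000L));  // true: a renews
        System.out.println(b.tryAcquireOrRenew(20_000L)); // true: a's lease expired
    }
}
```

The point of the design is that no consensus logic lives in the candidates: they only read, decide, and attempt a conditional write, and the store arbitrates races. A real implementation would also need to handle clock skew and API errors, which this sketch ignores.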
Proposed changes
A new extension, "druid-kubernetes-extensions", would be added with implementations of the various discovery-related interfaces, such as "DruidNodeDiscoveryProvider", "DruidLeaderSelector", and "DruidNodeAnnouncer".
Additionally, since this is the first extension of its kind, some changes might be needed in core as well to make writing the extension possible.
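To illustrate how the announcer and discovery pieces relate, here is a rough sketch under stated assumptions. The classes below are hypothetical, simplified stand-ins and do not reproduce the real signatures of "DruidNodeAnnouncer" or "DruidNodeDiscoveryProvider". The in-memory registry stands in for node metadata that an actual extension would keep on the Kubernetes API server.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DiscoverySketch {

    /** Stand-in (assumption) for node metadata stored on the API server: host:port -> role. */
    static final class InMemoryRegistry {
        private final Map<String, String> roleByHost = new ConcurrentHashMap<>();

        void put(String hostAndPort, String role) {
            roleByHost.put(hostAndPort, role);
        }

        void remove(String hostAndPort) {
            roleByHost.remove(hostAndPort);
        }

        /** Returns the hosts announced with the given role, sorted for stable output. */
        List<String> listByRole(String role) {
            List<String> hosts = new ArrayList<>();
            for (Map.Entry<String, String> e : roleByHost.entrySet()) {
                if (e.getValue().equals(role)) {
                    hosts.add(e.getKey());
                }
            }
            Collections.sort(hosts);
            return hosts;
        }
    }

    /** Hypothetical announcer: each node publishes itself on startup and withdraws on shutdown. */
    static final class Announcer {
        private final InMemoryRegistry registry;

        Announcer(InMemoryRegistry registry) {
            this.registry = registry;
        }

        void announce(String hostAndPort, String role) {
            registry.put(hostAndPort, role);
        }

        void unannounce(String hostAndPort) {
            registry.remove(hostAndPort);
        }
    }

    /** Hypothetical discovery provider: other nodes ask which hosts currently serve a role. */
    static final class DiscoveryProvider {
        private final InMemoryRegistry registry;

        DiscoveryProvider(InMemoryRegistry registry) {
            this.registry = registry;
        }

        List<String> hostsForRole(String role) {
            return registry.listByRole(role);
        }
    }

    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        Announcer announcer = new Announcer(registry);
        DiscoveryProvider discovery = new DiscoveryProvider(registry);

        announcer.announce("broker-1:8082", "broker");
        announcer.announce("broker-0:8082", "broker");
        announcer.announce("historical-0:8083", "historical");
        System.out.println(discovery.hostsForRole("broker")); // [broker-0:8082, broker-1:8082]

        announcer.unannounce("broker-1:8082");
        System.out.println(discovery.hostsForRole("broker")); // [broker-0:8082]
    }
}
```

This sketch polls the registry on demand. An actual implementation would more likely watch for changes and notify listeners, which is left out here for brevity.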
Rationale
An etcd-based extension could also be written, but in many Kubernetes deployments access to etcd is guarded and users of the K8s cluster are not expected to use it directly. For users deploying Druid on Kubernetes (I happen to be one of them), the extension therefore must not depend directly on etcd, so I decided not to pursue that option.
It is also possible to have Druid implement one of the consensus algorithms itself, but that is a lot more work and very error-prone, since consensus algorithms are notoriously hard to implement. For the current use case, it is not strictly needed.
More importantly, on most cloud providers the K8s control plane comes at a fixed cost, so using it instead of ZooKeeper lets you remove 3 ZooKeeper pods at no additional charge, which adds to cloud cost savings.
Operational impact
None
Test plan (optional)
I will test the extension on my own Druid clusters deployed on Kubernetes.