fix: deterministic rollout order for multiple nodes per NodeType #15

Open

aruraghuwanshi wants to merge 2 commits into apache:master from aruraghuwanshi:deterministic-rollout-ordering

Conversation


@aruraghuwanshi commented Apr 23, 2026

Summary

getNodeSpecsByOrder groups node specs by NodeType, but previously appended them in whatever order for key, nodeSpec := range m.Spec.Nodes happened to produce. In Go, map iteration order is not stable, so the relative order of multiple specs with the same NodeType could change between reconciles.

With rollingDeploy: true, the handler walks that ordered list and may return early while a workload is still rolling. If the intra–NodeType order flips between calls, the operator can effectively start or advance rollouts for more than one StatefulSet/Deployment of the same NodeType at a time, instead of finishing one before the next.

This PR sorts specs by their map key (ascending) within each NodeType before building the final list, while keeping the existing cross–NodeType order from druidServicesOrder unchanged.
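Roughly, the change looks like the sketch below. The types are simplified stand-ins for the operator's internals (nodeSpec here is illustrative, not the real struct), but the group-then-sort shape matches the fix:

```go
package main

import (
	"fmt"
	"sort"
)

// nodeSpec is a stand-in for the operator's node spec; only the fields
// needed to illustrate the ordering are kept.
type nodeSpec struct {
	key      string
	nodeType string
}

// getNodeSpecsByOrder groups specs by NodeType in serviceOrder, sorting
// within each NodeType by map key so the result is stable across reconciles.
func getNodeSpecsByOrder(nodes map[string]nodeSpec, serviceOrder []string) []nodeSpec {
	byType := map[string][]nodeSpec{}
	for key, spec := range nodes { // map iteration: order randomized per run
		spec.key = key
		byType[spec.nodeType] = append(byType[spec.nodeType], spec)
	}
	var ordered []nodeSpec
	for _, nt := range serviceOrder { // cross-NodeType order: unchanged
		specs := byType[nt]
		// The fix: deterministic intra-NodeType order, sorted by spec key.
		sort.Slice(specs, func(i, j int) bool { return specs[i].key < specs[j].key })
		ordered = append(ordered, specs...)
	}
	return ordered
}

func main() {
	nodes := map[string]nodeSpec{
		"historicalstier2": {nodeType: "historicals"},
		"historicalstier1": {nodeType: "historicals"},
		"brokers":          {nodeType: "brokers"},
	}
	for _, s := range getNodeSpecsByOrder(nodes, []string{"historicals", "brokers"}) {
		fmt.Println(s.key) // historicalstier1, historicalstier2, brokers on every run
	}
}
```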


What changed

  • getNodeSpecsByOrder (controllers/druid/ordering.go): sort.Slice on each per–NodeType slice by ServiceGroup.key.
  • Unit tests (controllers/druid/ordering_test.go): the Ginkgo test data uses multiple historical tiers (historicalstier1 through historicalstier3); added testing.T tests that fail under the pre-fix map-only ordering and pass with stable sorting, plus a guard for the cross–NodeType order (see the test sketch after this list).
  • E2E (e2e/configs/druid-rolling-deploy-cr.yaml, e2e/test-rolling-deploy-ordering.sh, wired in from e2e/e2e.sh): two historical tiers (historicalstier1 / historicalstier2) with rollingDeploy: true, a patch to trigger a rollout, and checks that only one of the two historical StatefulSets is mid-update at a time, with the lexicographically first tier finishing before the second starts (when transitions are observable at the poll interval).
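For reference, the testing.T coverage has roughly this shape (hypothetical test name, reusing the simplified types from the sketch above): the same spec map is ordered repeatedly, and any run-to-run difference fails the test.

```go
import (
	"reflect"
	"testing"
)

// TestNodeSpecOrderIsStable: pre-fix, Go's per-run map iteration
// randomization makes some of these iterations differ; post-fix the
// order is identical every time.
func TestNodeSpecOrderIsStable(t *testing.T) {
	nodes := map[string]nodeSpec{
		"historicalstier1": {nodeType: "historicals"},
		"historicalstier2": {nodeType: "historicals"},
		"historicalstier3": {nodeType: "historicals"},
	}
	want := getNodeSpecsByOrder(nodes, []string{"historicals"})
	for i := 0; i < 100; i++ {
		got := getNodeSpecsByOrder(nodes, []string{"historicals"})
		if !reflect.DeepEqual(got, want) {
			t.Fatalf("iteration %d: order changed: got %v, want %v", i, got, want)
		}
	}
}
```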

Testing

  • Tested on a real Kubernetes cluster
  • Tested for backward compatibility on an existing cluster (no API/schema change; behavior change is ordering only, which is the intended fix for rollingDeploy with multiple nodes per NodeType).
  • Comments in getNodeSpecsByOrder document why sorting is required (kept short).
  • User-facing docs (e.g. operator docs): not added. The suggested text below can be used if the project publishes release notes.

Release note (suggested)

Druid Operator: When rollingDeploy is enabled, rollout order for multiple StatefulSets/Deployments that share the same NodeType (e.g. historicalstier1 and historicalstier2) is now stable (sorted by node spec key). That avoids concurrent rollouts within the same NodeType caused by non-deterministic map iteration.

This is especially helpful when the two tiers split segment replicas across them (one replica in each tier): today, both historicals rolling out at once leaves the Druid cluster partially unavailable.


Key changed/added files

  • controllers/druid/ordering.go — sort within each NodeType by spec key
  • controllers/druid/ordering_test.go — Ginkgo + testing.T coverage
  • controllers/druid/testdata/ordering.yaml — fixture with multiple historical tiers
  • e2e/configs/druid-rolling-deploy-cr.yaml — rolling-deploy E2E CR
  • e2e/test-rolling-deploy-ordering.sh — E2E script
  • e2e/e2e.sh — invoke the new E2E test

Fixes #XXXX.

Sort node specs by map key within each NodeType in getNodeSpecsByOrder
so rollingDeploy does not flap on Go map iteration. Add unit tests
(prove non-determinism pre-fix) and an E2E check for two historical
tiers (historicalstier1/2).
@razinbouzar
Contributor

@aruraghuwanshi I believe the failing 900s timeout is caused by the test’s rollout trigger, not by the deterministic ordering change itself.

e2e/test-rolling-deploy-ordering.sh currently patches spec.nodes.historicalstier{1,2}.workloadAnnotations and assumes that this forces both StatefulSets to roll. In practice, that only changes StatefulSet object metadata, not the pod template, so the StatefulSet updateRevision never changes and the script eventually times out.
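To make the distinction concrete, here is a sketch (not the operator's exact code) of where each field plausibly lands on the StatefulSet; only the second assignment affects the pod template:

```go
package sketch

import appsv1 "k8s.io/api/apps/v1"

// applyAnnotations illustrates the two annotation targets.
func applyAnnotations(sts *appsv1.StatefulSet, workloadAnn, podAnn map[string]string) {
	// StatefulSet object metadata: changing this never touches the pod
	// template, so status.updateRevision is unchanged and no pods roll.
	sts.ObjectMeta.Annotations = workloadAnn

	// Pod template metadata: any change here produces a new
	// ControllerRevision, bumps status.updateRevision, and triggers the
	// rolling update the E2E test needs to observe.
	sts.Spec.Template.ObjectMeta.Annotations = podAnn
}
```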

I’d recommend:

  • patch podAnnotations instead of workloadAnnotations to force a real StatefulSet revision change
  • fail early if tier1 never picks up a new revision
  • assert the ordering directly via revision progression:
    • tier1 revision changes first
    • tier2 revision is still unchanged at that moment
    • tier1 rollout completes
    • tier2 revision changes only afterwards
  • use trap-based cleanup so failed runs do not leak test resources

I tested this locally and the revised flow behaves as expected: the rollout is triggered, tier1 updates first, tier2 waits, and the test completes successfully without hitting the 900s timeout.
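The revision check itself is simple. The E2E script does it with kubectl, but the same assertion in Go (a sketch using client-go, with hypothetical helper names) would be:

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// revisions returns a StatefulSet's currentRevision and updateRevision;
// the two differ exactly while a rolling update is in progress.
func revisions(ctx context.Context, cs kubernetes.Interface, ns, name string) (current, update string, err error) {
	sts, err := cs.AppsV1().StatefulSets(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return "", "", err
	}
	return sts.Status.CurrentRevision, sts.Status.UpdateRevision, nil
}

// midRollout reports whether the StatefulSet is between revisions, i.e. the
// condition the test asserts holds for at most one tier at any given poll.
func midRollout(ctx context.Context, cs kubernetes.Interface, ns, name string) (bool, error) {
	cur, upd, err := revisions(ctx, cs, ns, name)
	if err != nil {
		return false, err
	}
	return cur != upd, nil
}
```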

@razinbouzar
Contributor

@AdheipSingh can you take a look at this PR?

workloadAnnotations only touch StatefulSet object metadata, not the pod
template, so updateRevision never changes and the test times out at 900s.

Switch to podAnnotations (which flow into PodTemplateSpec), add trap-based
cleanup, and fail fast if tier1 never picks up a new revision.
@aruraghuwanshi
Author

Thanks @razinbouzar for the insights. That does seem to be the core issue. I've pushed another commit fixing it. Let's see.

@razinbouzar
Contributor

@abhishekrb19 or @AdheipSingh can you kick off the test workflow and review?
