fix: deterministic rollout order for multiple nodes per NodeType #15
aruraghuwanshi wants to merge 2 commits into apache:master from
Sort node specs by map key within each NodeType in getNodeSpecsByOrder so rollingDeploy does not flap on Go map iteration. Add unit tests (prove non-determinism pre-fix) and an E2E check for two historical tiers (historicalstier1/2).
@aruraghuwanshi I believe the failing
I’d recommend:
I tested this locally and the revised flow behaves as expected: the rollout is triggered, tier1 updates first, tier2 waits, and the test completes successfully without hitting the

@AdheipSingh can you take a look at this PR?
workloadAnnotations only touch StatefulSet object metadata, not the pod template, so updateRevision never changes and the test times out at 900s. Switch to podAnnotations (which flow into PodTemplateSpec), add trap-based cleanup, and fail fast if tier1 never picks up a new revision.
Thanks @razinbouzar for the insights. That does seem to be the core issue. I've pushed another commit fixing that. Let's see.

@abhishekrb19 or @AdheipSingh can you kick off the test workflow and review?
Summary
getNodeSpecsByOrder groups node specs by NodeType but previously appended them while iterating the map (for key, nodeSpec := range m.Spec.Nodes). In Go, map iteration order is unspecified and can differ from one iteration to the next, so the relative order of multiple specs with the same NodeType could change between reconciles.

With rollingDeploy: true, the handler walks that ordered list and may return early while a workload is still rolling. If the intra-NodeType order flips between calls, the operator can effectively start or advance rollouts for more than one StatefulSet/Deployment of the same NodeType at a time, instead of finishing one before the next.

This PR sorts specs by their map key (ascending) within each NodeType before building the final list, while keeping the existing cross-NodeType order from druidServicesOrder unchanged.

What changed
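As a rough illustration of the change, the sketch below groups specs by NodeType, sorts each group by map key, and concatenates the groups in a fixed cross-NodeType order. It is not the operator's code: the nodeSpec and serviceGroup types, the orderedSpecs function, and the two-entry servicesOrder list are simplified, hypothetical stand-ins for the real types and for druidServicesOrder.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeSpec is a hypothetical stand-in for the operator's node spec type.
type nodeSpec struct {
	NodeType string
}

// serviceGroup pairs a spec with its map key. It is modeled on the
// ServiceGroup.key field named in this PR; the layout is an assumption.
type serviceGroup struct {
	key  string
	spec nodeSpec
}

// servicesOrder is an illustrative cross-NodeType order, not the real
// druidServicesOrder list.
var servicesOrder = []string{"historical", "broker"}

// orderedSpecs groups specs by NodeType, sorts each group by map key
// (ascending), and appends the groups in servicesOrder.
func orderedSpecs(nodes map[string]nodeSpec) []serviceGroup {
	byType := make(map[string][]serviceGroup)
	for key, spec := range nodes { // iteration order is unspecified here
		byType[spec.NodeType] = append(byType[spec.NodeType], serviceGroup{key: key, spec: spec})
	}

	var out []serviceGroup
	for _, nodeType := range servicesOrder {
		group := byType[nodeType]
		// The sort removes any dependence on map iteration order.
		sort.Slice(group, func(i, j int) bool { return group[i].key < group[j].key })
		out = append(out, group...)
	}
	return out
}

func main() {
	nodes := map[string]nodeSpec{
		"historicalstier2": {NodeType: "historical"},
		"brokers":          {NodeType: "broker"},
		"historicalstier1": {NodeType: "historical"},
	}
	for _, sg := range orderedSpecs(nodes) {
		fmt.Println(sg.spec.NodeType, sg.key)
	}
}
```

With this input the historical tiers always come out as historicalstier1 then historicalstier2, followed by brokers, regardless of how the map happens to iterate.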
- getNodeSpecsByOrder (controllers/druid/ordering.go): sort.Slice on each per-NodeType slice by ServiceGroup.key.
- controllers/druid/ordering_test.go: Ginkgo test data uses multiple historical tiers (historicalstier1–3); added testing.T tests that would fail on the pre-fix map-only ordering and pass with stable sorting, plus a guard for cross-NodeType order.
- e2e/configs/druid-rolling-deploy-cr.yaml, e2e/test-rolling-deploy-ordering.sh (wired from e2e/e2e.sh): two historical tiers (historicalstier1/historicalstier2) with rollingDeploy: true, a patch to trigger a rollout, and checks that only one of the two historical StatefulSets is mid-update at a time, with the lexicographically first tier finishing before the second starts (when transitions are observable at the poll interval).

Testing
rollingDeploy with multiple nodes per NodeType).
getNodeSpecsByOrder document why sorting is required (kept short).

Release note (suggested)
Druid Operator: When rollingDeploy is enabled, rollout order for multiple StatefulSets/Deployments that share the same NodeType (e.g. historicalstier1 and historicalstier2) is now stable (sorted by node spec key). That avoids concurrent rollouts within the same NodeType caused by non-deterministic map iteration.

This is especially helpful if the two tiers hold segment replicas split across them (one in each tier). Today, rolling out both historicals at once leaves the Druid cluster partially unavailable.
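The concurrency problem can be illustrated with a toy simulation. Everything here is hypothetical (the workload type, reconcile, tick, and peakConcurrent are not operator code), and the order flip is forced on every other pass as a worst-case stand-in for map iteration randomness. The model assumes a reconcile pass starts the first unfinished workload it sees and returns early, while started rollouts progress on their own.

```go
package main

import "fmt"

// workload is a hypothetical model of a StatefulSet rollout that needs
// `remaining` ticks to finish once started.
type workload struct {
	name      string
	started   bool
	remaining int
}

// reconcile walks the list in order, starts (or keeps waiting on) the first
// unfinished workload, and returns early without touching later ones.
func reconcile(order []*workload) {
	for _, w := range order {
		if w.remaining > 0 {
			w.started = true
			return
		}
	}
}

// tick advances every started rollout by one step.
func tick(all []*workload) {
	for _, w := range all {
		if w.started && w.remaining > 0 {
			w.remaining--
		}
	}
}

// inProgress counts workloads that are started but not finished.
func inProgress(all []*workload) int {
	n := 0
	for _, w := range all {
		if w.started && w.remaining > 0 {
			n++
		}
	}
	return n
}

// peakConcurrent runs reconciles until both tiers finish and returns the
// largest number of simultaneous rollouts. With flip set, the intra-NodeType
// order is reversed on odd passes.
func peakConcurrent(flip bool) int {
	t1 := &workload{name: "historicalstier1", remaining: 3}
	t2 := &workload{name: "historicalstier2", remaining: 3}
	all := []*workload{t1, t2}
	peak := 0
	for pass := 0; t1.remaining > 0 || t2.remaining > 0; pass++ {
		order := []*workload{t1, t2}
		if flip && pass%2 == 1 {
			order = []*workload{t2, t1}
		}
		reconcile(order)
		if n := inProgress(all); n > peak {
			peak = n
		}
		tick(all)
	}
	return peak
}

func main() {
	fmt.Println("stable order, peak rollouts:  ", peakConcurrent(false))
	fmt.Println("flipping order, peak rollouts:", peakConcurrent(true))
}
```

In this model a stable order never has more than one tier rolling at a time, while a flipped order starts the second tier before the first has finished.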
Key changed/added files
- controllers/druid/ordering.go: sort within each NodeType by spec key
- controllers/druid/ordering_test.go: Ginkgo + testing.T coverage
- controllers/druid/testdata/ordering.yaml: fixture with multiple historical tiers
- e2e/configs/druid-rolling-deploy-cr.yaml: rolling-deploy E2E CR
- e2e/test-rolling-deploy-ordering.sh: E2E script
- e2e/e2e.sh: invoke the new E2E test

Fixes #XXXX.