fix(operator/lws): preserve legacy worker pod labels #9738
if current.empty() {
	return true
}
if requiresV2WorkerGeneration(dgd, current, desired) {
Adding this here seems strange. workerHashForDCDGeneration should rather change at the right moment and stay equal when no rollout is desired.
Walkthrough
This PR refactors v1/v2 worker-hash compatibility logic in the rolling-update controller by introducing a
Changes
EPP v2 Worker Generation Requirements
	current, desired workerGenerationHashes,
) string {
	if requiresV2WorkerGeneration(dgd, current, desired) {
		return desired.v2
Returning desired.v2 here also affects unsupported Grove/multinode paths, so their handler can create a new v2-suffixed worker DCD without managed rollout cleanup and leave the old v1 DCD running. Fix: gate this branch to supported managed rollouts or make workerHashesForUnsupportedPathway keep using the compatible v1 hash.
2c23c19 to e64b8b1
Signed-off-by: Dr. Stefan Schimanski <sschimanski@nvidia.com>
e64b8b1 to e1264e5
Summary
component-type=worker plus sub-component-type=decode|prefill, and component-class=worker is not added
Missing logic / bug
The compatibility decision already exists in getDCDWorkloadComponentType, but LWS pod templates were not using that decision. generateLeaderWorkerSet rendered LWS pod labels from GetDCDKubeLabels directly, so an upgraded non-EPP decode/prefill worker could render LWS pods as decode/prefill while the compatibility path still rendered Services for the legacy worker pod identity.
Example of the mismatch this PR prevents:
That is both unnecessary pod-template churn and a selector-safety risk. The fix keeps the rendered non-EPP LWS pod identity stable before spec hashes are compared, without adding hash-specific special cases:
Deployment already used the workload-compatible component type path; Grove has its own legacy selector preservation. This PR adds one table-driven test so Deployment, LWS, and Grove all keep those system labels stable for non-EPP legacy workers.
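For reviewers who want to see the selector-safety risk in isolation, here is a minimal standalone sketch. It is not code from the operator. The label keys are the short forms used in the summary (the real operator may prefix them), and selectorMatches, legacyWorkerLabels, and directLabels are hypothetical helpers. It also assumes the Service selector pins both component-type and sub-component-type, which this PR does not confirm.

```go
package main

import "fmt"

// Label keys in the short form used in the PR summary.
// Assumption: the real operator may use prefixed keys.
const (
	componentTypeKey    = "component-type"
	subComponentTypeKey = "sub-component-type"
)

// selectorMatches reports whether every key/value pair in selector is
// present in podLabels. This mirrors how an equality-based Kubernetes
// Service selector decides which pods it targets.
func selectorMatches(selector, podLabels map[string]string) bool {
	for k, want := range selector {
		if got, ok := podLabels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// legacyWorkerLabels is a hypothetical stand-in for the compatible
// rendering: the pod keeps component-type=worker and carries the role
// in sub-component-type.
func legacyWorkerLabels(role string) map[string]string {
	return map[string]string{
		componentTypeKey:    "worker",
		subComponentTypeKey: role,
	}
}

// directLabels is a hypothetical stand-in for rendering the role
// directly as the component type, which is the mismatch described above.
func directLabels(role string) map[string]string {
	return map[string]string{componentTypeKey: role}
}

func main() {
	// Assumed Service selector for a legacy non-EPP decode worker.
	serviceSelector := map[string]string{
		componentTypeKey:    "worker",
		subComponentTypeKey: "decode",
	}

	fmt.Println("legacy labels selected:", selectorMatches(serviceSelector, legacyWorkerLabels("decode")))
	fmt.Println("direct labels selected:", selectorMatches(serviceSelector, directLabels("decode")))
}
```

Under these assumptions the Service keeps selecting pods rendered with the legacy identity and stops selecting pods rendered with the role as component type, which is why all three workload kinds need to agree on the same label decision.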
Tests
Relates to #9700