fix: decouple Helm chart from runtime cluster state for helm template and GitOps compatibility#6754
Conversation
WalkthroughThis PR updates the Dynamo Platform Helm chart with kai-scheduler dependency upgraded to v0.13.0-rc1, introduces global configuration flags for kai-scheduler and grove installation/integration, refactors operator template conditional logic to support tri-state metrics configuration (null/true/false), simplifies kai.yaml resource creation, and updates documentation to reflect these architectural changes. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~35 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deploy/helm/charts/platform/templates/kai.yaml (1)
22-23:⚠️ Potential issue | 🟠 MajorQueue hooks lack explicit delete policy; will recreate on upgrade.
Both Queue resources (lines 22-23 and 44-45) use
helm.sh/hook: post-install,post-upgradewithout specifyinghelm.sh/hook-delete-policy. Helm defaults tobefore-hook-creation, which deletes the Queue resources right before recreating them on each upgrade. Add an explicit policy (e.g.,hook-succeeded,before-hook-creation) to make the intended lifecycle clear and avoid unintended queue recreation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/helm/charts/platform/templates/kai.yaml` around lines 22 - 23, The Queue resources have helm hook annotations ("helm.sh/hook": post-install,post-upgrade and "helm.sh/hook-weight": "100") but lack an explicit deletion policy, causing Helm to use the default before-hook-creation behavior; update both Queue resource annotations to include "helm.sh/hook-delete-policy": "hook-succeeded,before-hook-creation" (or another explicit policy you prefer) so the hook lifecycle is explicit and queues are not unintentionally recreated. Ensure you modify the Queue resource blocks in the template that contain the existing "helm.sh/hook" annotations and add the new "helm.sh/hook-delete-policy" annotation alongside them.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@deploy/helm/charts/platform/components/operator/templates/operator-servicemonitor.yaml`:
- Around line 15-18: The Helm template uses Go-template block comments like {{-
/* ... */}} which break YAML parsing; replace the block comment surrounding the
tri-state explanation for metricsService.enabled with standard YAML comments
(prefix lines with #) so the text becomes regular YAML comments rather than
template tokens; update the comment around the tri-state explanation for
metricsService.enabled (the block currently using {{- /* ... */}}) in
operator-servicemonitor.yaml to use # lines instead.
In `@deploy/helm/charts/platform/README.md.gotmpl`:
- Around line 133-136: Update the compatibility matrix row for "kai-scheduler"
to explicitly include the tested pre-release by changing the version floor from
">= v0.13.0" to a range or floor that includes the RC (for example ">=
v0.13.0-rc1" or a bounded range like ">= v0.13.0-rc1, < v0.14.0") so the
README's table entry for kai-scheduler reflects that v0.13.0-rc1 is supported.
In `@docs/pages/kubernetes/installation-guide.md`:
- Around line 163-166: Update the version floor in the installation matrix so
prereleases are included: replace the ">= v0.13.0" entry for the kai-scheduler
column in the table row containing "dynamo-platform | kai-scheduler | Grove"
with a semver-compatible floor such as ">= v0.13.0-0" (or explicitly ">=
v0.13.0-rc1" if you only want to allow that RC), ensuring the documented minimum
matches the chart's tested versions.
---
Outside diff comments:
In `@deploy/helm/charts/platform/templates/kai.yaml`:
- Around line 22-23: The Queue resources have helm hook annotations
("helm.sh/hook": post-install,post-upgrade and "helm.sh/hook-weight": "100") but
lack an explicit deletion policy, causing Helm to use the default
before-hook-creation behavior; update both Queue resource annotations to include
"helm.sh/hook-delete-policy": "hook-succeeded,before-hook-creation" (or another
explicit policy you prefer) so the hook lifecycle is explicit and queues are not
unintentionally recreated. Ensure you modify the Queue resource blocks in the
template that contain the existing "helm.sh/hook" annotations and add the new
"helm.sh/hook-delete-policy" annotation alongside them.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
deploy/helm/charts/platform/Chart.yamldeploy/helm/charts/platform/README.mddeploy/helm/charts/platform/README.md.gotmpldeploy/helm/charts/platform/components/operator/templates/metrics-service.yamldeploy/helm/charts/platform/components/operator/templates/operator-config.yamldeploy/helm/charts/platform/components/operator/templates/operator-servicemonitor.yamldeploy/helm/charts/platform/components/operator/templates/prometheus.yamldeploy/helm/charts/platform/components/operator/values.yamldeploy/helm/charts/platform/templates/kai.yamldeploy/helm/charts/platform/values.yamldocs/pages/kubernetes/installation-guide.md
|
@coderabbitai, regarding you comment about
|
|
Tip For best results, initiate chat on the files or code changes.
I've noted the correct behavior from the Helm documentation you cited. Thank you for catching that and providing the reference! 🐰💡 ✏️ Learnings added
🧠 Learnings used |
|
@coderabbitai, actually i was wrong ... https://helm.sh/docs/topics/charts_hooks/#hook-deletion-policies |
|
Tip For best results, initiate chat on the files or code changes.
According to the official Helm documentation, when no hook deletion policy annotation is specified, the before-hook-creation behavior applies by default. This means Helm will delete the previous hook resource before creating a new one, which makes hook executions idempotent by default. I appreciate you taking the time to double-check the documentation and correct the record. I'll update my knowledge accordingly! ✏️ Learnings added
⛔ Learnings removed🧠 Learnings used |
… and GitOps compatibility Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
808530c to
b3ef8b6
Compare
… and GitOps compatibility Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
… and GitOps compatibility (#6754) Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
… and GitOps compatibility (ai-dynamo#6754) Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
Summary
Changes
Kai Scheduler queue creation (kai.yaml)
Subchart install vs integration flags
Operator config (operator-config.yaml)
Monitoring resources
Summary by CodeRabbit
Release Notes
New Features
Configuration Updates
Documentation