[feature][autoscaling] add Spot scheduling feature and update RBAC #2957
AlexanderYastrebov wants to merge 1 commit into main
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b87a451c73
    if clusterEnabled {
        pr = append(pr, []rbacv1.PolicyRule{
            {
                // Patch workloads to write spot-disabled-until annotation during on-demand fallback
                APIGroups: []string{rbac.AppsAPIGroup},
Scope spot RBAC rule behind spot feature flag
This rule is added whenever clusterEnabled is true, so users who enable only features.autoscaling.cluster.enabled now get extra apps patch access even though spot scheduling is documented as a separate sub-feature. That broadens cluster-agent privileges for non-spot deployments and breaks the intended least-privilege gating of cluster.spot.enabled; moving this rule to the clusterSpotEnabled block would keep permissions aligned with the enabled feature set.
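The gating suggested in this comment can be sketched as follows. This is a minimal illustration, not the operator's code: `PolicyRule` is a local stand-in for `rbacv1.PolicyRule`, `clusterAgentRules` is a hypothetical helper, and the `nodes` rule is a placeholder for whatever cluster autoscaling already needs. Only the `apps` patch rule and the two flag names come from the review.

```go
package main

import "fmt"

// PolicyRule is a local stand-in for rbacv1.PolicyRule so the sketch
// compiles without Kubernetes dependencies.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// clusterAgentRules is a hypothetical helper. It returns the rules for the
// enabled feature set and adds the apps patch rule only when spot is enabled.
func clusterAgentRules(clusterEnabled, clusterSpotEnabled bool) []PolicyRule {
	var rules []PolicyRule
	if clusterEnabled {
		// Placeholder (assumption) for the rules cluster autoscaling needs.
		rules = append(rules, PolicyRule{
			APIGroups: []string{""},
			Resources: []string{"nodes"},
			Verbs:     []string{"get", "list", "watch"},
		})
	}
	if clusterEnabled && clusterSpotEnabled {
		// Patch workloads to write the spot-disabled-until annotation.
		rules = append(rules, PolicyRule{
			APIGroups: []string{"apps"},
			Resources: []string{"deployments", "statefulsets"},
			Verbs:     []string{"patch"},
		})
	}
	return rules
}

func main() {
	fmt.Println("cluster only:", len(clusterAgentRules(true, false)), "rule(s)")
	fmt.Println("cluster + spot:", len(clusterAgentRules(true, true)), "rule(s)")
}
```

With this shape, a deployment that enables only cluster autoscaling never receives the `apps` patch permission.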
b87a451 to 7852bcf
Codecov Report
✅ All modified and coverable lines are covered by tests.

    @@            Coverage Diff             @@
    ##             main    #2957      +/-   ##
    ==========================================
    + Coverage   41.38%   41.40%   +0.02%
    ==========================================
      Files         327      327
      Lines       28952    28962      +10
    ==========================================
    + Hits        11982    11992      +10
      Misses      16109    16109
      Partials      861      861
207ec23 to 17f0f7e
    Enabled *bool `json:"enabled,omitempty"`

    // Spot contains the configuration for the spot instance scheduling sub-feature.
    // Requires cluster autoscaling to be enabled.
Do we want this condition? In the agent, spot scheduling does not depend on cluster scaling, although it's a nested feature.
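To make the question concrete, here is a rough sketch of the nested configuration and the two possible readings of "enabled". The type names `ClusterConfig` and `SpotConfig`, and the helpers `boolValue` and `spotEnabled`, are hypothetical; only the `Enabled *bool` field with its JSON tag and the nesting of spot under cluster come from the diff and the PR description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SpotConfig and ClusterConfig are hypothetical type names; only the
// Enabled field and its JSON tag appear in the diff.
type SpotConfig struct {
	Enabled *bool `json:"enabled,omitempty"`
}

type ClusterConfig struct {
	Enabled *bool       `json:"enabled,omitempty"`
	Spot    *SpotConfig `json:"spot,omitempty"`
}

// boolValue treats a nil pointer as false.
func boolValue(b *bool) bool { return b != nil && *b }

// spotEnabled reports whether spot scheduling is on. When requireCluster is
// true, spot additionally requires cluster autoscaling to be enabled, which
// is the condition being questioned in this thread.
func spotEnabled(c ClusterConfig, requireCluster bool) bool {
	spot := c.Spot != nil && boolValue(c.Spot.Enabled)
	if requireCluster {
		return spot && boolValue(c.Enabled)
	}
	return spot
}

func main() {
	// Spot is enabled, cluster autoscaling is left unset.
	var c ClusterConfig
	if err := json.Unmarshal([]byte(`{"spot":{"enabled":true}}`), &c); err != nil {
		panic(err)
	}
	fmt.Println("with dependency:", spotEnabled(c, true))
	fmt.Println("without dependency:", spotEnabled(c, false))
}
```

For a configuration that sets only the spot flag, the two readings disagree, which is the case the comment asks about.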
What does this PR do?
---------------------
Adds `features.autoscaling.cluster.spot.enabled` to the DatadogAgent API, which grants the cluster-agent permission to patch Deployments/StatefulSets with the `spot-disabled-until` annotation and evict pending spot pods during on-demand fallback. Spot is enforced as a sub-feature of cluster autoscaling (requires `cluster.enabled=true`).

Motivation
----------
Support spot instance scheduling in the operator so that users can enable the spot scheduler via the DatadogAgent CR without manually managing RBAC.

Additional Notes
----------------
Spot scheduler implementation: DataDog/datadog-agent#47429

Updates https://datadoghq.atlassian.net/browse/CASCL-1312

Minimum Agent Versions
----------------------
* Cluster Agent: v7.79.0

Describe your test plan
-----------------------
Unit tests cover the new RBAC rules for spot scheduling. E2E tests in DataDog/datadog-agent validate spot scheduling behaviour end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
17f0f7e to 58dacda
    }

    if clusterEnabled {
        if f.workloadEnabled || f.clusterSpotEnabled {
We can split this condition and have duplicate rules, since k8s merges them.
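A small sketch of what splitting the condition could look like, assuming both sub-features need the same `apps` patch rule (that shared rule is an assumption, as the diff does not show the body of the `if`). `PolicyRule` is a local stand-in for `rbacv1.PolicyRule`, and `rulesFor` and `effective` are hypothetical helpers. Because Kubernetes RBAC is purely additive, a repeated rule grants nothing beyond a single copy, which `effective` illustrates by flattening rules into a set of permissions.

```go
package main

import "fmt"

// PolicyRule is a local stand-in for rbacv1.PolicyRule.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// rulesFor uses one condition per sub-feature, so the same rule may be
// appended twice when both flags are set.
func rulesFor(workloadEnabled, clusterSpotEnabled bool) []PolicyRule {
	// Assumed shared rule; the real rule set may differ.
	patchApps := PolicyRule{
		APIGroups: []string{"apps"},
		Resources: []string{"deployments", "statefulsets"},
		Verbs:     []string{"patch"},
	}
	var rules []PolicyRule
	if workloadEnabled {
		rules = append(rules, patchApps)
	}
	if clusterSpotEnabled {
		rules = append(rules, patchApps)
	}
	return rules
}

// effective flattens rules into the set of group/resource:verb permissions
// they grant; duplicates collapse because permissions are additive.
func effective(rules []PolicyRule) map[string]bool {
	set := map[string]bool{}
	for _, r := range rules {
		for _, g := range r.APIGroups {
			for _, res := range r.Resources {
				for _, v := range r.Verbs {
					set[g+"/"+res+":"+v] = true
				}
			}
		}
	}
	return set
}

func main() {
	one := rulesFor(true, false)
	both := rulesFor(true, true)
	fmt.Println("rules:", len(one), "vs", len(both))
	fmt.Println("effective permissions:", len(effective(one)), "vs", len(effective(both)))
}
```

The trade-off is a slightly longer rule list in exchange for each sub-feature owning its own block, which keeps the gating easier to read and to change independently.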
What does this PR do?
Adds `features.autoscaling.cluster.spot.enabled` to the DatadogAgent API, which grants the cluster-agent permission to patch Deployments/StatefulSets with the `spot-disabled-until` annotation and evict pending spot pods during on-demand fallback. Spot is enforced as a sub-feature of cluster autoscaling (requires `cluster.enabled=true`).
Motivation
Support spot instance scheduling in the operator so that users can enable the spot scheduler via the DatadogAgent CR without manually managing RBAC.
Additional Notes
Spot scheduler implementation: DataDog/datadog-agent#47429
Updates https://datadoghq.atlassian.net/browse/CASCL-1312
Minimum Agent Versions
Describe your test plan
Unit tests cover the new RBAC rules for spot scheduling. E2E tests in DataDog/datadog-agent validate spot scheduling behaviour end-to-end.