feat: add profiler job overrides#6607
Conversation
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
WalkthroughThe pull request refactors profiling job override handling in a Kubernetes operator. It extracts overrides into a separate variable from the DGDR spec and introduces comprehensive merge logic via new helper functions to apply user-provided overrides to profiling job specifications while protecting controller-owned labels. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
deploy/operator/internal/controller/profiling_job_overrides.go (1)
166-185: Consider deduplicatingEnvFromsources or documenting append-only behavior.
EnvFromis appended unconditionally (line 183), which could lead to duplicateConfigMapReforSecretRefentries if the user specifies the same source that already exists. UnlikeEnvandVolumeMountswhich usemergeNamedSlicefor deduplication,EnvFromlacks a unique key.If deduplication is desired, you could key by the
ConfigMapRef.NameorSecretRef.Name. Otherwise, a brief doc comment clarifying the append-only semantics would help.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/operator/internal/controller/profiling_job_overrides.go` around lines 166 - 185, The applyContainerOverrides function currently appends overrides.EnvFrom into container.EnvFrom without deduplication, which can produce duplicate ConfigMapRef/SecretRef entries; update applyContainerOverrides to deduplicate EnvFrom entries (keying by ConfigMapRef.Name and SecretRef.Name) or, if append-only behavior is intentional, add a clarifying doc comment above applyContainerOverrides stating that EnvFrom is intentionally appended and may contain duplicates; reference the applyContainerOverrides function and the EnvFrom field and mirror the mergeNamedSlice approach used for Env and VolumeMounts when implementing dedupe.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@deploy/operator/internal/controller/profiling_job_overrides.go`:
- Around line 166-185: The applyContainerOverrides function currently appends
overrides.EnvFrom into container.EnvFrom without deduplication, which can
produce duplicate ConfigMapRef/SecretRef entries; update applyContainerOverrides
to deduplicate EnvFrom entries (keying by ConfigMapRef.Name and SecretRef.Name)
or, if append-only behavior is intentional, add a clarifying doc comment above
applyContainerOverrides stating that EnvFrom is intentionally appended and may
contain duplicates; reference the applyContainerOverrides function and the
EnvFrom field and mirror the mergeNamedSlice approach used for Env and
VolumeMounts when implementing dedupe.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
deploy/operator/internal/controller/dynamographdeploymentrequest_controller.godeploy/operator/internal/controller/profiling_job_overrides.godeploy/operator/internal/controller/profiling_job_overrides_test.go
There was a problem hiding this comment.
🧹 Nitpick comments (2)
deploy/operator/internal/controller/profiling_job_overrides.go (2)
128-164: Tolerations and NodeSelector use replace semantics while other fields merge.This is a valid design choice (replace is simpler and avoids complex merge conflicts), but ensure this is documented for users. If a user wants to add a toleration alongside existing ones, the current behavior requires them to include all desired tolerations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/operator/internal/controller/profiling_job_overrides.go` around lines 128 - 164, The applyPodSpecOverrides function currently replaces spec.Tolerations and spec.NodeSelector rather than merging them; update the function documentation (the comment above applyPodSpecOverrides) to explicitly state that Tolerations and NodeSelector use replace semantics (users must provide the full desired list to add entries), and add a brief mention of the merge behavior for other fields (ImagePullSecrets, Volumes, InitContainers, Containers) so callers know which fields are merged vs replaced; reference applyPodSpecOverrides, spec.Tolerations, and spec.NodeSelector in the comment and also add a note in any higher-level API docs or README describing this replace-vs-merge behavior.
166-185: Container resources replacement loses base values when only one side is specified.Line 172-174: When override has only
Limitsbut noRequests(or vice versa), the entireResourcesstruct is replaced, discarding any baseRequests. This might be unexpected if a user only wants to add limits without clearing existing requests.♻️ Optional: Merge resources at the map level
- if len(overrides.Resources.Requests) > 0 || len(overrides.Resources.Limits) > 0 { - container.Resources = overrides.Resources - } + if len(overrides.Resources.Requests) > 0 { + if container.Resources.Requests == nil { + container.Resources.Requests = make(corev1.ResourceList) + } + for k, v := range overrides.Resources.Requests { + container.Resources.Requests[k] = v + } + } + if len(overrides.Resources.Limits) > 0 { + if container.Resources.Limits == nil { + container.Resources.Limits = make(corev1.ResourceList) + } + for k, v := range overrides.Resources.Limits { + container.Resources.Limits[k] = v + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/operator/internal/controller/profiling_job_overrides.go` around lines 166 - 185, The applyContainerOverrides function replaces the entire corev1.ResourceRequirements when overrides.Resources is non-empty, which discards base values if only Requests or only Limits are provided; change applyContainerOverrides so that it merges overrides.Resources into container.Resources at the map level: for each key in overrides.Resources.Requests and overrides.Resources.Limits set/overwrite container.Resources.Requests[key] and container.Resources.Limits[key], and only replace the whole struct if container.Resources is nil; reference applyContainerOverrides, container.Resources and overrides.Resources when making this change and ensure you preserve existing request/limit entries not present in overrides.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@deploy/operator/internal/controller/profiling_job_overrides.go`:
- Around line 128-164: The applyPodSpecOverrides function currently replaces
spec.Tolerations and spec.NodeSelector rather than merging them; update the
function documentation (the comment above applyPodSpecOverrides) to explicitly
state that Tolerations and NodeSelector use replace semantics (users must
provide the full desired list to add entries), and add a brief mention of the
merge behavior for other fields (ImagePullSecrets, Volumes, InitContainers,
Containers) so callers know which fields are merged vs replaced; reference
applyPodSpecOverrides, spec.Tolerations, and spec.NodeSelector in the comment
and also add a note in any higher-level API docs or README describing this
replace-vs-merge behavior.
- Around line 166-185: The applyContainerOverrides function replaces the entire
corev1.ResourceRequirements when overrides.Resources is non-empty, which
discards base values if only Requests or only Limits are provided; change
applyContainerOverrides so that it merges overrides.Resources into
container.Resources at the map level: for each key in
overrides.Resources.Requests and overrides.Resources.Limits set/overwrite
container.Resources.Requests[key] and container.Resources.Limits[key], and only
replace the whole struct if container.Resources is nil; reference
applyContainerOverrides, container.Resources and overrides.Resources when making
this change and ensure you preserve existing request/limit entries not present
in overrides.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
deploy/operator/internal/controller/dynamographdeploymentrequest_controller.godeploy/operator/internal/controller/profiling_job_overrides.godeploy/operator/internal/controller/profiling_job_overrides_test.go
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
Overview:
add profiler job overrides
closes #6524
Summary by CodeRabbit
Bug Fixes
Tests