feat(power-agent): per-node power-cap enforcement DaemonSet#9682
feat(power-agent): per-node power-cap enforcement DaemonSet#9682kaim-eng wants to merge 3 commits into
Conversation
WalkthroughThis PR introduces a complete Kubernetes DaemonSet agent that enforces per-GPU NVML power caps on worker nodes. The daemon reconciles every 15 seconds, mapping running GPU processes to pod UIDs via cgroup parsing, reading power-limit annotations, resolving multi-pod conflicts, and applying caps via NVML with Prometheus metrics and graceful shutdown support. ChangesPower Agent: GPU NVML Power Cap Enforcement DaemonSet
🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
components/power_agent/power_agent.py (1)
537-539: ⚡ Quick winMove
argparseimport to module scope.Line 538 imports inside
main(), which violates the repo’s Python import placement rule.As per coding guidelines: "Keep all imports at the top of each file (flag any import inside functions/classes)."Proposed fix
+import argparse import json import logging @@ def main() -> None: - import argparse - parser = argparse.ArgumentParser(description="Dynamo Power Agent DaemonSet")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@components/power_agent/power_agent.py` around lines 537 - 539, The argparse import is inside the main() function; move the import to module scope by adding "import argparse" at the top of the file and removing the in-function import in main(), ensuring any type or usage remains correct (adjust other imports if needed) so linting and the repo rule about top-level imports are satisfied.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@components/power_agent/power_agent.py`:
- Around line 119-124: The function _load_previously_managed_gpus should
defensively handle malformed or unreadable state files: when reading
_MANAGED_STATE_PATH, catch OSError in addition to
FileNotFoundError/JSONDecodeError, load the JSON into a variable (e.g., data =
json.load(f)), verify isinstance(data, dict) before calling
data.get("managed_uuids", []), and if the root is not a dict (or managed_uuids
is not an iterable of strings) return an empty set; ensure any
TypeError/ValueError from unexpected types is handled and results in returning
set() rather than letting startup crash.
- Around line 294-297: Replace the broad except that swallows errors from the
pynvml.nvmlShutdown() call with a targeted handler: catch pynvml.NVMLError (or
the specific pynvml error class) as e, log the exception with the module/class
logger (e.g., logger.exception("NVML shutdown failed: %s", e) or
logging.exception(...)) and then re-raise the error to avoid silent failures;
modify the try/except around pynvml.nvmlShutdown() accordingly so only
NVML-specific errors are caught, logged, and propagated.
- Around line 494-501: The loop that builds pod_annotations appends one entry
per GPU process, causing pods with multiple processes to be counted multiple
times; change the logic in the procs loop (using _extract_pod_uid_from_cgroup
and uid_to_annotation) to deduplicate by pod UID before applying multi-pod
policy — e.g., track seen UIDs (or build a uid->annotation map) and only append
one (uid, annotation) pair per unique UID so each pod is counted once when
evaluating multi-pod warnings/metrics.
In `@deploy/power_agent/daemonset.yaml`:
- Around line 16-35: The DaemonSet metadata.namespace is hardcoded to "default",
which causes mismatches with the RBAC/service account namespace; update the
manifest to use the parameterized namespace variable (e.g.
${POWER_AGENT_NAMESPACE}) instead of "default" and ensure the ServiceAccount
referenced by spec.serviceAccountName ("dynamo-power-agent") is created/bound in
that same parameterized namespace so RBAC bindings resolve correctly; locate
metadata.name ("dynamo-power-agent"), metadata.namespace, and
spec.serviceAccountName in the template to make the change.
- Around line 55-58: Replace the mutable image tag
"nvcr.io/nvidia/dynamo/power-agent:latest" with an immutable reference (a
specific version tag or an image digest), e.g.
"nvcr.io/nvidia/dynamo/power-agent:vX.Y.Z" or
"nvcr.io/nvidia/dynamo/power-agent@sha256:<digest>", so deployments are
reproducible; keep or verify imagePullPolicy (e.g. IfNotPresent) is appropriate
for pinned images and update the inline comment to note the image is
intentionally pinned.
---
Nitpick comments:
In `@components/power_agent/power_agent.py`:
- Around line 537-539: The argparse import is inside the main() function; move
the import to module scope by adding "import argparse" at the top of the file
and removing the in-function import in main(), ensuring any type or usage
remains correct (adjust other imports if needed) so linting and the repo rule
about top-level imports are satisfied.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f9faddf8-30e0-4a7b-9055-b45244b2a979
📒 Files selected for processing (11)
.github/filters.yamlcomponents/power_agent/README.mdcomponents/power_agent/power_agent.pycomponents/power_agent/tests/__init__.pycomponents/power_agent/tests/test_apply_cap.pycomponents/power_agent/tests/test_cgroup_parser.pycomponents/power_agent/tests/test_multi_pod_policy.pycomponents/power_agent/tests/test_shutdown.pydeploy/power_agent/daemonset.yamldeploy/power_agent/dev-pod.yamldeploy/power_agent/rbac.yaml
| def _load_previously_managed_gpus() -> set[str]: | ||
| try: | ||
| with open(_MANAGED_STATE_PATH) as f: | ||
| return set(json.load(f).get("managed_uuids", [])) | ||
| except (FileNotFoundError, json.JSONDecodeError): | ||
| return set() |
There was a problem hiding this comment.
Harden managed-state parsing to avoid startup crashes on malformed file content.
Line 122 assumes the JSON root is a dict; if the file contains a list/string/corrupt schema, .get(...) raises and can abort startup. Also unreadable-file OSError cases (e.g., permissions) are currently uncaught.
Proposed fix
def _load_previously_managed_gpus() -> set[str]:
try:
with open(_MANAGED_STATE_PATH) as f:
- return set(json.load(f).get("managed_uuids", []))
- except (FileNotFoundError, json.JSONDecodeError):
+ payload = json.load(f)
+ except (OSError, json.JSONDecodeError) as e:
+ logger.warning("Failed to read managed GPU state file: %s", e)
return set()
+
+ if not isinstance(payload, dict):
+ logger.warning("Managed GPU state file has invalid root type: %s", type(payload))
+ return set()
+ managed = payload.get("managed_uuids", [])
+ if not isinstance(managed, list):
+ logger.warning("managed_uuids must be a list, got: %s", type(managed))
+ return set()
+ return {u for u in managed if isinstance(u, str)}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@components/power_agent/power_agent.py` around lines 119 - 124, The function
_load_previously_managed_gpus should defensively handle malformed or unreadable
state files: when reading _MANAGED_STATE_PATH, catch OSError in addition to
FileNotFoundError/JSONDecodeError, load the JSON into a variable (e.g., data =
json.load(f)), verify isinstance(data, dict) before calling
data.get("managed_uuids", []), and if the root is not a dict (or managed_uuids
is not an iterable of strings) return an empty set; ensure any
TypeError/ValueError from unexpected types is handled and results in returning
set() rather than letting startup crash.
| try: | ||
| pynvml.nvmlShutdown() | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
Don’t silently swallow NVML shutdown failures.
Line 296 catches all exceptions and drops them. This hides shutdown problems and complicates recovery/debugging.
Proposed fix
try:
pynvml.nvmlShutdown()
- except Exception:
- pass
+ except Exception as e:
+ logger.warning("nvmlShutdown failed during SIGTERM handling: %s", e)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| pynvml.nvmlShutdown() | |
| except Exception: | |
| pass | |
| try: | |
| pynvml.nvmlShutdown() | |
| except Exception as e: | |
| logger.warning("nvmlShutdown failed during SIGTERM handling: %s", e) | |
| raise |
🧰 Tools
🪛 Ruff (0.15.12)
[error] 296-297: try-except-pass detected, consider logging the exception
(S110)
[warning] 296-296: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@components/power_agent/power_agent.py` around lines 294 - 297, Replace the
broad except that swallows errors from the pynvml.nvmlShutdown() call with a
targeted handler: catch pynvml.NVMLError (or the specific pynvml error class) as
e, log the exception with the module/class logger (e.g., logger.exception("NVML
shutdown failed: %s", e) or logging.exception(...)) and then re-raise the error
to avoid silent failures; modify the try/except around pynvml.nvmlShutdown()
accordingly so only NVML-specific errors are caught, logged, and propagated.
| pod_annotations: list[tuple[str, Optional[str]]] = [] | ||
| for proc in procs: | ||
| uid = _extract_pod_uid_from_cgroup(proc.pid) | ||
| if uid is None: | ||
| continue # non-K8s process — skip | ||
| if uid in uid_to_annotation: | ||
| pod_annotations.append((uid, uid_to_annotation[uid])) | ||
|
|
There was a problem hiding this comment.
Deduplicate by pod UID before applying multi-pod policy.
Line 500 appends one entry per GPU process. A single pod with multiple processes is then counted as “multi-pod”, skewing warnings/metrics and policy disposition.
Proposed fix
- pod_annotations: list[tuple[str, Optional[str]]] = []
+ pod_annotations_by_uid: dict[str, Optional[str]] = {}
for proc in procs:
uid = _extract_pod_uid_from_cgroup(proc.pid)
if uid is None:
continue # non-K8s process — skip
if uid in uid_to_annotation:
- pod_annotations.append((uid, uid_to_annotation[uid]))
+ pod_annotations_by_uid[uid] = uid_to_annotation[uid]
+
+ pod_annotations = list(pod_annotations_by_uid.items())📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pod_annotations: list[tuple[str, Optional[str]]] = [] | |
| for proc in procs: | |
| uid = _extract_pod_uid_from_cgroup(proc.pid) | |
| if uid is None: | |
| continue # non-K8s process — skip | |
| if uid in uid_to_annotation: | |
| pod_annotations.append((uid, uid_to_annotation[uid])) | |
| pod_annotations_by_uid: dict[str, Optional[str]] = {} | |
| for proc in procs: | |
| uid = _extract_pod_uid_from_cgroup(proc.pid) | |
| if uid is None: | |
| continue # non-K8s process — skip | |
| if uid in uid_to_annotation: | |
| pod_annotations_by_uid[uid] = uid_to_annotation[uid] | |
| pod_annotations = list(pod_annotations_by_uid.items()) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@components/power_agent/power_agent.py` around lines 494 - 501, The loop that
builds pod_annotations appends one entry per GPU process, causing pods with
multiple processes to be counted multiple times; change the logic in the procs
loop (using _extract_pod_uid_from_cgroup and uid_to_annotation) to deduplicate
by pod UID before applying multi-pod policy — e.g., track seen UIDs (or build a
uid->annotation map) and only append one (uid, annotation) pair per unique UID
so each pod is counted once when evaluating multi-pod warnings/metrics.
| metadata: | ||
| name: dynamo-power-agent | ||
| namespace: default | ||
| labels: | ||
| app: dynamo-power-agent | ||
| spec: | ||
| selector: | ||
| matchLabels: | ||
| app: dynamo-power-agent | ||
| updateStrategy: | ||
| type: RollingUpdate | ||
| rollingUpdate: | ||
| maxUnavailable: 1 | ||
| template: | ||
| metadata: | ||
| labels: | ||
| app: dynamo-power-agent | ||
| spec: | ||
| serviceAccountName: dynamo-power-agent | ||
| hostPID: true |
There was a problem hiding this comment.
Parameterize the DaemonSet namespace to prevent RBAC/SA mismatches.
Line 18 hardcodes default, while RBAC is set up to support ${POWER_AGENT_NAMESPACE}. Deploying outside default can break ServiceAccount resolution/binding.
Suggested fix
metadata:
name: dynamo-power-agent
- namespace: default
+ namespace: ${POWER_AGENT_NAMESPACE}
labels:
app: dynamo-power-agent📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| metadata: | |
| name: dynamo-power-agent | |
| namespace: default | |
| labels: | |
| app: dynamo-power-agent | |
| spec: | |
| selector: | |
| matchLabels: | |
| app: dynamo-power-agent | |
| updateStrategy: | |
| type: RollingUpdate | |
| rollingUpdate: | |
| maxUnavailable: 1 | |
| template: | |
| metadata: | |
| labels: | |
| app: dynamo-power-agent | |
| spec: | |
| serviceAccountName: dynamo-power-agent | |
| hostPID: true | |
| metadata: | |
| name: dynamo-power-agent | |
| namespace: ${POWER_AGENT_NAMESPACE} | |
| labels: | |
| app: dynamo-power-agent | |
| spec: | |
| selector: | |
| matchLabels: | |
| app: dynamo-power-agent | |
| updateStrategy: | |
| type: RollingUpdate | |
| rollingUpdate: | |
| maxUnavailable: 1 | |
| template: | |
| metadata: | |
| labels: | |
| app: dynamo-power-agent | |
| spec: | |
| serviceAccountName: dynamo-power-agent | |
| hostPID: true |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/power_agent/daemonset.yaml` around lines 16 - 35, The DaemonSet
metadata.namespace is hardcoded to "default", which causes mismatches with the
RBAC/service account namespace; update the manifest to use the parameterized
namespace variable (e.g. ${POWER_AGENT_NAMESPACE}) instead of "default" and
ensure the ServiceAccount referenced by spec.serviceAccountName
("dynamo-power-agent") is created/bound in that same parameterized namespace so
RBAC bindings resolve correctly; locate metadata.name ("dynamo-power-agent"),
metadata.namespace, and spec.serviceAccountName in the template to make the
change.
| - name: power-agent | ||
| # Replace with your registry/image | ||
| image: nvcr.io/nvidia/dynamo/power-agent:latest | ||
| imagePullPolicy: IfNotPresent |
There was a problem hiding this comment.
Use an immutable image reference instead of latest.
Line 57 uses a mutable tag, which makes rollouts non-reproducible and can introduce unreviewed runtime changes.
Suggested fix
- image: nvcr.io/nvidia/dynamo/power-agent:latest
+ image: nvcr.io/nvidia/dynamo/power-agent:<release-tag>`@sha256`:<digest>🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/power_agent/daemonset.yaml` around lines 55 - 58, Replace the mutable
image tag "nvcr.io/nvidia/dynamo/power-agent:latest" with an immutable reference
(a specific version tag or an image digest), e.g.
"nvcr.io/nvidia/dynamo/power-agent:vX.Y.Z" or
"nvcr.io/nvidia/dynamo/power-agent@sha256:<digest>", so deployments are
reproducible; keep or verify imagePullPolicy (e.g. IfNotPresent) is appropriate
for pinned images and update the inline comment to note the image is
intentionally pinned.
3cde449 to
1338028
Compare
| subjects: | ||
| - kind: ServiceAccount | ||
| name: dynamo-power-agent | ||
| namespace: ${POWER_AGENT_NAMESPACE} |
There was a problem hiding this comment.
The manifest cannot be applied as-is because ${POWER_AGENT_NAMESPACE} is not a valid Kubernetes namespace value and kubectl apply -n ... does not substitute fields inside a ClusterRoleBinding. Fix: set a concrete namespace that matches the DaemonSet namespace or template this manifest through Helm/Kustomize/envsubst before shipping it.
| @@ -0,0 +1,577 @@ | |||
| #!/usr/bin/env python3 | |||
| # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |||
| # SPDX-License-Identifier: Apache-2.0 | |||
There was a problem hiding this comment.
In all your files, you only need these first two SPDX lines. You can delete the rest of the copyright header.
Adds the Power Agent DaemonSet (privileged, hostPID) that watches pods
for the dynamo.nvidia.com/gpu-power-limit annotation, parses
/proc/{pid}/cgroup to recover pod UIDs (handles all K8s QoS x cgroup
driver x CRI runtime variants), and calls
nvmlDeviceSetPowerManagementLimit per physical GPU.
Includes NVML cap clamping to SKU-specific [min_w, max_w] bounds,
UUID-gated orphan-cap restoration on startup, SIGTERM-handler default-
TGP restore, and a multi-pod-per-GPU policy that falls back to
power_agent_safe_default_watts on conflict.
43 unit tests covering apply_cap, cgroup_parser, multi_pod_policy,
shutdown. Standalone component - no coupling to planner internals.
CI filter (.github/filters.yaml) extends the planner group to cover the
new component + deploy paths so changes here trip the planner job set.
Part of the PR #9369 split (PR 1a of 6). See docs/design-docs/pr9369-split-plan.md.
Signed-off-by: Kai Ma <kaim@nvidia.com>
1338028 to
ec21081
Compare
Folds deploy/power_agent/{daemonset,rbac,dev-pod}.yaml into a single
Helm chart at deploy/helm/charts/power-agent/, resolving the three
CodeRabbit findings on PR #9682:
* hardcoded metadata.namespace=default -> {{ .Release.Namespace }}
* mutable image :latest -> required image.tag with fail-fast validator
* `${POWER_AGENT_NAMESPACE}` envsubst placeholder -> native Helm templating
The chart supports three deployment shapes selectable via values:
production DaemonSet (default, cluster-wide RBAC), namespace-restricted
production (Role+RoleBinding), and an in-cluster dev-iteration Pod
mounting power_agent.py from a ConfigMap. Three template-time validators
reject foot-guns at install time: empty image.tag, mutex violations
between daemonset.enabled and dev.enabled, and dev mode without a
pinned dev.nodeName. Dev mode also automatically forces namespace-scoped
RBAC (least privilege), leveraging power_agent.py's --namespace flag.
Design rationale, scope decisions, and review-feedback responses are
captured in docs/design-docs/power-agent-helm-chart-plan.md, committed
alongside the chart.
components/power_agent/README.md flips its install recipe to
``helm install``, and the planner CI filter (.github/filters.yaml) is
retargeted from deploy/power_agent/** to deploy/helm/charts/power-agent/**.
Two examples/deployments/powerplanner/*.yaml header references live on
PR #9687 and will be updated during that PR's cascade rebase per plan
section 5.3.
Validated locally:
helm lint -> 0 errors
helm template (3 positive exercises) -> expected resources
helm template (3 negative exercises) -> expected fail-fast errors
components/power_agent/tests/ -> 43/43 passed
.github/scripts/test-filters.js -> 20/20 passed
pre-commit (cross-cutting hooks) -> all applicable passed
Signed-off-by: Kai Ma <kaim@nvidia.com>
…S.txt copyright-checks (regex requiring `#`-prefixed SPDX lines) rejected the Go-template-comment header in _helpers.tpl and the missing header in NOTES.txt on commit 8e490a7. Switch both to the same 2-line `#`-prefixed form already used by the other 9 chart files, aligning with @grahamking's PR review preference (minimal SPDX, no Apache boilerplate). For NOTES.txt this adds two visible lines to the post-`helm install` banner, matching the project convention established by deploy/helm/charts/platform/templates/NOTES.txt. For _helpers.tpl the header sits outside any `{{- define -}}` block, so it is never invoked at render time and is invisible to consumers despite not being inside a Go template comment. Re-validated locally: helm lint -> 0 errors helm template (default mode) -> 4 expected kinds pre-commit on the two touched files -> all applicable hooks pass Signed-off-by: Kai Ma <kaim@nvidia.com>
Updates the three reference sites in examples/deployments/powerplanner/ that previously instructed users to ``kubectl apply -f deploy/power_agent/...``, flipping them to the new Helm chart at deploy/helm/charts/power-agent/ that landed in PR #9682: * disagg-power-aware.yaml header recipe * README.md Prerequisites + verify section * MULTI_DGD.md file index Also updates the verify-pods label selector from the legacy ``app=dynamo-power-agent`` to the chart-emitted ``app.kubernetes.io/name=power-agent``. Design docs (powerplanner-design.md, power-agent-dcgm-actuator.md, pr9369-split-plan.md, power-agent-helm-chart-plan.md) still reference the old paths in their historical / architectural narrative sections, which is intentional -- those describe the pre-chart state and the transition rationale, not current deployment instructions. Part of the PR #9369 cascade following PR #9682''s Helm chart landing. Signed-off-by: Kai Ma <kaim@nvidia.com>
Updates the three reference sites in examples/deployments/powerplanner/ that previously instructed users to ``kubectl apply -f deploy/power_agent/...``, flipping them to the new Helm chart at deploy/helm/charts/power-agent/ that landed in PR #9682: * disagg-power-aware.yaml header recipe * README.md Prerequisites + verify section * MULTI_DGD.md file index Also updates the verify-pods label selector from the legacy ``app=dynamo-power-agent`` to the chart-emitted ``app.kubernetes.io/name=power-agent``. Design docs (powerplanner-design.md, power-agent-dcgm-actuator.md, pr9369-split-plan.md, power-agent-helm-chart-plan.md) still reference the old paths in their historical / architectural narrative sections, which is intentional -- those describe the pre-chart state and the transition rationale, not current deployment instructions. Part of the PR #9369 cascade following PR #9682''s Helm chart landing. Signed-off-by: Kai Ma <kaim@nvidia.com>
Part of the PR #9369 split plan.
This is PR 1 of 6 (PR 1a — Power Agent).
Predecessor: none (base =
main)Successor: #9683 (depends on this PR via stacked branch
pr1b/planner-infra)Scope
Standalone Power Agent DaemonSet — per-node GPU power-cap enforcement.
~1,500 lines · 10 files · all net-new, zero modifications to existing files.
Zero
import dynamo.planner.*. Reviewable in isolation from the planner.components/power_agent/power_agent.py(529 LOC) — NVML clamp helper, cgroup parser, multi-pod policy, SIGTERM handler, UUID-gated orphan-cap restorationcomponents/power_agent/tests/— 43 unit tests covering apply_cap, cgroup_parser, multi_pod_policy, shutdowndeploy/power_agent/{daemonset,dev-pod,rbac}.yaml— Kubernetes manifests.github/filters.yaml— planner CI filter extended to power_agent pathsReviewer onboarding
docs/design-docs/powerplanner-design.md§7 — per-node power-cap enforcement (lands in PR 5; readable from this branch tip too viagit show pr5/docs-devenv:docs/design-docs/powerplanner-design.md)dynamo.nvidia.com/gpu-power-limitis the only coupling to the plannerTests at this tip
components/power_agent/tests/— 43 passed in 0.73 s on dev pod (2026-05-18)origin/main..pr1a/power-agent: 14 hooks pass, 4 skipped (no matching files).pytest-marker-reportreconfirmed in pod withMissing sets: 0. The Windows-host run shows a known pre-existingfcntlimport error intests/utils/port_utils.py:10(POSIX-only stdlib in an untouched file) — plan §7 acknowledges this; CI runs on Linux are unaffected.Merge strategy
Rebase-and-merge (no squash) — preserves 1 commit in
mainhistory per plan §4.3. Please configure this on the PR before merging.Stacked-PR cascade
Per plan §4.5: this PR is Ready for Review alongside PR 1b (#9683). PRs 2–5 are held in Draft until both foundations approve. The two foundation PRs touch disjoint files; either can land first.