
feat(planner): power budget enforcement #9684

Draft
kaim-eng wants to merge 1 commit into pr1b/planner-infra from pr2/power-budget


@kaim-eng kaim-eng commented May 18, 2026

Part of the PR #9369 split plan.
This is PR 3 of 6 (PR 2 — Power Budget Enforcement). Held in Draft per plan §4.5 until PR 1a + PR 1b are approved.

Predecessor: #9683 — Planner power infrastructure
Successor: #9685 (Draft) — AIC closed-loop optimizer

Scope

Budget clamping works, opt-in via config. Adds _apply_power_budget() to the state machine, wired into both load- and throughput-scaling paths. Includes the decode-no-upscale clamp in _apply_global_budget() (3-line post-proportional_clamp_pair invariant — see plan §2.3 for the post-rebase mechanic note).

~1,200 lines · 4 files. Budget is only active when enable_power_awareness=True and total_gpu_power_limit is set.

  • core/state_machine.py: _apply_power_budget() method (66 LOC) + decode-no-upscale clamp in _apply_global_budget()
  • core/load_scaling.py / core/throughput_scaling.py: 1-line call sites
  • tests/unit/test_state_machine.py: TestApplyGlobalBudgetNoUpscale class (5 tests, 68 LOC)
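
The opt-in gating described in this section can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `PowerConfig` and `power_budget_active` are hypothetical names, and "is set" is assumed here to mean "not `None`". Only `enable_power_awareness` and `total_gpu_power_limit` come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PowerConfig:
    """Hypothetical stand-in for the planner's config object."""

    enable_power_awareness: bool = False
    # Unit is not stated in the PR description; None is assumed to mean "unset".
    total_gpu_power_limit: Optional[float] = None


def power_budget_active(cfg: PowerConfig) -> bool:
    """Return True only when both opt-in conditions hold."""
    return cfg.enable_power_awareness and cfg.total_gpu_power_limit is not None
```

Under these assumptions a default-constructed config leaves the budget inactive, which matches the claim that enforcement is a no-op on existing deployments.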

Reviewer onboarding

  • Design context (PR 5): docs/design-docs/powerplanner-design.md §6.1 (budget clamping), §6.2 (decode-no-upscale invariant)
  • Plan section: §2.3 (this PR)
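
For reviewers, the decode-no-upscale invariant (§6.2) may be easier to follow from a small sketch. The function below is a literal reading of this PR's commit message, not the actual 3-line clamp in `_apply_global_budget()`: the name `clamp_decode_no_upscale` and the parameter `allocated_d` are hypothetical, and `num_d` and `min_endpoint` are used exactly as the commit message words them, without further knowledge of how the planner defines them.

```python
def clamp_decode_no_upscale(num_d: int, allocated_d: int, min_endpoint: int) -> int:
    """Sketch of the no-upscale check applied after the shared budget clamp.

    num_d        -- decode replicas originally demanded
    allocated_d  -- decode replicas returned by the budget helper (hypothetical name)
    min_endpoint -- bound named in the commit message (semantics assumed)
    """
    if allocated_d > num_d:
        # The helper handed decode more than it asked for (e.g. because
        # prefill was the binding constraint), so pull decode back down.
        return min(num_d, min_endpoint)
    # Allocation is at or below demand: leave it untouched.
    return allocated_d
```

For example, with `num_d=3`, `allocated_d=5`, and `min_endpoint=10`, the sketch returns 3, so decode never ends up above its original demand.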

Tests at this tip (projection per plan §4.4 — will re-measure on dev pod before marking Ready for Review)

  • All PR 1a + 1b tests still pass
  • TestApplyGlobalBudgetNoUpscale — 5 passed
  • Plan projection: ~559 passed, 1 skipped (planner) + 43 (power_agent) = ~602 total

Merge strategy

Rebase-and-merge (no squash). Mark Ready for Review only after both PR 1a and PR 1b have approvals, and rebase onto whichever predecessor lands first.


copy-pr-bot Bot commented May 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Adds _apply_power_budget() to PlannerStateMachine, wired into both
load-scaling and throughput-scaling paths (one-line call site each).
Budget enforcement runs only when enable_power_awareness=True and
total_gpu_power_limit is set, so it is a no-op on existing deployments.

Includes a no-upscale-decode invariant in _apply_global_budget: when
proportional_clamp_pair (from main's budget helper, see PR #9296) would
allocate decode beyond original demand because prefill is the binding
constraint, decode is clamped back to min(num_d, min_endpoint). This
preserves the original behavioural contract from the pre-rebase inline
formula while delegating to main's shared clamp.

68-line TestApplyGlobalBudgetNoUpscale suite (5 tests) pins the
no-upscale invariant.

Part of the PR #9369 split (PR 2 of 6). See docs/design-docs/pr9369-split-plan.md.

Signed-off-by: Kai Ma <kaim@nvidia.com>
@kaim-eng kaim-eng force-pushed the pr1b/planner-infra branch from 8126b1a to f7f3226 Compare May 19, 2026 15:54
@kaim-eng kaim-eng force-pushed the pr2/power-budget branch from 7be3a27 to 0c9050c Compare May 19, 2026 15:55