feat(planner): power budget enforcement #9684
Draft
kaim-eng wants to merge 1 commit into
This was referenced May 18, 2026
Adds _apply_power_budget() to PlannerStateMachine, wired into both load-scaling and throughput-scaling paths (one-line call site each). Budget enforcement runs only when enable_power_awareness=True and total_gpu_power_limit is set, so it is a no-op on existing deployments.

Includes a no-upscale-decode invariant in _apply_global_budget: when proportional_clamp_pair (from main's budget helper, see PR #9296) would allocate decode beyond original demand because prefill is the binding constraint, decode is clamped back to min(num_d, min_endpoint). This preserves the original behavioural contract from the pre-rebase inline formula while delegating to main's shared clamp.

68-line TestApplyGlobalBudgetNoUpscale suite (5 tests) pins the no-upscale invariant.

Part of the PR #9369 split (PR 2 of 6). See docs/design-docs/pr9369-split-plan.md.

Signed-off-by: Kai Ma <kaim@nvidia.com>
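For reviewers who want a concrete picture of the no-upscale-decode invariant, here is a minimal sketch. It does not reproduce main's `proportional_clamp_pair`: `toy_clamp_pair`, `gpus_p`, `gpus_d`, and `budget_gpus` are hypothetical names invented for illustration. The sketch models only the "never exceed original decode demand" cap and leaves out `min_endpoint`, whose exact role is defined in the PR code rather than here.

```python
def toy_clamp_pair(num_p, num_d, gpus_p, gpus_d, budget_gpus):
    """Hypothetical stand-in for a proportional clamp (NOT main's helper).

    Prefill is scaled down first and rounded to whole workers; decode then
    receives whatever budget is left over.
    """
    demand = num_p * gpus_p + num_d * gpus_d
    if demand <= budget_gpus:
        return num_p, num_d  # already within budget, nothing to clamp
    # Integer arithmetic: floor(num_p * budget / demand).
    clamped_p = num_p * budget_gpus // demand
    # The leftover can exceed num_d when prefill workers are coarse-grained.
    clamped_d = (budget_gpus - clamped_p * gpus_p) // gpus_d
    return clamped_p, clamped_d


def clamp_without_decode_upscale(num_p, num_d, gpus_p, gpus_d, budget_gpus):
    """Apply the toy clamp, then enforce the no-upscale invariant on decode."""
    clamped_p, clamped_d = toy_clamp_pair(num_p, num_d, gpus_p, gpus_d, budget_gpus)
    # Budget enforcement may shrink decode, but must never grow it past demand.
    return clamped_p, min(clamped_d, num_d)


# 2 prefill workers at 4 GPUs each, 1 decode worker at 1 GPU, budget of 6 GPUs.
print(toy_clamp_pair(2, 1, 4, 1, 6))                # (1, 2): decode upscaled
print(clamp_without_decode_upscale(2, 1, 4, 1, 6))  # (1, 1): invariant restored
```

In this toy case prefill is the binding constraint: dropping one 4-GPU prefill worker frees more budget than decode asked for, so the post-clamp `min` is what keeps decode at its original demand.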
Part of the PR #9369 split plan.
This is PR 3 of 6 (PR 2 — Power Budget Enforcement). Held in Draft per plan §4.5 until PR 1a + PR 1b are approved.
Predecessor: #9683 — Planner power infrastructure
Successor: #9685 (Draft) — AIC closed-loop optimizer
Scope
Budget clamping works, opt-in via config. Adds `_apply_power_budget()` to the state machine, wired into both load- and throughput-scaling paths. Includes the decode-no-upscale clamp in `_apply_global_budget()` (3-line post-`proportional_clamp_pair` invariant; see plan §2.3 for the post-rebase mechanic note).

~1,200 lines · 4 files. Budget is only active when `enable_power_awareness=True` and `total_gpu_power_limit` is set.

- `core/state_machine.py`: `_apply_power_budget()` method (66 LOC) + decode-no-upscale clamp in `_apply_global_budget()`
- `core/load_scaling.py` / `core/throughput_scaling.py`: 1-line call sites
- `tests/unit/test_state_machine.py`: `TestApplyGlobalBudgetNoUpscale` class (5 tests, 68 LOC)

Reviewer onboarding
- `docs/design-docs/powerplanner-design.md` §6.1 (budget clamping), §6.2 (decode-no-upscale invariant)

Tests at this tip (projection per plan §4.4; will re-measure on dev pod before marking Ready for Review)
- `TestApplyGlobalBudgetNoUpscale`: 5 passed

Merge strategy
Rebase-and-merge (no squash). Mark Ready for Review only after both PR 1a and PR 1b have approvals, and rebase onto whichever predecessor lands first.
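
Illustration of the opt-in gating described under Scope
The sketch below shows one way the "no-op unless both settings are present" rule could look. Only the two field names `enable_power_awareness` and `total_gpu_power_limit` come from this PR; the `PowerConfig` container, the `enforce` callback, and the reading of "is set" as "is not `None`" are assumptions made for illustration, not the actual `_apply_power_budget()` code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class PowerConfig:
    """Hypothetical config holder; only the field names come from the PR."""
    enable_power_awareness: bool = False
    total_gpu_power_limit: Optional[float] = None  # unit not specified here


def power_budget_active(cfg: PowerConfig) -> bool:
    # Assumption: "is set" means the limit is not None.
    return cfg.enable_power_awareness and cfg.total_gpu_power_limit is not None


def apply_power_budget_sketch(
    cfg: PowerConfig,
    num_p: int,
    num_d: int,
    enforce: Callable[[int, int, float], Tuple[int, int]],
) -> Tuple[int, int]:
    """Return replica counts unchanged unless the budget is opted into."""
    if not power_budget_active(cfg):
        return num_p, num_d  # existing deployments see no behaviour change
    return enforce(num_p, num_d, cfg.total_gpu_power_limit)


# Hypothetical enforcement callback that simply halves both counts.
halve = lambda p, d, limit: (p // 2, d // 2)

print(apply_power_budget_sketch(PowerConfig(), 4, 8, halve))              # (4, 8)
print(apply_power_budget_sketch(PowerConfig(True, 5000.0), 4, 8, halve))  # (2, 4)
```

Keeping the check in one predicate means the load- and throughput-scaling call sites can stay one line each, which matches the call-site shape this PR describes.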