Skip to content

feat(planner): own AIC interpolation; fix MoE-DEP bugs in rapid mode#8335

Merged
tedzhouhk merged 14 commits into
mainfrom
hzhou/planner-owns-aic-interp
Apr 20, 2026
Merged

feat(planner): own AIC interpolation; fix MoE-DEP bugs in rapid mode#8335
tedzhouhk merged 14 commits into
mainfrom
hzhou/planner-owns-aic-interp

Conversation

@tedzhouhk
Copy link
Copy Markdown
Contributor

@tedzhouhk tedzhouhk commented Apr 18, 2026

Summary

Moves AIConfigurator-based interpolation out of the profiler and into the planner for pre_deployment_sweeping_mode: rapid. The profiler now picks configurations and hands them off via a new aic_interpolation block on the planner ConfigMap; the planner runs the AIC sweep in-process at bootstrap and seeds its regression models directly from the synthesised FPMs — no more NPZ round-trip through disk.

This also fixes the 7 silent MoE-DEP correctness issues that made rapid mode crash or quietly mis-model Qwen/Qwen3-235B-A22B-FP8 and other MoE picks using DEP/TEP/hybrid parallelism.

Priority chain for pre-deployment FPMs (highest → lowest):

  1. get_perf_metrics endpoint (live self-benchmark)
  2. AIC interpolation (if aic_interpolation spec is set)
  3. Legacy files under profile_results_dir (thorough mode)

Thorough mode is unchanged — profiler still runs real-GPU interpolation, bakes NPZ into the profile-data ConfigMap, planner reads files. Mocker rapid-mode deployments also unchanged; they keep running the existing AIC-to-NPZ sweep in the profiler.

What changed

New planner modules:

  • planner/config/parallelization.py — home for PickedParallelConfig (moved from profiler) + picked_to_aic_model_config_kwargs helper
  • planner/config/aic_interpolation_spec.py — pydantic AICInterpolationSpec for profiler → planner handoff
  • planner/monitoring/aic_estimator.py — home for AIConfiguratorPerfEstimator (moved from profiler)
  • planner/monitoring/aic_interpolation.pyrun_aic_interpolation() lands the MoE-DEP fixes

Planner wiring:

  • PlannerConfig.aic_interpolation field
  • fetch_pre_deployment_metrics gains priority-2 AIC branch with lazy aiconfigurator import
  • All 4 adapters.py call sites forward aic_spec=self.config.aic_interpolation

Profiler flip:

  • needs_profile_data() returns False for planner-only rapid (mocker path unaffected)
  • dgd_generation.py:build_aic_interpolation_spec() populates the spec; rapid+planner-only skips add_profile_data_to_config
  • profile_sla.py computes sweep_max_context_length once and builds the spec

Back-compat shims for existing profiler importers of PickedParallelConfig and AIConfiguratorPerfEstimator.

MoE-DEP bug fixes (landed in aic_interpolation.py)

# Bug Fix
1 moe_tp_size/moe_ep_size missing → TypeError: None × None Every AIC call goes through picked_to_aic_model_config_kwargs
2 PickedParallelConfig.tp_size (KV-split property) fed to AIC as attention TP Helper uses raw p.tp; property now documented as KV-head-split only
3 attention_dp_size never reaches ModelConfig Explicitly passed in both prefill and decode kwargs
4 Decode RuntimeConfig.batch_size treated as aggregate Now passed per-attention-DP-rank (num_req // dp)
5 get_max_kv_tokens per-rank but downstream math expects aggregate Multiplied by attention_dp_size
6 Decode sweep forced multiples of attention_dp_size Plain linear sweep (DP-routing constraint doesn't apply to AIC's static simulator)
7 FPM conventions drifted between AIC and thorough paths Both paths now emit identical shapes

Test plan

  • New unit tests: 26 tests across test_aic_interpolation.py, test_perf_metrics_priority.py, test_dgd_generation_aic.py
    • TestPickedToAicKwargs — helper correctness across TP/TEP/DEP/hybrid
    • TestRunAicInterpolation — sweeps with mocked estimator; every call asserts MoE kwargs + identity
    • TestPriorityChain — 6 tests covering all fallback paths
    • TestQwen235MoEPicks — end-to-end with the exact Qwen-235B pick shapes
  • Full unit suite: 179 passed, 11 skipped locally (skips are pre-existing dynamo.llm / deploy binding absences in the dev venv)
  • CI: verify the full suite + pre-commit runs green
  • Manual E2E against Qwen/Qwen3-235B-A22B-FP8 DGDR on h100/h200: confirm profiler skips BuildingCurves phase, DGD has only planner-config ConfigMap, planner logs Loaded N FPMs from AIC interpolation
  • Manual E2E with pre_deployment_sweeping_mode: thorough: confirm both ConfigMaps still produced and planner reads files
  • Manual E2E with get_perf_metrics endpoint available: confirm planner short-circuits at priority 1

Migration notes

  • aiconfigurator is imported lazily; if the planner image ships without it, rapid-mode deployments will surface a clear ImportError and fall through to the file loader. Ops will want to ensure the planner image bundles aiconfigurator for rapid-mode deployments.
  • Existing DGDRs without aic_interpolation continue to work unchanged (pydantic default is None).
  • The 10 commits are ordered so CI stays green at every step: types move first (with shims), additions come before wiring, and only the final commit changes observable rapid-mode output.

🤖 Generated with Claude Code


Open with Devin

Summary by CodeRabbit

Release Notes

  • New Features

    • Added in-process performance estimation enabling faster planner bootstrap without relying on external profiling files.
    • Extended planner configuration to support rapid-mode metric generation and interpolation.
  • Tests

    • Added comprehensive test coverage for in-process performance estimation and metric retrieval fallback chains.

tedzhouhk and others added 10 commits April 17, 2026 16:53
PickedParallelConfig is a pure data type describing an AIConfigurator
parallelism pick (tp, pp, dp, moe_tp, moe_ep). Both the profiler (which
picks) and the planner (which will soon consume the pick to bootstrap
perf models) need it, so it now lives under dynamo.planner.config.

The profiler's parallelization_mapping.py keeps its internal types
(ParallelizationMapping, ParallelizationStrategy, and the two helper
functions) and re-exports PickedParallelConfig for back-compat.

Also documents .tp_size as KV-head-split semantics only (it is NOT the
same quantity as AIConfigurator's ModelConfig.tp_size, which is
attention TP per rank). The unused to_parallelization_mapping() method
is dropped; it had zero callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Follow-up to the previous commit: turn the profiler's
parallelization_mapping.py into a shim that re-exports
PickedParallelConfig from its new home while keeping the
profiler-internal types (ParallelizationMapping, ParallelizationStrategy,
get_candidate_parallel_mappings, apply_parallel_mapping_to_config) in
place.

No behavior change; class identity is preserved (verified:
``PickedParallelConfig is planner.config.parallelization.PickedParallelConfig``).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
…ring

Relocates the AIC wrapper from the profiler to the planner so the planner
can run AIC-based interpolation at bootstrap in rapid mode. The profiler's
estimate_perf.py becomes a re-export shim for existing callers.

Class identity is preserved — profiler and planner imports point at the
same underlying class, so isinstance checks still work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
…g_kwargs

Two additions for the upcoming rapid-mode refactor:

* picked_to_aic_model_config_kwargs(p) returns the AIC ModelConfig
  kwargs for a pick (tp_size, pp_size, moe_tp_size, moe_ep_size,
  attention_dp_size). Uses the raw p.tp / p.dp — NOT the derived
  p.tp_size property, which has KV-head-split semantics that conflict
  with AIC's attention-TP definition and break the MoE parallelism
  identity for DEP picks.

* AICInterpolationSpec is the pydantic schema the profiler will embed
  in the planner ConfigMap in rapid mode, carrying hf_id/system/backend,
  ISL/OSL/max_context_length, granularities, and the two PickedParallelConfig
  picks. Serialises cleanly through model_dump_json.

PickedParallelConfig is now a frozen pydantic BaseModel (previously a
frozen dataclass) to allow clean embedding. No callers rely on
dataclass-specific APIs — verified by grep for asdict/is_dataclass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
The planner can now generate ForwardPassMetrics in-process from an
AICInterpolationSpec by lazy-importing aiconfigurator and sweeping
prefill/decode configurations itself. This replaces the profiler's
NPZ-based AIC path for rapid mode and fixes the MoE-DEP correctness
bugs that silently corrupted the old sweep:

* Every AIC call uses picked_to_aic_model_config_kwargs, so
  moe_tp_size / moe_ep_size / attention_dp_size always reach
  ModelConfig. The MoE parallelism identity
  (tp_size * attention_dp_size == moe_tp * moe_ep) holds by
  construction.
* get_max_kv_tokens is scaled by attention_dp_size to get aggregate
  capacity, matching thorough-mode semantics downstream.
* Decode RuntimeConfig.batch_size is now per-attention-DP-rank
  (aggregate // dp), not the aggregate — AIC internally scales MoE
  compute by attention_dp_size, so passing aggregate double-counted.
* Decode concurrency sweep is linear; the thorough path's
  DP-multiples constraint is a round-robin routing requirement that
  does not apply to AIC's static simulator.

FPM conventions mirror the thorough-mode file loader so bootstrap
data looks identical regardless of source:
* Prefill: (num_prefill_requests=1, sum_prefill_tokens=isl,
  wall_time=per-rank TTFT)
* Decode: (num_decode_requests=aggregate, sum_decode_kv_tokens=
  aggregate * ctx_len, wall_time=per-step ITL)

Also extends the dynamo.common.forward_pass_metrics stub in
test_advisory_mode to expose ScheduledRequestMetrics — the partial
stub masked the real module when test collection ran in certain
orders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
PlannerConfig gains an optional aic_interpolation field and the
bootstrap fetcher gains an aic_spec parameter. The priority chain is
now:

  1. get_perf_metrics endpoint (runtime self-benchmark)
  2. AIC interpolation in-process (if aic_spec is set)
  3. Legacy profile_results_dir files (thorough-mode fallback)

If the endpoint succeeds, AIC and files are never touched. If
aiconfigurator is not installed, the ImportError is caught and the
chain falls through to files. If AIC raises at runtime, same
fallthrough. Only if all three sources fail does the planner raise.

All four fetch_pre_deployment_metrics call sites in adapters.py
(Prefill, Decode, Agg, Disagg × {prefill, decode}) forward the spec
from self.config.aic_interpolation.

PlannerConfig validator warns when enable_throughput_scaling + rapid
mode + no aic_interpolation — this is the handshake the profiler will
satisfy in the final step of the refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
In rapid mode with planner features enabled, the profiler now just picks
configurations and emits an AICInterpolationSpec inside the planner's
ConfigMap. The planner runs the AIC sweep itself at bootstrap via
run_aic_interpolation. No more NPZ round-trip for rapid+planner-only
deployments.

Changes:

* needs_profile_data() returns False for rapid + planner + no mocker.
  Mocker deployments still need NPZ (they read it directly), so mocker
  cases keep running the existing AIC sweep in run_interpolation.
* build_aic_interpolation_spec() in dgd_generation.py constructs the
  spec from the DGDR + picks + workload params. Returns None (falling
  through to the file path) for thorough mode, disabled throughput
  scaling, missing picks, or unsupported backends.
* _build_planner_config / add_planner_to_config / assemble_final_config
  thread aic_spec through, embedding it in the planner ConfigMap.
* profile_sla.py computes sweep_max_context_length once (both rapid
  and thorough need it) and builds the spec before assembling the
  final config.

Thorough-mode behavior is unchanged: profiler runs real-GPU
interpolation, bakes NPZ into the profile-data ConfigMap, planner
reads files at priority 3.

End-to-end for the Qwen-235B-A22B-FP8 DGDR from the triage report:
  rapid + planner → profiler skips BuildingCurves phase → DGD has only
  the planner-config ConfigMap (no planner-profile-data) → planner pod
  runs AIC in-process → "Loaded N FPMs from AIC interpolation" with
  correct MoE-DEP semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Adds TestQwen235BugReproCase with the exact two picks from the original
triage report:

* Prefill: tp=4, dp=1, moe_tp=1, moe_ep=4
* Decode: tp=1, dp=8, moe_tp=2, moe_ep=4

Each test pins down a specific symptom that the old profiler path
produced:

* test_prefill_pick_satisfies_aic_identity — Bug #1 (missing moe kwargs)
  and Bug #2 (using .tp_size property) would break the identity for
  this pick. Verifies all five kwargs and the MoE identity.
* test_decode_pick_satisfies_aic_identity — same for the decode pick,
  which was the exact shape that triggered Bug #3 (missing
  attention_dp_size).
* test_tp_size_property_differs_from_aic_tp_size — direct regression
  guard: documents that the property returns 1 here while AIC's tp_size
  is 4. Anyone who wires .tp_size back into AIC kwargs will break this.
* test_end_to_end_sweep_delivers_complete_kwargs_to_aic — runs both
  sweeps and asserts every AIC call gets complete kwargs. This is the
  direct reproducer of the originally-reported failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Drop the "bug repro" framing. The class now reads as a positive
exercise of a realistic large-MoE pick pair that spans the full AIC
kwarg surface: a DEP-on-MoE prefill pick and a hybrid TEP+DP decode
pick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
black/isort reformatting across the new AIC files; no semantic changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 18, 2026

Walkthrough

This PR introduces in-process AIConfigurator (AIC) interpolation support for rapid-mode planner bootstrap. It adds new configuration schemas, an AIC performance estimator wrapper, interpolation sweep logic, and integrates these components into the planner's metrics-fetching fallback chain. It also refactors shared parallelization config to a common location and updates profiler DGD generation to conditionally build and pass AIC specifications.

Changes

Cohort / File(s) Summary
Configuration Schemas
components/src/dynamo/planner/config/aic_interpolation_spec.py, components/src/dynamo/planner/config/parallelization.py
Introduced AICInterpolationSpec to represent the serialized handoff from profiler to planner, and PickedParallelConfig (frozen Pydantic model) with computed properties for GPU counts, effective TP sizing, and label formatting. Includes picked_to_aic_model_config_kwargs() converter for AIC-compatible kwarg generation.
Core AIC Estimator & Interpolation
components/src/dynamo/planner/monitoring/aic_estimator.py, components/src/dynamo/planner/monitoring/aic_interpolation.py
Added AIConfiguratorPerfEstimator wrapper for lazy AIC SDK import, database loading, and perf/KV-cache estimation. Implemented run_aic_interpolation() dispatcher that executes component-specific prefill/decode sweeps, interpolating across sequence lengths, estimating metrics per-rank, and emitting aggregate ForwardPassMetrics.
Planner Integration
components/src/dynamo/planner/config/planner_config.py, components/src/dynamo/planner/core/adapters.py, components/src/dynamo/planner/monitoring/perf_metrics.py
Added optional aic_interpolation field to PlannerConfig. Updated fetch_pre_deployment_metrics() with fallback chain: endpoint → in-process AIC (if spec provided) → legacy file conversion → error. Modified all planner adapters to pass aic_spec parameter to metrics-fetching calls.
Profiler DGD Integration
components/src/dynamo/profiler/profile_sla.py, components/src/dynamo/profiler/utils/dgd_generation.py
Updated profiler to compute sweep_max_context_length once and conditionally build AICInterpolationSpec when disaggregated and not dry-run. Extended assemble_final_config(), add_planner_to_config(), and _build_planner_config() with optional aic_spec parameter; introduced build_aic_interpolation_spec() factory for conditional spec generation.
Refactoring: Shared Parallelization Config
components/src/dynamo/profiler/utils/config_modifiers/parallelization_mapping.py, components/src/dynamo/profiler/utils/estimate_perf.py
Removed local PickedParallelConfig dataclass from parallelization_mapping.py and local AIConfiguratorPerfEstimator from estimate_perf.py; both now import and re-export from planner-owned modules to establish single source of truth.
Config Interpretation Update
components/src/dynamo/profiler/utils/profile_common.py
Modified needs_profile_data() to skip external profiling for rapid-mode planner sweeps, returning False when pre_deployment_sweeping_mode == Rapid and only planner throughput scaling is enabled.
Test Stub Update
components/src/dynamo/planner/tests/unit/test_advisory_mode.py
Extended module-level mock stub to include ScheduledRequestMetrics symbol alongside ForwardPassMetrics.
Comprehensive Unit Tests
components/src/dynamo/planner/tests/unit/test_aic_interpolation.py, components/src/dynamo/planner/tests/unit/test_perf_metrics_priority.py, components/src/dynamo/profiler/tests/unit/test_dgd_generation_aic.py
Added test coverage for picked_to_aic_model_config_kwargs() correctness, AICInterpolationSpec validation, end-to-end run_aic_interpolation() behavior with mocked estimator, fallback chain priority/handling in fetch_pre_deployment_metrics(), and DGD generation with AIC spec conditional logic.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: moving AIC interpolation to the planner and fixing MoE-DEP bugs in rapid mode.
Description check ✅ Passed The description covers all required template sections: Overview (summary of changes), Details (comprehensive breakdown of what changed and bug fixes), and Related Issues section (no GitHub issues referenced, which is acceptable).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (7)
components/src/dynamo/profiler/profile_sla.py (1)

385-395: Consider narrowing the broad except Exception (BLE001).

get_model_config_from_model_path can raise OSError/FileNotFoundError/KeyError/etc.; a blanket except Exception also swallows programming errors (e.g. TypeError) that would be better surfaced. If the set of expected failure modes is stable, list them explicitly; otherwise keep the catch but log the exception (logger.warning(..., exc_info=True)) so diagnosis isn't lost.

🔧 Proposed fix
-        try:
-            model_cfg = get_model_config_from_model_path(resolve_model_path(dgdr))
-            sweep_max_context_length = model_cfg.get("max_position_embeddings", 0)
-        except Exception:
-            logger.warning("Could not fetch model max context length.")
-            sweep_max_context_length = 0
+        try:
+            model_cfg = get_model_config_from_model_path(resolve_model_path(dgdr))
+            sweep_max_context_length = model_cfg.get("max_position_embeddings", 0)
+        except (OSError, KeyError, ValueError) as e:
+            logger.warning("Could not fetch model max context length: %s", e)
+            sweep_max_context_length = 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/profiler/profile_sla.py` around lines 385 - 395, The
broad except Exception around calling
get_model_config_from_model_path(resolve_model_path(dgdr)) should be narrowed
and the error logged; replace the blanket except with explicit expected
exceptions (e.g., FileNotFoundError, OSError, KeyError, ValueError) and in the
handler call logger.warning with exc_info=True so the stacktrace is preserved,
then keep the existing fallback assignment to sweep_max_context_length (isl * 2
or 8192) unchanged; ensure references to get_model_config_from_model_path,
resolve_model_path, sweep_max_context_length, and logger.warning are updated
accordingly.
components/src/dynamo/planner/config/parallelization.py (1)

18-92: LGTM: clean, well-documented shared abstraction.

A few observations, none blocking:

  • The tp_size property vs AIC's ModelConfig.tp_size semantic conflict is a real foot-gun, and the warning in the docstring plus the dedicated picked_to_aic_model_config_kwargs helper is the right shape to prevent misuse. Consider a short regression test asserting picked_to_aic_model_config_kwargs(p)["tp_size"] == p.tp across a few DEP/TEP/dense picks to lock this in.
  • Ruff flags the ambiguous multiplication sign × in docstrings (RUF002 at lines 68 and 78). If you want to keep CI quiet, replace with * or add an # noqa: RUF002 on the module or a per-file ruff config entry.
  • frozen=True makes instances hashable, which is needed if they end up as dict keys/set members during AIC sweeps — worth confirming the downstream sweep code relies on this.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/planner/config/parallelization.py` around lines 18 -
92, The docstrings use the Unicode multiplication sign "×" which triggers Ruff
RUF002; replace each "×" with " * " (or add a module/file-level "# noqa: RUF002"
if you prefer) in the docstring near the assertion lines, and add a short
regression test that constructs a few PickedParallelConfig instances (DEP:
moe_ep>1, TEP: moe_tp>1, dense: moe_tp=moe_ep=1) and asserts
picked_to_aic_model_config_kwargs(p)["tp_size"] == p.tp to lock in the intended
mapping; also confirm PickedParallelConfig.model_config =
ConfigDict(frozen=True) remains if downstream sweep/set usage requires
hashability.
components/src/dynamo/planner/monitoring/aic_estimator.py (3)

120-135: runtime_config should be a local, not instance state.

self.runtime_config is assigned inside estimate_perf but never read outside this method. Storing it as an attribute introduces hidden state that could mislead a future caller into thinking AIConfiguratorPerfEstimator is a stateful/session object, and it gets overwritten on every call.

♻️ Proposed fix
-        self.runtime_config = aiconfigurator.sdk.config.RuntimeConfig(
+        runtime_config = aiconfigurator.sdk.config.RuntimeConfig(
             batch_size=batch_size,
             beam_width=1,
             isl=isl,
             osl=osl,
         )
@@
         summary = session.run_static(
-            mode=mode_to_aic_mode[mode], runtime_config=self.runtime_config, stride=32
+            mode=mode_to_aic_mode[mode], runtime_config=runtime_config, stride=32
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/planner/monitoring/aic_estimator.py` around lines 120 -
135, The code assigns runtime_config to self.runtime_config inside estimate_perf
but never uses it elsewhere, creating hidden mutable state; change this by
making runtime_config a local variable (e.g., runtime_config =
aiconfigurator.sdk.config.RuntimeConfig(...)) inside the estimate_perf method
instead of self.runtime_config, and update the call to session.run_static to
pass runtime_config (not self.runtime_config). Locate the assignment and the
session.run_static invocation in AIConfiguratorPerfEstimator. Ensure no other
methods reference self.runtime_config and remove the attribute if it's declared
on the class.

15-23: Library modules shouldn't mutate their own logger level or attach handlers.

Calling logger.setLevel(INFO) and logger.addHandler(StreamHandler(...)) at module import time (a) overrides whatever log config the hosting app has chosen and (b) causes duplicate output when the root logger also has handlers. Convention is to let the application configure logging; the library just does logger = logging.getLogger(__name__).

♻️ Proposed fix
 logger = logging.getLogger(__name__)
-logger.setLevel(logging.INFO)
-console_handler = logging.StreamHandler()
-console_handler.setLevel(logging.INFO)
-formatter = logging.Formatter(
-    "%(asctime)s - %(name)s - %(levelname)s - %(message)s", "%Y-%m-%d %H:%M:%S"
-)
-console_handler.setFormatter(formatter)
-logger.addHandler(console_handler)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/planner/monitoring/aic_estimator.py` around lines 15 -
23, The module currently configures the logger at import time (logger =
logging.getLogger(__name__), logger.setLevel(...), console_handler, formatter,
logger.addHandler(...)), which overrides app logging and can cause duplicate
output; remove the calls that set the level and add the StreamHandler so the
library only does logger = logging.getLogger(__name__), and instead attach a
logging.NullHandler() to the logger (e.g.,
logger.addHandler(logging.NullHandler())) if you need to avoid “No handler”
warnings—leave formatter/console_handler and any setLevel calls out of the
module.

26-34: Verify return aiconfigurator resolves after submodule-only imports.

The function does import aiconfigurator.sdk.<x> five times but never an explicit import aiconfigurator, then returns the bare aiconfigurator name. This works in CPython today because import a.b.c binds a in the local namespace — but the intent is non-obvious. Consider adding import aiconfigurator as the first line (cheap, already cached) to make the binding explicit.

Does `import aiconfigurator.sdk.config` bind the top-level `aiconfigurator` name in the local scope of a function in Python 3?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/planner/monitoring/aic_estimator.py` around lines 26 -
34, The function _try_import_aiconfigurator performs only submodule imports
(aiconfigurator.sdk.*) but returns the bare name aiconfigurator which may not be
explicitly bound in all implementations; update _try_import_aiconfigurator to
first do an explicit import aiconfigurator (before importing
aiconfigurator.sdk.config, aiconfigurator.sdk.models, etc.), then keep the
existing submodule imports and return aiconfigurator so the returned symbol is
unambiguous and guaranteed to be bound in the function scope.
components/src/dynamo/planner/monitoring/aic_interpolation.py (1)

158-161: Document the aggregate→per-rank rounding trade-off.

When num_req_aggregate < attention_dp, batch_size_per_rank clamps to 1, which makes AIC simulate attention_dp requests (one per rank) instead of the num_req_aggregate the FPM records. The FPM still carries the aggregate count, so the regression will learn ITL from an over-provisioned shape at the low end.

Not worth blocking on — the thorough path had a similar approximation — but a one-line comment here saves the next reader the archaeology.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/planner/monitoring/aic_interpolation.py` around lines
158 - 161, Add a one-line comment above the batch_size_per_rank calculation (the
block using max_concurrency_aggregate, spec.decode_interpolation_granularity and
computing batch_size_per_rank = max(1, num_req_aggregate // attention_dp))
explaining the rounding trade-off: when num_req_aggregate < attention_dp we
clamp to 1 which effectively simulates attention_dp requests (one per rank) even
though the FPM records only num_req_aggregate, so the regression will see an
over‑provisioned shape at the low end; mention this is an accepted approximation
consistent with similar logic elsewhere.
components/src/dynamo/planner/tests/unit/test_aic_interpolation.py (1)

216-347: Strong regression coverage for the MoE-DEP fixes.

The identity assertion tp_size * attention_dp_size == moe_tp_size * moe_ep_size baked into _assert_identity plus the explicit "don't derive from p.tp_size property" regression guard (line 146–155) pins the exact invariants the old profiler path violated. Good defensive testing.

Minor nit: line 225 unpacks estimator but never uses it (_patch_estimator(...) already exposes the mock via the second return). Renaming to _ keeps Ruff (RUF059) quiet:

-        ctx_patch, estimator = _patch_estimator(prefill_latency_ms=123.0)
+        ctx_patch, _estimator = _patch_estimator(prefill_latency_ms=123.0)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/planner/tests/unit/test_aic_interpolation.py` around
lines 216 - 347, In test_prefill_sweep_shape_and_fpm_convention the tuple from
_patch_estimator is destructured as "ctx_patch, estimator" but estimator is
never used; change the unpacking to "ctx_patch, _ = _patch_estimator(...)" to
silence the unused-variable lint (RUF059) while keeping the mock behavior
intact—locate the call to _patch_estimator inside
test_prefill_sweep_shape_and_fpm_convention and replace the "estimator"
identifier with an underscore.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/src/dynamo/planner/monitoring/aic_interpolation.py`:
- Around line 138-155: Add an explicit precondition check at the start of
_sweep_decode to mirror _sweep_prefill: validate that
spec.sweep_max_context_length - osl_sweep > 100 (or whatever minimum window you
use for the decode range) and raise a clear error (ValueError or similar) with a
descriptive message referencing spec.sweep_max_context_length and osl_sweep when
the condition fails; this ensures the for-range over isl (which uses 100 as the
start) cannot silently produce zero iterations and yields a direct, actionable
error instead of the generic "produced only 0 valid points" message.
- Line 67: Move the AIConfiguratorPerfEstimator import out of the function and
place it at the top of the module: remove the function-level import of
AIConfiguratorPerfEstimator in aic_interpolation.py and add a module-level
import "from dynamo.planner.monitoring.aic_estimator import
AIConfiguratorPerfEstimator" near the other imports so the class is imported at
module load (the estimator import is cheap and the optional dependency only
fails on instantiation); ensure you delete the TYPE_CHECKING block or the
in-function import that references AIConfiguratorPerfEstimator so there are no
duplicate or nested imports.

In `@components/src/dynamo/profiler/utils/profile_common.py`:
- Around line 200-210: Hoist the function-local import into the module-level
imports: add "from dynamo.planner.config.planner_config import
PlannerPreDeploymentSweepMode" at the top of the file alongside the other
dynamo.planner imports, remove the inline import inside the block that checks
dgdr.features.planner, and leave the logic that reads
dgdr.features.planner.pre_deployment_sweeping_mode and compares it to
PlannerPreDeploymentSweepMode.Rapid unchanged so the function uses the
module-level symbol.

---

Nitpick comments:
In `@components/src/dynamo/planner/config/parallelization.py`:
- Around line 18-92: The docstrings use the Unicode multiplication sign "×"
which triggers Ruff RUF002; replace each "×" with " * " (or add a
module/file-level "# noqa: RUF002" if you prefer) in the docstring near the
assertion lines, and add a short regression test that constructs a few
PickedParallelConfig instances (DEP: moe_ep>1, TEP: moe_tp>1, dense:
moe_tp=moe_ep=1) and asserts picked_to_aic_model_config_kwargs(p)["tp_size"] ==
p.tp to lock in the intended mapping; also confirm
PickedParallelConfig.model_config = ConfigDict(frozen=True) remains if
downstream sweep/set usage requires hashability.

In `@components/src/dynamo/planner/monitoring/aic_estimator.py`:
- Around line 120-135: The code assigns runtime_config to self.runtime_config
inside estimate_perf but never uses it elsewhere, creating hidden mutable state;
change this by making runtime_config a local variable (e.g., runtime_config =
aiconfigurator.sdk.config.RuntimeConfig(...)) inside the estimate_perf method
instead of self.runtime_config, and update the call to session.run_static to
pass runtime_config (not self.runtime_config). Locate the assignment and the
session.run_static invocation in AIConfiguratorPerfEstimator. Ensure no other
methods reference self.runtime_config and remove the attribute if it's declared
on the class.
- Around line 15-23: The module currently configures the logger at import time
(logger = logging.getLogger(__name__), logger.setLevel(...), console_handler,
formatter, logger.addHandler(...)), which overrides app logging and can cause
duplicate output; remove the calls that set the level and add the StreamHandler
so the library only does logger = logging.getLogger(__name__), and instead
attach a logging.NullHandler() to the logger (e.g.,
logger.addHandler(logging.NullHandler())) if you need to avoid “No handler”
warnings—leave formatter/console_handler and any setLevel calls out of the
module.
- Around line 26-34: The function _try_import_aiconfigurator performs only
submodule imports (aiconfigurator.sdk.*) but returns the bare name
aiconfigurator which may not be explicitly bound in all implementations; update
_try_import_aiconfigurator to first do an explicit import aiconfigurator (before
importing aiconfigurator.sdk.config, aiconfigurator.sdk.models, etc.), then keep
the existing submodule imports and return aiconfigurator so the returned symbol
is unambiguous and guaranteed to be bound in the function scope.

In `@components/src/dynamo/planner/monitoring/aic_interpolation.py`:
- Around line 158-161: Add a one-line comment above the batch_size_per_rank
calculation (the block using max_concurrency_aggregate,
spec.decode_interpolation_granularity and computing batch_size_per_rank = max(1,
num_req_aggregate // attention_dp)) explaining the rounding trade-off: when
num_req_aggregate < attention_dp we clamp to 1 which effectively simulates
attention_dp requests (one per rank) even though the FPM records only
num_req_aggregate, so the regression will see an over‑provisioned shape at the
low end; mention this is an accepted approximation consistent with similar logic
elsewhere.

In `@components/src/dynamo/planner/tests/unit/test_aic_interpolation.py`:
- Around line 216-347: In test_prefill_sweep_shape_and_fpm_convention the tuple
from _patch_estimator is destructured as "ctx_patch, estimator" but estimator is
never used; change the unpacking to "ctx_patch, _ = _patch_estimator(...)" to
silence the unused-variable lint (RUF059) while keeping the mock behavior
intact—locate the call to _patch_estimator inside
test_prefill_sweep_shape_and_fpm_convention and replace the "estimator"
identifier with an underscore.

In `@components/src/dynamo/profiler/profile_sla.py`:
- Around line 385-395: The broad except Exception around calling
get_model_config_from_model_path(resolve_model_path(dgdr)) should be narrowed
and the error logged; replace the blanket except with explicit expected
exceptions (e.g., FileNotFoundError, OSError, KeyError, ValueError) and in the
handler call logger.warning with exc_info=True so the stacktrace is preserved,
then keep the existing fallback assignment to sweep_max_context_length (isl * 2
or 8192) unchanged; ensure references to get_model_config_from_model_path,
resolve_model_path, sweep_max_context_length, and logger.warning are updated
accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8134f125-0097-441c-863c-05a7bf3a2c04

📥 Commits

Reviewing files that changed from the base of the PR and between d94b350 and da474a3.

📒 Files selected for processing (16)
  • components/src/dynamo/planner/config/aic_interpolation_spec.py
  • components/src/dynamo/planner/config/parallelization.py
  • components/src/dynamo/planner/config/planner_config.py
  • components/src/dynamo/planner/core/adapters.py
  • components/src/dynamo/planner/monitoring/aic_estimator.py
  • components/src/dynamo/planner/monitoring/aic_interpolation.py
  • components/src/dynamo/planner/monitoring/perf_metrics.py
  • components/src/dynamo/planner/tests/unit/test_advisory_mode.py
  • components/src/dynamo/planner/tests/unit/test_aic_interpolation.py
  • components/src/dynamo/planner/tests/unit/test_perf_metrics_priority.py
  • components/src/dynamo/profiler/profile_sla.py
  • components/src/dynamo/profiler/tests/unit/test_dgd_generation_aic.py
  • components/src/dynamo/profiler/utils/config_modifiers/parallelization_mapping.py
  • components/src/dynamo/profiler/utils/dgd_generation.py
  • components/src/dynamo/profiler/utils/estimate_perf.py
  • components/src/dynamo/profiler/utils/profile_common.py

Comment thread components/src/dynamo/planner/monitoring/aic_interpolation.py
Comment thread components/src/dynamo/planner/monitoring/aic_interpolation.py
Comment thread components/src/dynamo/profiler/utils/profile_common.py
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 7 potential issues.

View 3 additional findings in Devin Review.

Open in Devin Review

Comment thread components/src/dynamo/planner/monitoring/aic_interpolation.py
Comment thread components/src/dynamo/profiler/utils/profile_common.py Outdated
Comment thread components/src/dynamo/planner/tests/unit/test_aic_interpolation.py
Comment thread components/src/dynamo/planner/monitoring/aic_interpolation.py
Comment thread components/src/dynamo/profiler/utils/profile_common.py
…cked DGD

When the profiler emits a DGD with vLLM workers, each worker now gets
DYN_BENCHMARK_MODE set to its role (prefill / decode / agg). vLLM's
startup self-benchmark then populates the get_perf_metrics endpoint,
which the planner consumes at priority 1 of its bootstrap chain —
superseding AIC simulation and NPZ files.

Net effect: a freshly deployed vLLM DGD starts up, runs its own
sweep, and the planner reads live measured FPMs instead of AIC's
approximation. AIC remains the priority-2 fallback when the endpoint
is unreachable; NPZ files remain priority 3.

The worker-name → role map is sourced from VllmComponentName (now
gains an agg_worker_k8s_name = "VllmWorker" attribute) so dgd_generation
stays in sync with the planner/profiler name registry.

Skipped when the mocker backend is active — mocker workers don't
speak DYN_BENCHMARK_MODE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
1. aic_interpolation.py: hoist AIConfiguratorPerfEstimator import to
   module top. The wrapper class itself lazy-imports aiconfigurator
   inside _try_import_aiconfigurator(), so the class-level import is
   cheap and doesn't pull in the optional dependency until instantiation.

2. profile_common.py: hoist PlannerPreDeploymentSweepMode to the module
   imports. The "dep cycle" comment was factually wrong — profile_common
   already transitively imports planner_config.py via dgdr_v1beta1_types.

3. _concurrency_sweep: fix ZeroDivisionError when granularity == 1.
   Previously (max-1)/(gran-1) divided by zero; now returns [max] so
   the single sample hits the top of the range.

4. _sweep_decode: add explicit precondition check mirroring
   _sweep_prefill, so a too-small sweep_max_context_length surfaces a
   clear ValueError instead of the generic "only 0 valid points" error.

5. Add pytest markers (gpu_0 / pre_merge / unit) to the three new test
   files per .ai/pytest-guidelines.md.

6. Add regression test for the granularity=1 guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
The long assert was reformatted by black 23.1.0 — this is the uncommitted
local fixup that made CI pre-commit fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
…behavior

The old test_planner_no_mocker_returns_config_with_planner_cm expected
add_profile_data_to_config to be called for the rapid + planner case.
After step 6, needs_profile_data() returns False for planner-only rapid
(planner runs AIC at bootstrap instead), so the profile-data ConfigMap
is no longer emitted — hence the CI failure after rebase.

Split into two tests covering both sweep modes:

* test_rapid_planner_no_mocker_skips_profile_cm — asserts the
  profile-data ConfigMap is NOT emitted and result == [planner_cm,
  dgd_config] for the rapid path.
* test_thorough_planner_no_mocker_returns_config_with_both_cms —
  asserts thorough still emits both ConfigMaps.

Other tests in the class use mocker mode and are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
@tedzhouhk tedzhouhk enabled auto-merge (squash) April 20, 2026 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants