feat(planner): own AIC interpolation; fix MoE-DEP bugs in rapid mode by tedzhouhk · Pull Request #8335 · ai-dynamo/dynamo

tedzhouhk · 2026-04-18T00:27:56Z

Summary

Moves AIConfigurator-based interpolation out of the profiler and into the planner for pre_deployment_sweeping_mode: rapid. The profiler now picks configurations and hands them off via a new aic_interpolation block on the planner ConfigMap; the planner runs the AIC sweep in-process at bootstrap and seeds its regression models directly from the synthesised FPMs — no more NPZ round-trip through disk.

This also fixes the 7 silent MoE-DEP correctness issues that made rapid mode crash or quietly mis-model Qwen/Qwen3-235B-A22B-FP8 and other MoE picks using DEP/TEP/hybrid parallelism.

Priority chain for pre-deployment FPMs (highest → lowest):

get_perf_metrics endpoint (live self-benchmark)
AIC interpolation (if aic_interpolation spec is set)
Legacy files under profile_results_dir (thorough mode)

Thorough mode is unchanged — profiler still runs real-GPU interpolation, bakes NPZ into the profile-data ConfigMap, planner reads files. Mocker rapid-mode deployments also unchanged; they keep running the existing AIC-to-NPZ sweep in the profiler.

What changed

New planner modules:

planner/config/parallelization.py — home for PickedParallelConfig (moved from profiler) + picked_to_aic_model_config_kwargs helper
planner/config/aic_interpolation_spec.py — pydantic AICInterpolationSpec for profiler → planner handoff
planner/monitoring/aic_estimator.py — home for AIConfiguratorPerfEstimator (moved from profiler)
planner/monitoring/aic_interpolation.py — run_aic_interpolation() lands the MoE-DEP fixes

Planner wiring:

PlannerConfig.aic_interpolation field
fetch_pre_deployment_metrics gains priority-2 AIC branch with lazy aiconfigurator import
All 4 adapters.py call sites forward aic_spec=self.config.aic_interpolation

Profiler flip:

needs_profile_data() returns False for planner-only rapid (mocker path unaffected)
dgd_generation.py:build_aic_interpolation_spec() populates the spec; rapid+planner-only skips add_profile_data_to_config
profile_sla.py computes sweep_max_context_length once and builds the spec

Back-compat shims for existing profiler importers of PickedParallelConfig and AIConfiguratorPerfEstimator.

MoE-DEP bug fixes (landed in `aic_interpolation.py`)

#	Bug	Fix
1	`moe_tp_size`/`moe_ep_size` missing → `TypeError: None × None`	Every AIC call goes through `picked_to_aic_model_config_kwargs`
2	`PickedParallelConfig.tp_size` (KV-split property) fed to AIC as attention TP	Helper uses raw `p.tp`; property now documented as KV-head-split only
3	`attention_dp_size` never reaches `ModelConfig`	Explicitly passed in both prefill and decode kwargs
4	Decode `RuntimeConfig.batch_size` treated as aggregate	Now passed per-attention-DP-rank (`num_req // dp`)
5	`get_max_kv_tokens` per-rank but downstream math expects aggregate	Multiplied by `attention_dp_size`
6	Decode sweep forced multiples of `attention_dp_size`	Plain linear sweep (DP-routing constraint doesn't apply to AIC's static simulator)
7	FPM conventions drifted between AIC and thorough paths	Both paths now emit identical shapes

Test plan

New unit tests: 26 tests across test_aic_interpolation.py, test_perf_metrics_priority.py, test_dgd_generation_aic.py
- TestPickedToAicKwargs — helper correctness across TP/TEP/DEP/hybrid
- TestRunAicInterpolation — sweeps with mocked estimator; every call asserts MoE kwargs + identity
- TestPriorityChain — 6 tests covering all fallback paths
- TestQwen235MoEPicks — end-to-end with the exact Qwen-235B pick shapes
Full unit suite: 179 passed, 11 skipped locally (skips are pre-existing dynamo.llm / deploy binding absences in the dev venv)
CI: verify the full suite + pre-commit runs green
Manual E2E against Qwen/Qwen3-235B-A22B-FP8 DGDR on h100/h200: confirm profiler skips BuildingCurves phase, DGD has only planner-config ConfigMap, planner logs Loaded N FPMs from AIC interpolation
Manual E2E with pre_deployment_sweeping_mode: thorough: confirm both ConfigMaps still produced and planner reads files
Manual E2E with get_perf_metrics endpoint available: confirm planner short-circuits at priority 1

Migration notes

aiconfigurator is imported lazily; if the planner image ships without it, rapid-mode deployments will surface a clear ImportError and fall through to the file loader. Ops will want to ensure the planner image bundles aiconfigurator for rapid-mode deployments.
Existing DGDRs without aic_interpolation continue to work unchanged (pydantic default is None).
The 10 commits are ordered so CI stays green at every step: types move first (with shims), additions come before wiring, and only the final commit changes observable rapid-mode output.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added in-process performance estimation enabling faster planner bootstrap without relying on external profiling files.
- Extended planner configuration to support rapid-mode metric generation and interpolation.
Tests
- Added comprehensive test coverage for in-process performance estimation and metric retrieval fallback chains.

PickedParallelConfig is a pure data type describing an AIConfigurator parallelism pick (tp, pp, dp, moe_tp, moe_ep). Both the profiler (which picks) and the planner (which will soon consume the pick to bootstrap perf models) need it, so it now lives under dynamo.planner.config. The profiler's parallelization_mapping.py keeps its internal types (ParallelizationMapping, ParallelizationStrategy, and the two helper functions) and re-exports PickedParallelConfig for back-compat. Also documents .tp_size as KV-head-split semantics only (it is NOT the same quantity as AIConfigurator's ModelConfig.tp_size, which is attention TP per rank). The unused to_parallelization_mapping() method is dropped; it had zero callers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

Follow-up to the previous commit: turn the profiler's parallelization_mapping.py into a shim that re-exports PickedParallelConfig from its new home while keeping the profiler-internal types (ParallelizationMapping, ParallelizationStrategy, get_candidate_parallel_mappings, apply_parallel_mapping_to_config) in place. No behavior change; class identity is preserved (verified: ``PickedParallelConfig is planner.config.parallelization.PickedParallelConfig``). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

…ring Relocates the AIC wrapper from the profiler to the planner so the planner can run AIC-based interpolation at bootstrap in rapid mode. The profiler's estimate_perf.py becomes a re-export shim for existing callers. Class identity is preserved — profiler and planner imports point at the same underlying class, so isinstance checks still work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

…g_kwargs Two additions for the upcoming rapid-mode refactor: * picked_to_aic_model_config_kwargs(p) returns the AIC ModelConfig kwargs for a pick (tp_size, pp_size, moe_tp_size, moe_ep_size, attention_dp_size). Uses the raw p.tp / p.dp — NOT the derived p.tp_size property, which has KV-head-split semantics that conflict with AIC's attention-TP definition and break the MoE parallelism identity for DEP picks. * AICInterpolationSpec is the pydantic schema the profiler will embed in the planner ConfigMap in rapid mode, carrying hf_id/system/backend, ISL/OSL/max_context_length, granularities, and the two PickedParallelConfig picks. Serialises cleanly through model_dump_json. PickedParallelConfig is now a frozen pydantic BaseModel (previously a frozen dataclass) to allow clean embedding. No callers rely on dataclass-specific APIs — verified by grep for asdict/is_dataclass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

The planner can now generate ForwardPassMetrics in-process from an AICInterpolationSpec by lazy-importing aiconfigurator and sweeping prefill/decode configurations itself. This replaces the profiler's NPZ-based AIC path for rapid mode and fixes the MoE-DEP correctness bugs that silently corrupted the old sweep: * Every AIC call uses picked_to_aic_model_config_kwargs, so moe_tp_size / moe_ep_size / attention_dp_size always reach ModelConfig. The MoE parallelism identity (tp_size * attention_dp_size == moe_tp * moe_ep) holds by construction. * get_max_kv_tokens is scaled by attention_dp_size to get aggregate capacity, matching thorough-mode semantics downstream. * Decode RuntimeConfig.batch_size is now per-attention-DP-rank (aggregate // dp), not the aggregate — AIC internally scales MoE compute by attention_dp_size, so passing aggregate double-counted. * Decode concurrency sweep is linear; the thorough path's DP-multiples constraint is a round-robin routing requirement that does not apply to AIC's static simulator. FPM conventions mirror the thorough-mode file loader so bootstrap data looks identical regardless of source: * Prefill: (num_prefill_requests=1, sum_prefill_tokens=isl, wall_time=per-rank TTFT) * Decode: (num_decode_requests=aggregate, sum_decode_kv_tokens= aggregate * ctx_len, wall_time=per-step ITL) Also extends the dynamo.common.forward_pass_metrics stub in test_advisory_mode to expose ScheduledRequestMetrics — the partial stub masked the real module when test collection ran in certain orders. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

PlannerConfig gains an optional aic_interpolation field and the bootstrap fetcher gains an aic_spec parameter. The priority chain is now: 1. get_perf_metrics endpoint (runtime self-benchmark) 2. AIC interpolation in-process (if aic_spec is set) 3. Legacy profile_results_dir files (thorough-mode fallback) If the endpoint succeeds, AIC and files are never touched. If aiconfigurator is not installed, the ImportError is caught and the chain falls through to files. If AIC raises at runtime, same fallthrough. Only if all three sources fail does the planner raise. All four fetch_pre_deployment_metrics call sites in adapters.py (Prefill, Decode, Agg, Disagg × {prefill, decode}) forward the spec from self.config.aic_interpolation. PlannerConfig validator warns when enable_throughput_scaling + rapid mode + no aic_interpolation — this is the handshake the profiler will satisfy in the final step of the refactor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

In rapid mode with planner features enabled, the profiler now just picks configurations and emits an AICInterpolationSpec inside the planner's ConfigMap. The planner runs the AIC sweep itself at bootstrap via run_aic_interpolation. No more NPZ round-trip for rapid+planner-only deployments. Changes: * needs_profile_data() returns False for rapid + planner + no mocker. Mocker deployments still need NPZ (they read it directly), so mocker cases keep running the existing AIC sweep in run_interpolation. * build_aic_interpolation_spec() in dgd_generation.py constructs the spec from the DGDR + picks + workload params. Returns None (falling through to the file path) for thorough mode, disabled throughput scaling, missing picks, or unsupported backends. * _build_planner_config / add_planner_to_config / assemble_final_config thread aic_spec through, embedding it in the planner ConfigMap. * profile_sla.py computes sweep_max_context_length once (both rapid and thorough need it) and builds the spec before assembling the final config. Thorough-mode behavior is unchanged: profiler runs real-GPU interpolation, bakes NPZ into the profile-data ConfigMap, planner reads files at priority 3. End-to-end for the Qwen-235B-A22B-FP8 DGDR from the triage report: rapid + planner → profiler skips BuildingCurves phase → DGD has only the planner-config ConfigMap (no planner-profile-data) → planner pod runs AIC in-process → "Loaded N FPMs from AIC interpolation" with correct MoE-DEP semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

Adds TestQwen235BugReproCase with the exact two picks from the original triage report: * Prefill: tp=4, dp=1, moe_tp=1, moe_ep=4 * Decode: tp=1, dp=8, moe_tp=2, moe_ep=4 Each test pins down a specific symptom that the old profiler path produced: * test_prefill_pick_satisfies_aic_identity — Bug #1 (missing moe kwargs) and Bug #2 (using .tp_size property) would break the identity for this pick. Verifies all five kwargs and the MoE identity. * test_decode_pick_satisfies_aic_identity — same for the decode pick, which was the exact shape that triggered Bug #3 (missing attention_dp_size). * test_tp_size_property_differs_from_aic_tp_size — direct regression guard: documents that the property returns 1 here while AIC's tp_size is 4. Anyone who wires .tp_size back into AIC kwargs will break this. * test_end_to_end_sweep_delivers_complete_kwargs_to_aic — runs both sweeps and asserts every AIC call gets complete kwargs. This is the direct reproducer of the originally-reported failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

Drop the "bug repro" framing. The class now reads as a positive exercise of a realistic large-MoE pick pair that spans the full AIC kwarg surface: a DEP-on-MoE prefill pick and a hybrid TEP+DP decode pick. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

black/isort reformatting across the new AIC files; no semantic changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

coderabbitai · 2026-04-18T00:34:05Z

Walkthrough

This PR introduces in-process AIConfigurator (AIC) interpolation support for rapid-mode planner bootstrap. It adds new configuration schemas, an AIC performance estimator wrapper, interpolation sweep logic, and integrates these components into the planner's metrics-fetching fallback chain. It also refactors shared parallelization config to a common location and updates profiler DGD generation to conditionally build and pass AIC specifications.

Changes

Cohort / File(s)	Summary
Configuration Schemas `components/src/dynamo/planner/config/aic_interpolation_spec.py`, `components/src/dynamo/planner/config/parallelization.py`	Introduced `AICInterpolationSpec` to represent the serialized handoff from profiler to planner, and `PickedParallelConfig` (frozen Pydantic model) with computed properties for GPU counts, effective TP sizing, and label formatting. Includes `picked_to_aic_model_config_kwargs()` converter for AIC-compatible kwarg generation.
Core AIC Estimator & Interpolation `components/src/dynamo/planner/monitoring/aic_estimator.py`, `components/src/dynamo/planner/monitoring/aic_interpolation.py`	Added `AIConfiguratorPerfEstimator` wrapper for lazy AIC SDK import, database loading, and perf/KV-cache estimation. Implemented `run_aic_interpolation()` dispatcher that executes component-specific prefill/decode sweeps, interpolating across sequence lengths, estimating metrics per-rank, and emitting aggregate `ForwardPassMetrics`.
Planner Integration `components/src/dynamo/planner/config/planner_config.py`, `components/src/dynamo/planner/core/adapters.py`, `components/src/dynamo/planner/monitoring/perf_metrics.py`	Added optional `aic_interpolation` field to `PlannerConfig`. Updated `fetch_pre_deployment_metrics()` with fallback chain: endpoint → in-process AIC (if spec provided) → legacy file conversion → error. Modified all planner adapters to pass `aic_spec` parameter to metrics-fetching calls.
Profiler DGD Integration `components/src/dynamo/profiler/profile_sla.py`, `components/src/dynamo/profiler/utils/dgd_generation.py`	Updated profiler to compute `sweep_max_context_length` once and conditionally build `AICInterpolationSpec` when disaggregated and not dry-run. Extended `assemble_final_config()`, `add_planner_to_config()`, and `_build_planner_config()` with optional `aic_spec` parameter; introduced `build_aic_interpolation_spec()` factory for conditional spec generation.
Refactoring: Shared Parallelization Config `components/src/dynamo/profiler/utils/config_modifiers/parallelization_mapping.py`, `components/src/dynamo/profiler/utils/estimate_perf.py`	Removed local `PickedParallelConfig` dataclass from `parallelization_mapping.py` and local `AIConfiguratorPerfEstimator` from `estimate_perf.py`; both now import and re-export from planner-owned modules to establish single source of truth.
Config Interpretation Update `components/src/dynamo/profiler/utils/profile_common.py`	Modified `needs_profile_data()` to skip external profiling for rapid-mode planner sweeps, returning `False` when `pre_deployment_sweeping_mode == Rapid` and only planner throughput scaling is enabled.
Test Stub Update `components/src/dynamo/planner/tests/unit/test_advisory_mode.py`	Extended module-level mock stub to include `ScheduledRequestMetrics` symbol alongside `ForwardPassMetrics`.
Comprehensive Unit Tests `components/src/dynamo/planner/tests/unit/test_aic_interpolation.py`, `components/src/dynamo/planner/tests/unit/test_perf_metrics_priority.py`, `components/src/dynamo/profiler/tests/unit/test_dgd_generation_aic.py`	Added test coverage for `picked_to_aic_model_config_kwargs()` correctness, `AICInterpolationSpec` validation, end-to-end `run_aic_interpolation()` behavior with mocked estimator, fallback chain priority/handling in `fetch_pre_deployment_metrics()`, and DGD generation with AIC spec conditional logic.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 52.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: moving AIC interpolation to the planner and fixing MoE-DEP bugs in rapid mode.
Description check	✅ Passed	The description covers all required template sections: Overview (summary of changes), Details (comprehensive breakdown of what changed and bug fixes), and Related Issues section (no GitHub issues referenced, which is acceptable).

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (7)

components/src/dynamo/profiler/profile_sla.py (1)
385-395: Consider narrowing the broad except Exception (BLE001).

get_model_config_from_model_path can raise OSError/FileNotFoundError/KeyError/etc.; a blanket except Exception also swallows programming errors (e.g. TypeError) that would be better surfaced. If the set of expected failure modes is stable, list them explicitly; otherwise keep the catch but log the exception (logger.warning(..., exc_info=True)) so diagnosis isn't lost.
🔧 Proposed fix
-        try:
-            model_cfg = get_model_config_from_model_path(resolve_model_path(dgdr))
-            sweep_max_context_length = model_cfg.get("max_position_embeddings", 0)
-        except Exception:
-            logger.warning("Could not fetch model max context length.")
-            sweep_max_context_length = 0
+        try:
+            model_cfg = get_model_config_from_model_path(resolve_model_path(dgdr))
+            sweep_max_context_length = model_cfg.get("max_position_embeddings", 0)
+        except (OSError, KeyError, ValueError) as e:
+            logger.warning("Could not fetch model max context length: %s", e)
+            sweep_max_context_length = 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/profiler/profile_sla.py` around lines 385 - 395, The
broad except Exception around calling
get_model_config_from_model_path(resolve_model_path(dgdr)) should be narrowed
and the error logged; replace the blanket except with explicit expected
exceptions (e.g., FileNotFoundError, OSError, KeyError, ValueError) and in the
handler call logger.warning with exc_info=True so the stacktrace is preserved,
then keep the existing fallback assignment to sweep_max_context_length (isl * 2
or 8192) unchanged; ensure references to get_model_config_from_model_path,
resolve_model_path, sweep_max_context_length, and logger.warning are updated
accordingly.
components/src/dynamo/planner/config/parallelization.py (1)
18-92: LGTM: clean, well-documented shared abstraction.

A few observations, none blocking:

The tp_size property vs AIC's ModelConfig.tp_size semantic conflict is a real foot-gun, and the warning in the docstring plus the dedicated picked_to_aic_model_config_kwargs helper is the right shape to prevent misuse. Consider a short regression test asserting picked_to_aic_model_config_kwargs(p)["tp_size"] == p.tp across a few DEP/TEP/dense picks to lock this in.

Ruff flags the ambiguous multiplication sign × in docstrings (RUF002 at lines 68 and 78). If you want to keep CI quiet, replace with * or add an # noqa: RUF002 on the module or a per-file ruff config entry.

frozen=True makes instances hashable, which is needed if they end up as dict keys/set members during AIC sweeps — worth confirming the downstream sweep code relies on this.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/planner/config/parallelization.py` around lines 18 -
92, The docstrings use the Unicode multiplication sign "×" which triggers Ruff
RUF002; replace each "×" with " * " (or add a module/file-level "# noqa: RUF002"
if you prefer) in the docstring near the assertion lines, and add a short
regression test that constructs a few PickedParallelConfig instances (DEP:
moe_ep>1, TEP: moe_tp>1, dense: moe_tp=moe_ep=1) and asserts
picked_to_aic_model_config_kwargs(p)["tp_size"] == p.tp to lock in the intended
mapping; also confirm PickedParallelConfig.model_config =
ConfigDict(frozen=True) remains if downstream sweep/set usage requires
hashability.
components/src/dynamo/planner/monitoring/aic_estimator.py (3)
120-135: runtime_config should be a local, not instance state.

self.runtime_config is assigned inside estimate_perf but never read outside this method. Storing it as an attribute introduces hidden state that could mislead a future caller into thinking AIConfiguratorPerfEstimator is a stateful/session object, and it gets overwritten on every call.
♻️ Proposed fix
-        self.runtime_config = aiconfigurator.sdk.config.RuntimeConfig(
+        runtime_config = aiconfigurator.sdk.config.RuntimeConfig(
             batch_size=batch_size,
             beam_width=1,
             isl=isl,
             osl=osl,
         )
@@
         summary = session.run_static(
-            mode=mode_to_aic_mode[mode], runtime_config=self.runtime_config, stride=32
+            mode=mode_to_aic_mode[mode], runtime_config=runtime_config, stride=32
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/planner/monitoring/aic_estimator.py` around lines 120 -
135, The code assigns runtime_config to self.runtime_config inside estimate_perf
but never uses it elsewhere, creating hidden mutable state; change this by
making runtime_config a local variable (e.g., runtime_config =
aiconfigurator.sdk.config.RuntimeConfig(...)) inside the estimate_perf method
instead of self.runtime_config, and update the call to session.run_static to
pass runtime_config (not self.runtime_config). Locate the assignment and the
session.run_static invocation in AIConfiguratorPerfEstimator. Ensure no other
methods reference self.runtime_config and remove the attribute if it's declared
on the class.
15-23: Library modules shouldn't mutate their own logger level or attach handlers.

Calling logger.setLevel(INFO) and logger.addHandler(StreamHandler(...)) at module import time (a) overrides whatever log config the hosting app has chosen and (b) causes duplicate output when the root logger also has handlers. Convention is to let the application configure logging; the library just does logger = logging.getLogger(__name__).
♻️ Proposed fix
 logger = logging.getLogger(__name__)
-logger.setLevel(logging.INFO)
-console_handler = logging.StreamHandler()
-console_handler.setLevel(logging.INFO)
-formatter = logging.Formatter(
-    "%(asctime)s - %(name)s - %(levelname)s - %(message)s", "%Y-%m-%d %H:%M:%S"
-)
-console_handler.setFormatter(formatter)
-logger.addHandler(console_handler)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/planner/monitoring/aic_estimator.py` around lines 15 -
23, The module currently configures the logger at import time (logger =
logging.getLogger(__name__), logger.setLevel(...), console_handler, formatter,
logger.addHandler(...)), which overrides app logging and can cause duplicate
output; remove the calls that set the level and add the StreamHandler so the
library only does logger = logging.getLogger(__name__), and instead attach a
logging.NullHandler() to the logger (e.g.,
logger.addHandler(logging.NullHandler())) if you need to avoid “No handler”
warnings—leave formatter/console_handler and any setLevel calls out of the
module.
26-34: Verify return aiconfigurator resolves after submodule-only imports.

The function does import aiconfigurator.sdk.<x> five times but never an explicit import aiconfigurator, then returns the bare aiconfigurator name. This works in CPython today because import a.b.c binds a in the local namespace — but the intent is non-obvious. Consider adding import aiconfigurator as the first line (cheap, already cached) to make the binding explicit.
Does `import aiconfigurator.sdk.config` bind the top-level `aiconfigurator` name in the local scope of a function in Python 3?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/planner/monitoring/aic_estimator.py` around lines 26 -
34, The function _try_import_aiconfigurator performs only submodule imports
(aiconfigurator.sdk.*) but returns the bare name aiconfigurator which may not be
explicitly bound in all implementations; update _try_import_aiconfigurator to
first do an explicit import aiconfigurator (before importing
aiconfigurator.sdk.config, aiconfigurator.sdk.models, etc.), then keep the
existing submodule imports and return aiconfigurator so the returned symbol is
unambiguous and guaranteed to be bound in the function scope.
components/src/dynamo/planner/monitoring/aic_interpolation.py (1)
158-161: Document the aggregate→per-rank rounding trade-off.

When num_req_aggregate < attention_dp, batch_size_per_rank clamps to 1, which makes AIC simulate attention_dp requests (one per rank) instead of the num_req_aggregate the FPM records. The FPM still carries the aggregate count, so the regression will learn ITL from an over-provisioned shape at the low end.

Not worth blocking on — the thorough path had a similar approximation — but a one-line comment here saves the next reader the archaeology.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/planner/monitoring/aic_interpolation.py` around lines
158 - 161, Add a one-line comment above the batch_size_per_rank calculation (the
block using max_concurrency_aggregate, spec.decode_interpolation_granularity and
computing batch_size_per_rank = max(1, num_req_aggregate // attention_dp))
explaining the rounding trade-off: when num_req_aggregate < attention_dp we
clamp to 1 which effectively simulates attention_dp requests (one per rank) even
though the FPM records only num_req_aggregate, so the regression will see an
over‑provisioned shape at the low end; mention this is an accepted approximation
consistent with similar logic elsewhere.
components/src/dynamo/planner/tests/unit/test_aic_interpolation.py (1)
216-347: Strong regression coverage for the MoE-DEP fixes.

The identity assertion tp_size * attention_dp_size == moe_tp_size * moe_ep_size baked into _assert_identity plus the explicit "don't derive from p.tp_size property" regression guard (line 146–155) pins the exact invariants the old profiler path violated. Good defensive testing.

Minor nit: line 225 unpacks estimator but never uses it (_patch_estimator(...) already exposes the mock via the second return). Renaming to _ keeps Ruff (RUF059) quiet:
-        ctx_patch, estimator = _patch_estimator(prefill_latency_ms=123.0)
+        ctx_patch, _estimator = _patch_estimator(prefill_latency_ms=123.0)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/planner/tests/unit/test_aic_interpolation.py` around
lines 216 - 347, In test_prefill_sweep_shape_and_fpm_convention the tuple from
_patch_estimator is destructured as "ctx_patch, estimator" but estimator is
never used; change the unpacking to "ctx_patch, _ = _patch_estimator(...)" to
silence the unused-variable lint (RUF059) while keeping the mock behavior
intact—locate the call to _patch_estimator inside
test_prefill_sweep_shape_and_fpm_convention and replace the "estimator"
identifier with an underscore.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/src/dynamo/planner/monitoring/aic_interpolation.py`:
- Around line 138-155: Add an explicit precondition check at the start of
_sweep_decode to mirror _sweep_prefill: validate that
spec.sweep_max_context_length - osl_sweep > 100 (or whatever minimum window you
use for the decode range) and raise a clear error (ValueError or similar) with a
descriptive message referencing spec.sweep_max_context_length and osl_sweep when
the condition fails; this ensures the for-range over isl (which uses 100 as the
start) cannot silently produce zero iterations and yields a direct, actionable
error instead of the generic "produced only 0 valid points" message.
- Line 67: Move the AIConfiguratorPerfEstimator import out of the function and
place it at the top of the module: remove the function-level import of
AIConfiguratorPerfEstimator in aic_interpolation.py and add a module-level
import "from dynamo.planner.monitoring.aic_estimator import
AIConfiguratorPerfEstimator" near the other imports so the class is imported at
module load (the estimator import is cheap and the optional dependency only
fails on instantiation); ensure you delete the TYPE_CHECKING block or the
in-function import that references AIConfiguratorPerfEstimator so there are no
duplicate or nested imports.

In `@components/src/dynamo/profiler/utils/profile_common.py`:
- Around line 200-210: Hoist the function-local import into the module-level
imports: add "from dynamo.planner.config.planner_config import
PlannerPreDeploymentSweepMode" at the top of the file alongside the other
dynamo.planner imports, remove the inline import inside the block that checks
dgdr.features.planner, and leave the logic that reads
dgdr.features.planner.pre_deployment_sweeping_mode and compares it to
PlannerPreDeploymentSweepMode.Rapid unchanged so the function uses the
module-level symbol.

---

Nitpick comments:
In `@components/src/dynamo/planner/config/parallelization.py`:
- Around line 18-92: The docstrings use the Unicode multiplication sign "×"
which triggers Ruff RUF002; replace each "×" with " * " (or add a
module/file-level "# noqa: RUF002" if you prefer) in the docstring near the
assertion lines, and add a short regression test that constructs a few
PickedParallelConfig instances (DEP: moe_ep>1, TEP: moe_tp>1, dense:
moe_tp=moe_ep=1) and asserts picked_to_aic_model_config_kwargs(p)["tp_size"] ==
p.tp to lock in the intended mapping; also confirm
PickedParallelConfig.model_config = ConfigDict(frozen=True) remains if
downstream sweep/set usage requires hashability.

In `@components/src/dynamo/planner/monitoring/aic_estimator.py`:
- Around line 120-135: The code assigns runtime_config to self.runtime_config
inside estimate_perf but never uses it elsewhere, creating hidden mutable state;
change this by making runtime_config a local variable (e.g., runtime_config =
aiconfigurator.sdk.config.RuntimeConfig(...)) inside the estimate_perf method
instead of self.runtime_config, and update the call to session.run_static to
pass runtime_config (not self.runtime_config). Locate the assignment and the
session.run_static invocation in AIConfiguratorPerfEstimator. Ensure no other
methods reference self.runtime_config and remove the attribute if it's declared
on the class.
- Around line 15-23: The module currently configures the logger at import time
(logger = logging.getLogger(__name__), logger.setLevel(...), console_handler,
formatter, logger.addHandler(...)), which overrides app logging and can cause
duplicate output; remove the calls that set the level and add the StreamHandler
so the library only does logger = logging.getLogger(__name__), and instead
attach a logging.NullHandler() to the logger (e.g.,
logger.addHandler(logging.NullHandler())) if you need to avoid “No handler”
warnings—leave formatter/console_handler and any setLevel calls out of the
module.
- Around line 26-34: The function _try_import_aiconfigurator performs only
submodule imports (aiconfigurator.sdk.*) but returns the bare name
aiconfigurator which may not be explicitly bound in all implementations; update
_try_import_aiconfigurator to first do an explicit import aiconfigurator (before
importing aiconfigurator.sdk.config, aiconfigurator.sdk.models, etc.), then keep
the existing submodule imports and return aiconfigurator so the returned symbol
is unambiguous and guaranteed to be bound in the function scope.

In `@components/src/dynamo/planner/monitoring/aic_interpolation.py`:
- Around line 158-161: Add a one-line comment above the batch_size_per_rank
calculation (the block using max_concurrency_aggregate,
spec.decode_interpolation_granularity and computing batch_size_per_rank = max(1,
num_req_aggregate // attention_dp)) explaining the rounding trade-off: when
num_req_aggregate < attention_dp we clamp to 1 which effectively simulates
attention_dp requests (one per rank) even though the FPM records only
num_req_aggregate, so the regression will see an over‑provisioned shape at the
low end; mention this is an accepted approximation consistent with similar logic
elsewhere.

In `@components/src/dynamo/planner/tests/unit/test_aic_interpolation.py`:
- Around line 216-347: In test_prefill_sweep_shape_and_fpm_convention the tuple
from _patch_estimator is destructured as "ctx_patch, estimator" but estimator is
never used; change the unpacking to "ctx_patch, _ = _patch_estimator(...)" to
silence the unused-variable lint (RUF059) while keeping the mock behavior
intact—locate the call to _patch_estimator inside
test_prefill_sweep_shape_and_fpm_convention and replace the "estimator"
identifier with an underscore.

In `@components/src/dynamo/profiler/profile_sla.py`:
- Around line 385-395: The broad except Exception around calling
get_model_config_from_model_path(resolve_model_path(dgdr)) should be narrowed
and the error logged; replace the blanket except with explicit expected
exceptions (e.g., FileNotFoundError, OSError, KeyError, ValueError) and in the
handler call logger.warning with exc_info=True so the stacktrace is preserved,
then keep the existing fallback assignment to sweep_max_context_length (isl * 2
or 8192) unchanged; ensure references to get_model_config_from_model_path,
resolve_model_path, sweep_max_context_length, and logger.warning are updated
accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8134f125-0097-441c-863c-05a7bf3a2c04

📥 Commits

Reviewing files that changed from the base of the PR and between d94b350 and da474a3.

📒 Files selected for processing (16)

components/src/dynamo/planner/config/aic_interpolation_spec.py
components/src/dynamo/planner/config/parallelization.py
components/src/dynamo/planner/config/planner_config.py
components/src/dynamo/planner/core/adapters.py
components/src/dynamo/planner/monitoring/aic_estimator.py
components/src/dynamo/planner/monitoring/aic_interpolation.py
components/src/dynamo/planner/monitoring/perf_metrics.py
components/src/dynamo/planner/tests/unit/test_advisory_mode.py
components/src/dynamo/planner/tests/unit/test_aic_interpolation.py
components/src/dynamo/planner/tests/unit/test_perf_metrics_priority.py
components/src/dynamo/profiler/profile_sla.py
components/src/dynamo/profiler/tests/unit/test_dgd_generation_aic.py
components/src/dynamo/profiler/utils/config_modifiers/parallelization_mapping.py
components/src/dynamo/profiler/utils/dgd_generation.py
components/src/dynamo/profiler/utils/estimate_perf.py
components/src/dynamo/profiler/utils/profile_common.py

devin-ai-integration

Devin Review found 7 potential issues.

View 3 additional findings in Devin Review.

…cked DGD When the profiler emits a DGD with vLLM workers, each worker now gets DYN_BENCHMARK_MODE set to its role (prefill / decode / agg). vLLM's startup self-benchmark then populates the get_perf_metrics endpoint, which the planner consumes at priority 1 of its bootstrap chain — superseding AIC simulation and NPZ files. Net effect: a freshly deployed vLLM DGD starts up, runs its own sweep, and the planner reads live measured FPMs instead of AIC's approximation. AIC remains the priority-2 fallback when the endpoint is unreachable; NPZ files remain priority 3. The worker-name → role map is sourced from VllmComponentName (now gains an agg_worker_k8s_name = "VllmWorker" attribute) so dgd_generation stays in sync with the planner/profiler name registry. Skipped when the mocker backend is active — mocker workers don't speak DYN_BENCHMARK_MODE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

1. aic_interpolation.py: hoist AIConfiguratorPerfEstimator import to module top. The wrapper class itself lazy-imports aiconfigurator inside _try_import_aiconfigurator(), so the class-level import is cheap and doesn't pull in the optional dependency until instantiation. 2. profile_common.py: hoist PlannerPreDeploymentSweepMode to the module imports. The "dep cycle" comment was factually wrong — profile_common already transitively imports planner_config.py via dgdr_v1beta1_types. 3. _concurrency_sweep: fix ZeroDivisionError when granularity == 1. Previously (max-1)/(gran-1) divided by zero; now returns [max] so the single sample hits the top of the range. 4. _sweep_decode: add explicit precondition check mirroring _sweep_prefill, so a too-small sweep_max_context_length surfaces a clear ValueError instead of the generic "only 0 valid points" error. 5. Add pytest markers (gpu_0 / pre_merge / unit) to the three new test files per .ai/pytest-guidelines.md. 6. Add regression test for the granularity=1 guard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

The long assert was reformatted by black 23.1.0 — this is the uncommitted local fixup that made CI pre-commit fail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

…behavior The old test_planner_no_mocker_returns_config_with_planner_cm expected add_profile_data_to_config to be called for the rapid + planner case. After step 6, needs_profile_data() returns False for planner-only rapid (planner runs AIC at bootstrap instead), so the profile-data ConfigMap is no longer emitted — hence the CI failure after rebase. Split into two tests covering both sweep modes: * test_rapid_planner_no_mocker_skips_profile_cm — asserts the profile-data ConfigMap is NOT emitted and result == [planner_cm, dgd_config] for the rapid path. * test_thorough_planner_no_mocker_returns_config_with_both_cms — asserts thorough still emits both ConfigMaps. Other tests in the class use mocker mode and are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

tedzhouhk and others added 10 commits April 17, 2026 16:53

style: apply pre-commit formatting fixups

da474a3

black/isort reformatting across the new AIC files; no semantic changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

tedzhouhk requested review from a team as code owners April 18, 2026 00:27

pull-request-size Bot added the size/XXL label Apr 18, 2026

github-actions Bot added feat planner labels Apr 18, 2026

coderabbitai Bot reviewed Apr 18, 2026

View reviewed changes

Comment thread components/src/dynamo/planner/monitoring/aic_interpolation.py

Comment thread components/src/dynamo/planner/monitoring/aic_interpolation.py

Comment thread components/src/dynamo/profiler/utils/profile_common.py

devin-ai-integration Bot reviewed Apr 18, 2026

View reviewed changes

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 16:27 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 16:32 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 16:33 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 16:45 Inactive

copy-pr-bot Bot had a problem deploying to GITLAB April 20, 2026 17:09 Failure

PeaBrane approved these changes Apr 20, 2026

View reviewed changes

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 17:27 Inactive

tedzhouhk enabled auto-merge (squash) April 20, 2026 17:28

tedzhouhk merged commit 2618812 into main Apr 20, 2026
65 checks passed

tedzhouhk deleted the hzhou/planner-owns-aic-interp branch April 20, 2026 18:00

copy-pr-bot Bot had a problem deploying to GITLAB April 20, 2026 18:51 Failure

This was referenced Apr 21, 2026

feat(profiler): wire mocker-rapid to direct AIC flags, drop profiler AIC interp #8455

Merged

docs(profiler): clarify when planner-profile-data ConfigMap is emitted [DYN-2751] (cherry-pick of #8486) #8516

Merged

hhzhang16 mentioned this pull request May 4, 2026

[BUG][GROUP]: fix DGDR profiling and deployment errors on existing models and backends #8469

Open

15 tasks

Jont828 mentioned this pull request May 5, 2026

MoE profiling crashes during interpolation: missing moe_tp_size/moe_ep_size in AIC calls #8273

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(planner): own AIC interpolation; fix MoE-DEP bugs in rapid mode#8335

feat(planner): own AIC interpolation; fix MoE-DEP bugs in rapid mode#8335
tedzhouhk merged 14 commits into
mainfrom
hzhou/planner-owns-aic-interp

tedzhouhk commented Apr 18, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 18, 2026

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tedzhouhk commented Apr 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

MoE-DEP bug fixes (landed in aic_interpolation.py)

Test plan

Migration notes

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Apr 18, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tedzhouhk commented Apr 18, 2026 •

edited by coderabbitai Bot

Loading

MoE-DEP bug fixes (landed in `aic_interpolation.py`)