feat(planner): own AIC interpolation; fix MoE-DEP bugs in rapid mode#8335
Conversation
PickedParallelConfig is a pure data type describing an AIConfigurator parallelism pick (tp, pp, dp, moe_tp, moe_ep). Both the profiler (which picks) and the planner (which will soon consume the pick to bootstrap perf models) need it, so it now lives under dynamo.planner.config. The profiler's parallelization_mapping.py keeps its internal types (ParallelizationMapping, ParallelizationStrategy, and the two helper functions) and re-exports PickedParallelConfig for back-compat. Also documents .tp_size as KV-head-split semantics only (it is NOT the same quantity as AIConfigurator's ModelConfig.tp_size, which is attention TP per rank). The unused to_parallelization_mapping() method is dropped; it had zero callers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Follow-up to the previous commit: turn the profiler's parallelization_mapping.py into a shim that re-exports PickedParallelConfig from its new home while keeping the profiler-internal types (ParallelizationMapping, ParallelizationStrategy, get_candidate_parallel_mappings, apply_parallel_mapping_to_config) in place. No behavior change; class identity is preserved (verified: ``PickedParallelConfig is planner.config.parallelization.PickedParallelConfig``). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
…ring Relocates the AIC wrapper from the profiler to the planner so the planner can run AIC-based interpolation at bootstrap in rapid mode. The profiler's estimate_perf.py becomes a re-export shim for existing callers. Class identity is preserved — profiler and planner imports point at the same underlying class, so isinstance checks still work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
…g_kwargs Two additions for the upcoming rapid-mode refactor: * picked_to_aic_model_config_kwargs(p) returns the AIC ModelConfig kwargs for a pick (tp_size, pp_size, moe_tp_size, moe_ep_size, attention_dp_size). Uses the raw p.tp / p.dp — NOT the derived p.tp_size property, which has KV-head-split semantics that conflict with AIC's attention-TP definition and break the MoE parallelism identity for DEP picks. * AICInterpolationSpec is the pydantic schema the profiler will embed in the planner ConfigMap in rapid mode, carrying hf_id/system/backend, ISL/OSL/max_context_length, granularities, and the two PickedParallelConfig picks. Serialises cleanly through model_dump_json. PickedParallelConfig is now a frozen pydantic BaseModel (previously a frozen dataclass) to allow clean embedding. No callers rely on dataclass-specific APIs — verified by grep for asdict/is_dataclass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
The planner can now generate ForwardPassMetrics in-process from an AICInterpolationSpec by lazy-importing aiconfigurator and sweeping prefill/decode configurations itself. This replaces the profiler's NPZ-based AIC path for rapid mode and fixes the MoE-DEP correctness bugs that silently corrupted the old sweep: * Every AIC call uses picked_to_aic_model_config_kwargs, so moe_tp_size / moe_ep_size / attention_dp_size always reach ModelConfig. The MoE parallelism identity (tp_size * attention_dp_size == moe_tp * moe_ep) holds by construction. * get_max_kv_tokens is scaled by attention_dp_size to get aggregate capacity, matching thorough-mode semantics downstream. * Decode RuntimeConfig.batch_size is now per-attention-DP-rank (aggregate // dp), not the aggregate — AIC internally scales MoE compute by attention_dp_size, so passing aggregate double-counted. * Decode concurrency sweep is linear; the thorough path's DP-multiples constraint is a round-robin routing requirement that does not apply to AIC's static simulator. FPM conventions mirror the thorough-mode file loader so bootstrap data looks identical regardless of source: * Prefill: (num_prefill_requests=1, sum_prefill_tokens=isl, wall_time=per-rank TTFT) * Decode: (num_decode_requests=aggregate, sum_decode_kv_tokens= aggregate * ctx_len, wall_time=per-step ITL) Also extends the dynamo.common.forward_pass_metrics stub in test_advisory_mode to expose ScheduledRequestMetrics — the partial stub masked the real module when test collection ran in certain orders. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
PlannerConfig gains an optional aic_interpolation field and the
bootstrap fetcher gains an aic_spec parameter. The priority chain is
now:
1. get_perf_metrics endpoint (runtime self-benchmark)
2. AIC interpolation in-process (if aic_spec is set)
3. Legacy profile_results_dir files (thorough-mode fallback)
If the endpoint succeeds, AIC and files are never touched. If
aiconfigurator is not installed, the ImportError is caught and the
chain falls through to files. If AIC raises at runtime, same
fallthrough. Only if all three sources fail does the planner raise.
All four fetch_pre_deployment_metrics call sites in adapters.py
(Prefill, Decode, Agg, Disagg × {prefill, decode}) forward the spec
from self.config.aic_interpolation.
PlannerConfig validator warns when enable_throughput_scaling + rapid
mode + no aic_interpolation — this is the handshake the profiler will
satisfy in the final step of the refactor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
In rapid mode with planner features enabled, the profiler now just picks configurations and emits an AICInterpolationSpec inside the planner's ConfigMap. The planner runs the AIC sweep itself at bootstrap via run_aic_interpolation. No more NPZ round-trip for rapid+planner-only deployments. Changes: * needs_profile_data() returns False for rapid + planner + no mocker. Mocker deployments still need NPZ (they read it directly), so mocker cases keep running the existing AIC sweep in run_interpolation. * build_aic_interpolation_spec() in dgd_generation.py constructs the spec from the DGDR + picks + workload params. Returns None (falling through to the file path) for thorough mode, disabled throughput scaling, missing picks, or unsupported backends. * _build_planner_config / add_planner_to_config / assemble_final_config thread aic_spec through, embedding it in the planner ConfigMap. * profile_sla.py computes sweep_max_context_length once (both rapid and thorough need it) and builds the spec before assembling the final config. Thorough-mode behavior is unchanged: profiler runs real-GPU interpolation, bakes NPZ into the profile-data ConfigMap, planner reads files at priority 3. End-to-end for the Qwen-235B-A22B-FP8 DGDR from the triage report: rapid + planner → profiler skips BuildingCurves phase → DGD has only the planner-config ConfigMap (no planner-profile-data) → planner pod runs AIC in-process → "Loaded N FPMs from AIC interpolation" with correct MoE-DEP semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Adds TestQwen235BugReproCase with the exact two picks from the original triage report: * Prefill: tp=4, dp=1, moe_tp=1, moe_ep=4 * Decode: tp=1, dp=8, moe_tp=2, moe_ep=4 Each test pins down a specific symptom that the old profiler path produced: * test_prefill_pick_satisfies_aic_identity — Bug #1 (missing moe kwargs) and Bug #2 (using .tp_size property) would break the identity for this pick. Verifies all five kwargs and the MoE identity. * test_decode_pick_satisfies_aic_identity — same for the decode pick, which was the exact shape that triggered Bug #3 (missing attention_dp_size). * test_tp_size_property_differs_from_aic_tp_size — direct regression guard: documents that the property returns 1 here while AIC's tp_size is 4. Anyone who wires .tp_size back into AIC kwargs will break this. * test_end_to_end_sweep_delivers_complete_kwargs_to_aic — runs both sweeps and asserts every AIC call gets complete kwargs. This is the direct reproducer of the originally-reported failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Drop the "bug repro" framing. The class now reads as a positive exercise of a realistic large-MoE pick pair that spans the full AIC kwarg surface: a DEP-on-MoE prefill pick and a hybrid TEP+DP decode pick. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
black/isort reformatting across the new AIC files; no semantic changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
WalkthroughThis PR introduces in-process AIConfigurator (AIC) interpolation support for rapid-mode planner bootstrap. It adds new configuration schemas, an AIC performance estimator wrapper, interpolation sweep logic, and integrates these components into the planner's metrics-fetching fallback chain. It also refactors shared parallelization config to a common location and updates profiler DGD generation to conditionally build and pass AIC specifications. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (7)
components/src/dynamo/profiler/profile_sla.py (1)
385-395: Consider narrowing the broadexcept Exception(BLE001).
get_model_config_from_model_pathcan raiseOSError/FileNotFoundError/KeyError/etc.; a blanketexcept Exceptionalso swallows programming errors (e.g.TypeError) that would be better surfaced. If the set of expected failure modes is stable, list them explicitly; otherwise keep the catch but log the exception (logger.warning(..., exc_info=True)) so diagnosis isn't lost.🔧 Proposed fix
- try: - model_cfg = get_model_config_from_model_path(resolve_model_path(dgdr)) - sweep_max_context_length = model_cfg.get("max_position_embeddings", 0) - except Exception: - logger.warning("Could not fetch model max context length.") - sweep_max_context_length = 0 + try: + model_cfg = get_model_config_from_model_path(resolve_model_path(dgdr)) + sweep_max_context_length = model_cfg.get("max_position_embeddings", 0) + except (OSError, KeyError, ValueError) as e: + logger.warning("Could not fetch model max context length: %s", e) + sweep_max_context_length = 0🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/profiler/profile_sla.py` around lines 385 - 395, The broad except Exception around calling get_model_config_from_model_path(resolve_model_path(dgdr)) should be narrowed and the error logged; replace the blanket except with explicit expected exceptions (e.g., FileNotFoundError, OSError, KeyError, ValueError) and in the handler call logger.warning with exc_info=True so the stacktrace is preserved, then keep the existing fallback assignment to sweep_max_context_length (isl * 2 or 8192) unchanged; ensure references to get_model_config_from_model_path, resolve_model_path, sweep_max_context_length, and logger.warning are updated accordingly.components/src/dynamo/planner/config/parallelization.py (1)
18-92: LGTM: clean, well-documented shared abstraction.A few observations, none blocking:
- The
tp_sizeproperty vs AIC'sModelConfig.tp_sizesemantic conflict is a real foot-gun, and the warning in the docstring plus the dedicatedpicked_to_aic_model_config_kwargshelper is the right shape to prevent misuse. Consider a short regression test assertingpicked_to_aic_model_config_kwargs(p)["tp_size"] == p.tpacross a few DEP/TEP/dense picks to lock this in.- Ruff flags the ambiguous multiplication sign
×in docstrings (RUF002 at lines 68 and 78). If you want to keep CI quiet, replace with*or add an# noqa: RUF002on the module or a per-file ruff config entry.frozen=Truemakes instances hashable, which is needed if they end up as dict keys/set members during AIC sweeps — worth confirming the downstream sweep code relies on this.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/planner/config/parallelization.py` around lines 18 - 92, The docstrings use the Unicode multiplication sign "×" which triggers Ruff RUF002; replace each "×" with " * " (or add a module/file-level "# noqa: RUF002" if you prefer) in the docstring near the assertion lines, and add a short regression test that constructs a few PickedParallelConfig instances (DEP: moe_ep>1, TEP: moe_tp>1, dense: moe_tp=moe_ep=1) and asserts picked_to_aic_model_config_kwargs(p)["tp_size"] == p.tp to lock in the intended mapping; also confirm PickedParallelConfig.model_config = ConfigDict(frozen=True) remains if downstream sweep/set usage requires hashability.components/src/dynamo/planner/monitoring/aic_estimator.py (3)
120-135:runtime_configshould be a local, not instance state.
self.runtime_configis assigned insideestimate_perfbut never read outside this method. Storing it as an attribute introduces hidden state that could mislead a future caller into thinkingAIConfiguratorPerfEstimatoris a stateful/session object, and it gets overwritten on every call.♻️ Proposed fix
- self.runtime_config = aiconfigurator.sdk.config.RuntimeConfig( + runtime_config = aiconfigurator.sdk.config.RuntimeConfig( batch_size=batch_size, beam_width=1, isl=isl, osl=osl, ) @@ summary = session.run_static( - mode=mode_to_aic_mode[mode], runtime_config=self.runtime_config, stride=32 + mode=mode_to_aic_mode[mode], runtime_config=runtime_config, stride=32 )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/planner/monitoring/aic_estimator.py` around lines 120 - 135, The code assigns runtime_config to self.runtime_config inside estimate_perf but never uses it elsewhere, creating hidden mutable state; change this by making runtime_config a local variable (e.g., runtime_config = aiconfigurator.sdk.config.RuntimeConfig(...)) inside the estimate_perf method instead of self.runtime_config, and update the call to session.run_static to pass runtime_config (not self.runtime_config). Locate the assignment and the session.run_static invocation in AIConfiguratorPerfEstimator. Ensure no other methods reference self.runtime_config and remove the attribute if it's declared on the class.
15-23: Library modules shouldn't mutate their own logger level or attach handlers.Calling
logger.setLevel(INFO)andlogger.addHandler(StreamHandler(...))at module import time (a) overrides whatever log config the hosting app has chosen and (b) causes duplicate output when the root logger also has handlers. Convention is to let the application configure logging; the library just doeslogger = logging.getLogger(__name__).♻️ Proposed fix
logger = logging.getLogger(__name__) -logger.setLevel(logging.INFO) -console_handler = logging.StreamHandler() -console_handler.setLevel(logging.INFO) -formatter = logging.Formatter( - "%(asctime)s - %(name)s - %(levelname)s - %(message)s", "%Y-%m-%d %H:%M:%S" -) -console_handler.setFormatter(formatter) -logger.addHandler(console_handler)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/planner/monitoring/aic_estimator.py` around lines 15 - 23, The module currently configures the logger at import time (logger = logging.getLogger(__name__), logger.setLevel(...), console_handler, formatter, logger.addHandler(...)), which overrides app logging and can cause duplicate output; remove the calls that set the level and add the StreamHandler so the library only does logger = logging.getLogger(__name__), and instead attach a logging.NullHandler() to the logger (e.g., logger.addHandler(logging.NullHandler())) if you need to avoid “No handler” warnings—leave formatter/console_handler and any setLevel calls out of the module.
26-34: Verifyreturn aiconfiguratorresolves after submodule-only imports.The function does
import aiconfigurator.sdk.<x>five times but never an explicitimport aiconfigurator, then returns the bareaiconfiguratorname. This works in CPython today becauseimport a.b.cbindsain the local namespace — but the intent is non-obvious. Consider addingimport aiconfiguratoras the first line (cheap, already cached) to make the binding explicit.Does `import aiconfigurator.sdk.config` bind the top-level `aiconfigurator` name in the local scope of a function in Python 3?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/planner/monitoring/aic_estimator.py` around lines 26 - 34, The function _try_import_aiconfigurator performs only submodule imports (aiconfigurator.sdk.*) but returns the bare name aiconfigurator which may not be explicitly bound in all implementations; update _try_import_aiconfigurator to first do an explicit import aiconfigurator (before importing aiconfigurator.sdk.config, aiconfigurator.sdk.models, etc.), then keep the existing submodule imports and return aiconfigurator so the returned symbol is unambiguous and guaranteed to be bound in the function scope.components/src/dynamo/planner/monitoring/aic_interpolation.py (1)
158-161: Document the aggregate→per-rank rounding trade-off.When
num_req_aggregate < attention_dp,batch_size_per_rankclamps to 1, which makes AIC simulateattention_dprequests (one per rank) instead of thenum_req_aggregatethe FPM records. The FPM still carries the aggregate count, so the regression will learn ITL from an over-provisioned shape at the low end.Not worth blocking on — the thorough path had a similar approximation — but a one-line comment here saves the next reader the archaeology.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/planner/monitoring/aic_interpolation.py` around lines 158 - 161, Add a one-line comment above the batch_size_per_rank calculation (the block using max_concurrency_aggregate, spec.decode_interpolation_granularity and computing batch_size_per_rank = max(1, num_req_aggregate // attention_dp)) explaining the rounding trade-off: when num_req_aggregate < attention_dp we clamp to 1 which effectively simulates attention_dp requests (one per rank) even though the FPM records only num_req_aggregate, so the regression will see an over‑provisioned shape at the low end; mention this is an accepted approximation consistent with similar logic elsewhere.components/src/dynamo/planner/tests/unit/test_aic_interpolation.py (1)
216-347: Strong regression coverage for the MoE-DEP fixes.The identity assertion
tp_size * attention_dp_size == moe_tp_size * moe_ep_sizebaked into_assert_identityplus the explicit "don't derive fromp.tp_sizeproperty" regression guard (line 146–155) pins the exact invariants the old profiler path violated. Good defensive testing.Minor nit: line 225 unpacks
estimatorbut never uses it (_patch_estimator(...)already exposes the mock via the second return). Renaming to_keeps Ruff (RUF059) quiet:- ctx_patch, estimator = _patch_estimator(prefill_latency_ms=123.0) + ctx_patch, _estimator = _patch_estimator(prefill_latency_ms=123.0)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/planner/tests/unit/test_aic_interpolation.py` around lines 216 - 347, In test_prefill_sweep_shape_and_fpm_convention the tuple from _patch_estimator is destructured as "ctx_patch, estimator" but estimator is never used; change the unpacking to "ctx_patch, _ = _patch_estimator(...)" to silence the unused-variable lint (RUF059) while keeping the mock behavior intact—locate the call to _patch_estimator inside test_prefill_sweep_shape_and_fpm_convention and replace the "estimator" identifier with an underscore.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/src/dynamo/planner/monitoring/aic_interpolation.py`:
- Around line 138-155: Add an explicit precondition check at the start of
_sweep_decode to mirror _sweep_prefill: validate that
spec.sweep_max_context_length - osl_sweep > 100 (or whatever minimum window you
use for the decode range) and raise a clear error (ValueError or similar) with a
descriptive message referencing spec.sweep_max_context_length and osl_sweep when
the condition fails; this ensures the for-range over isl (which uses 100 as the
start) cannot silently produce zero iterations and yields a direct, actionable
error instead of the generic "produced only 0 valid points" message.
- Line 67: Move the AIConfiguratorPerfEstimator import out of the function and
place it at the top of the module: remove the function-level import of
AIConfiguratorPerfEstimator in aic_interpolation.py and add a module-level
import "from dynamo.planner.monitoring.aic_estimator import
AIConfiguratorPerfEstimator" near the other imports so the class is imported at
module load (the estimator import is cheap and the optional dependency only
fails on instantiation); ensure you delete the TYPE_CHECKING block or the
in-function import that references AIConfiguratorPerfEstimator so there are no
duplicate or nested imports.
In `@components/src/dynamo/profiler/utils/profile_common.py`:
- Around line 200-210: Hoist the function-local import into the module-level
imports: add "from dynamo.planner.config.planner_config import
PlannerPreDeploymentSweepMode" at the top of the file alongside the other
dynamo.planner imports, remove the inline import inside the block that checks
dgdr.features.planner, and leave the logic that reads
dgdr.features.planner.pre_deployment_sweeping_mode and compares it to
PlannerPreDeploymentSweepMode.Rapid unchanged so the function uses the
module-level symbol.
---
Nitpick comments:
In `@components/src/dynamo/planner/config/parallelization.py`:
- Around line 18-92: The docstrings use the Unicode multiplication sign "×"
which triggers Ruff RUF002; replace each "×" with " * " (or add a
module/file-level "# noqa: RUF002" if you prefer) in the docstring near the
assertion lines, and add a short regression test that constructs a few
PickedParallelConfig instances (DEP: moe_ep>1, TEP: moe_tp>1, dense:
moe_tp=moe_ep=1) and asserts picked_to_aic_model_config_kwargs(p)["tp_size"] ==
p.tp to lock in the intended mapping; also confirm
PickedParallelConfig.model_config = ConfigDict(frozen=True) remains if
downstream sweep/set usage requires hashability.
In `@components/src/dynamo/planner/monitoring/aic_estimator.py`:
- Around line 120-135: The code assigns runtime_config to self.runtime_config
inside estimate_perf but never uses it elsewhere, creating hidden mutable state;
change this by making runtime_config a local variable (e.g., runtime_config =
aiconfigurator.sdk.config.RuntimeConfig(...)) inside the estimate_perf method
instead of self.runtime_config, and update the call to session.run_static to
pass runtime_config (not self.runtime_config). Locate the assignment and the
session.run_static invocation in AIConfiguratorPerfEstimator. Ensure no other
methods reference self.runtime_config and remove the attribute if it's declared
on the class.
- Around line 15-23: The module currently configures the logger at import time
(logger = logging.getLogger(__name__), logger.setLevel(...), console_handler,
formatter, logger.addHandler(...)), which overrides app logging and can cause
duplicate output; remove the calls that set the level and add the StreamHandler
so the library only does logger = logging.getLogger(__name__), and instead
attach a logging.NullHandler() to the logger (e.g.,
logger.addHandler(logging.NullHandler())) if you need to avoid “No handler”
warnings—leave formatter/console_handler and any setLevel calls out of the
module.
- Around line 26-34: The function _try_import_aiconfigurator performs only
submodule imports (aiconfigurator.sdk.*) but returns the bare name
aiconfigurator which may not be explicitly bound in all implementations; update
_try_import_aiconfigurator to first do an explicit import aiconfigurator (before
importing aiconfigurator.sdk.config, aiconfigurator.sdk.models, etc.), then keep
the existing submodule imports and return aiconfigurator so the returned symbol
is unambiguous and guaranteed to be bound in the function scope.
In `@components/src/dynamo/planner/monitoring/aic_interpolation.py`:
- Around line 158-161: Add a one-line comment above the batch_size_per_rank
calculation (the block using max_concurrency_aggregate,
spec.decode_interpolation_granularity and computing batch_size_per_rank = max(1,
num_req_aggregate // attention_dp)) explaining the rounding trade-off: when
num_req_aggregate < attention_dp we clamp to 1 which effectively simulates
attention_dp requests (one per rank) even though the FPM records only
num_req_aggregate, so the regression will see an over‑provisioned shape at the
low end; mention this is an accepted approximation consistent with similar logic
elsewhere.
In `@components/src/dynamo/planner/tests/unit/test_aic_interpolation.py`:
- Around line 216-347: In test_prefill_sweep_shape_and_fpm_convention the tuple
from _patch_estimator is destructured as "ctx_patch, estimator" but estimator is
never used; change the unpacking to "ctx_patch, _ = _patch_estimator(...)" to
silence the unused-variable lint (RUF059) while keeping the mock behavior
intact—locate the call to _patch_estimator inside
test_prefill_sweep_shape_and_fpm_convention and replace the "estimator"
identifier with an underscore.
In `@components/src/dynamo/profiler/profile_sla.py`:
- Around line 385-395: The broad except Exception around calling
get_model_config_from_model_path(resolve_model_path(dgdr)) should be narrowed
and the error logged; replace the blanket except with explicit expected
exceptions (e.g., FileNotFoundError, OSError, KeyError, ValueError) and in the
handler call logger.warning with exc_info=True so the stacktrace is preserved,
then keep the existing fallback assignment to sweep_max_context_length (isl * 2
or 8192) unchanged; ensure references to get_model_config_from_model_path,
resolve_model_path, sweep_max_context_length, and logger.warning are updated
accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8134f125-0097-441c-863c-05a7bf3a2c04
📒 Files selected for processing (16)
components/src/dynamo/planner/config/aic_interpolation_spec.pycomponents/src/dynamo/planner/config/parallelization.pycomponents/src/dynamo/planner/config/planner_config.pycomponents/src/dynamo/planner/core/adapters.pycomponents/src/dynamo/planner/monitoring/aic_estimator.pycomponents/src/dynamo/planner/monitoring/aic_interpolation.pycomponents/src/dynamo/planner/monitoring/perf_metrics.pycomponents/src/dynamo/planner/tests/unit/test_advisory_mode.pycomponents/src/dynamo/planner/tests/unit/test_aic_interpolation.pycomponents/src/dynamo/planner/tests/unit/test_perf_metrics_priority.pycomponents/src/dynamo/profiler/profile_sla.pycomponents/src/dynamo/profiler/tests/unit/test_dgd_generation_aic.pycomponents/src/dynamo/profiler/utils/config_modifiers/parallelization_mapping.pycomponents/src/dynamo/profiler/utils/dgd_generation.pycomponents/src/dynamo/profiler/utils/estimate_perf.pycomponents/src/dynamo/profiler/utils/profile_common.py
…cked DGD When the profiler emits a DGD with vLLM workers, each worker now gets DYN_BENCHMARK_MODE set to its role (prefill / decode / agg). vLLM's startup self-benchmark then populates the get_perf_metrics endpoint, which the planner consumes at priority 1 of its bootstrap chain — superseding AIC simulation and NPZ files. Net effect: a freshly deployed vLLM DGD starts up, runs its own sweep, and the planner reads live measured FPMs instead of AIC's approximation. AIC remains the priority-2 fallback when the endpoint is unreachable; NPZ files remain priority 3. The worker-name → role map is sourced from VllmComponentName (now gains an agg_worker_k8s_name = "VllmWorker" attribute) so dgd_generation stays in sync with the planner/profiler name registry. Skipped when the mocker backend is active — mocker workers don't speak DYN_BENCHMARK_MODE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
1. aic_interpolation.py: hoist AIConfiguratorPerfEstimator import to module top. The wrapper class itself lazy-imports aiconfigurator inside _try_import_aiconfigurator(), so the class-level import is cheap and doesn't pull in the optional dependency until instantiation. 2. profile_common.py: hoist PlannerPreDeploymentSweepMode to the module imports. The "dep cycle" comment was factually wrong — profile_common already transitively imports planner_config.py via dgdr_v1beta1_types. 3. _concurrency_sweep: fix ZeroDivisionError when granularity == 1. Previously (max-1)/(gran-1) divided by zero; now returns [max] so the single sample hits the top of the range. 4. _sweep_decode: add explicit precondition check mirroring _sweep_prefill, so a too-small sweep_max_context_length surfaces a clear ValueError instead of the generic "only 0 valid points" error. 5. Add pytest markers (gpu_0 / pre_merge / unit) to the three new test files per .ai/pytest-guidelines.md. 6. Add regression test for the granularity=1 guard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
The long assert was reformatted by black 23.1.0 — this is the uncommitted local fixup that made CI pre-commit fail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
…behavior The old test_planner_no_mocker_returns_config_with_planner_cm expected add_profile_data_to_config to be called for the rapid + planner case. After step 6, needs_profile_data() returns False for planner-only rapid (planner runs AIC at bootstrap instead), so the profile-data ConfigMap is no longer emitted — hence the CI failure after rebase. Split into two tests covering both sweep modes: * test_rapid_planner_no_mocker_skips_profile_cm — asserts the profile-data ConfigMap is NOT emitted and result == [planner_cm, dgd_config] for the rapid path. * test_thorough_planner_no_mocker_returns_config_with_both_cms — asserts thorough still emits both ConfigMaps. Other tests in the class use mocker mode and are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Summary
Moves AIConfigurator-based interpolation out of the profiler and into the planner for
pre_deployment_sweeping_mode: rapid. The profiler now picks configurations and hands them off via a newaic_interpolationblock on the planner ConfigMap; the planner runs the AIC sweep in-process at bootstrap and seeds its regression models directly from the synthesised FPMs — no more NPZ round-trip through disk.This also fixes the 7 silent MoE-DEP correctness issues that made rapid mode crash or quietly mis-model Qwen/Qwen3-235B-A22B-FP8 and other MoE picks using DEP/TEP/hybrid parallelism.
Priority chain for pre-deployment FPMs (highest → lowest):
get_perf_metricsendpoint (live self-benchmark)aic_interpolationspec is set)profile_results_dir(thorough mode)Thorough mode is unchanged — profiler still runs real-GPU interpolation, bakes NPZ into the profile-data ConfigMap, planner reads files. Mocker rapid-mode deployments also unchanged; they keep running the existing AIC-to-NPZ sweep in the profiler.
What changed
New planner modules:
planner/config/parallelization.py— home forPickedParallelConfig(moved from profiler) +picked_to_aic_model_config_kwargshelperplanner/config/aic_interpolation_spec.py— pydanticAICInterpolationSpecfor profiler → planner handoffplanner/monitoring/aic_estimator.py— home forAIConfiguratorPerfEstimator(moved from profiler)planner/monitoring/aic_interpolation.py—run_aic_interpolation()lands the MoE-DEP fixesPlanner wiring:
PlannerConfig.aic_interpolationfieldfetch_pre_deployment_metricsgains priority-2 AIC branch with lazyaiconfiguratorimportadapters.pycall sites forwardaic_spec=self.config.aic_interpolationProfiler flip:
needs_profile_data()returns False for planner-only rapid (mocker path unaffected)dgd_generation.py:build_aic_interpolation_spec()populates the spec; rapid+planner-only skipsadd_profile_data_to_configprofile_sla.pycomputessweep_max_context_lengthonce and builds the specBack-compat shims for existing profiler importers of
PickedParallelConfigandAIConfiguratorPerfEstimator.MoE-DEP bug fixes (landed in
aic_interpolation.py)moe_tp_size/moe_ep_sizemissing →TypeError: None × Nonepicked_to_aic_model_config_kwargsPickedParallelConfig.tp_size(KV-split property) fed to AIC as attention TPp.tp; property now documented as KV-head-split onlyattention_dp_sizenever reachesModelConfigRuntimeConfig.batch_sizetreated as aggregatenum_req // dp)get_max_kv_tokensper-rank but downstream math expects aggregateattention_dp_sizeattention_dp_sizeTest plan
test_aic_interpolation.py,test_perf_metrics_priority.py,test_dgd_generation_aic.pyTestPickedToAicKwargs— helper correctness across TP/TEP/DEP/hybridTestRunAicInterpolation— sweeps with mocked estimator; every call asserts MoE kwargs + identityTestPriorityChain— 6 tests covering all fallback pathsTestQwen235MoEPicks— end-to-end with the exact Qwen-235B pick shapes179 passed, 11 skippedlocally (skips are pre-existingdynamo.llm/deploybinding absences in the dev venv)BuildingCurvesphase, DGD has onlyplanner-configConfigMap, planner logsLoaded N FPMs from AIC interpolationpre_deployment_sweeping_mode: thorough: confirm both ConfigMaps still produced and planner reads filesget_perf_metricsendpoint available: confirm planner short-circuits at priority 1Migration notes
aiconfiguratoris imported lazily; if the planner image ships without it, rapid-mode deployments will surface a clearImportErrorand fall through to the file loader. Ops will want to ensure the planner image bundles aiconfigurator for rapid-mode deployments.aic_interpolationcontinue to work unchanged (pydantic default isNone).🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Tests