feat(profiler): wire mocker-rapid to direct AIC flags, drop profiler AIC interp by tedzhouhk · Pull Request #8455 · ai-dynamo/dynamo

tedzhouhk · 2026-04-21T18:43:39Z

Summary

The profiler's AIC-to-NPZ interpolation path (in profiler/interpolation.py, profile_prefill_aiconfigurator, profile_decode_aiconfigurator, and estimate_perf.py) only existed to seed mocker workers with latency data. Mocker already supports --aic-perf-model to call AIConfigurator at runtime — same bridge the planner uses post-#8335. This PR routes mocker-rapid through those flags and deletes the dead profiler AIC code, which obsoletes #8444.

Changes

Mocker (components/src/dynamo/mocker/)

args.py: new --aic-backend flag (vllm|sglang|trtllm) so AIC database lookups can target a backend independent of --engine-type. Needed for trtllm AIC data + vllm simulation, since --engine-type is restricted to vllm/sglang.
config.py: prefers --aic-backend over --engine-type when AIC is on.
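The backend preference described above can be sketched with plain argparse. This is a minimal illustration, not the mocker's actual `args.py`/`config.py`: the parser layout, the defaults, the helper name `resolve_aic_backend`, and the treatment of `--aic-perf-model` as a boolean switch are all assumptions.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser carrying only the flags discussed in this PR."""
    parser = argparse.ArgumentParser(prog="mocker-sketch")
    # --engine-type is restricted to the engines the mocker can simulate.
    parser.add_argument("--engine-type", choices=["vllm", "sglang"], default="vllm")
    # Assumption: --aic-perf-model is a boolean switch that turns AIC on.
    parser.add_argument("--aic-perf-model", action="store_true")
    # --aic-backend may name a backend that is not a valid engine type.
    parser.add_argument(
        "--aic-backend", choices=["vllm", "sglang", "trtllm"], default=None
    )
    return parser


def resolve_aic_backend(args: argparse.Namespace) -> Optional[str]:
    """Prefer an explicit --aic-backend; fall back to --engine-type when AIC is on."""
    if not args.aic_perf_model:
        return None
    return args.aic_backend or args.engine_type
```

With this shape, `--engine-type vllm --aic-perf-model --aic-backend trtllm` would simulate vllm while looking up trtllm data, which is the combination the description calls out.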

Profiler (components/src/dynamo/profiler/)

utils/dgd_generation.py:
- generate_mocker_config() now accepts aic_spec and injects --aic-perf-model, --aic-backend, --aic-system, --aic-tp-size, --aic-moe-tp-size, --aic-moe-ep-size, and --aic-attention-dp-size onto each mocker worker from the per-role PickedParallelConfig. It also sets --engine-type to match when the AIC backend is vllm or sglang.
- build_aic_interpolation_spec() now also fires for mocker-only rapid deployments (previously required planner + throughput scaling).
utils/profile_common.py: needs_profile_data() returns False for mocker-rapid (NPZ no longer needed).
interpolation.py: run_interpolation() short-circuits for any non-Thorough sweep mode; all rapid AIC branches removed. Signature slimmed (dropped unused model/system/isl/osl).
utils/profile_prefill.py, utils/profile_decode.py: deleted profile_{prefill,decode}_aiconfigurator helpers.
utils/estimate_perf.py: deleted — was a shim; real impl is in planner/monitoring/aic_estimator.py.
profile_sla.py: updated run_interpolation call site for new signature.
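The flag injection performed by generate_mocker_config() might look roughly like the sketch below. It is illustrative only: `WorkerPick` stands in for the real PickedParallelConfig, `inject_aic_args` and `_set_flag` are hypothetical names (the PR's own helpers are `_mocker_aic_worker_picks` and `_inject_mocker_aic_args`, whose signatures are not shown here), and `--aic-perf-model` is again assumed to be a boolean switch.

```python
from dataclasses import dataclass


@dataclass
class WorkerPick:
    """Hypothetical stand-in for a per-role PickedParallelConfig."""
    tp_size: int = 1
    moe_tp_size: int = 1
    moe_ep_size: int = 1
    attention_dp_size: int = 1


def _set_flag(args: list[str], flag: str, value: str) -> None:
    """Overwrite the value following `flag` if present, otherwise append both."""
    if flag in args and args.index(flag) + 1 < len(args):
        args[args.index(flag) + 1] = value
    else:
        args.extend([flag, value])


def inject_aic_args(
    worker_args: list[str], backend: str, system: str, pick: WorkerPick
) -> list[str]:
    """Return a copy of worker_args with the --aic-* flags added."""
    out = list(worker_args)
    if "--aic-perf-model" not in out:
        out.append("--aic-perf-model")  # assumed boolean switch
    _set_flag(out, "--aic-backend", backend)
    _set_flag(out, "--aic-system", system)
    _set_flag(out, "--aic-tp-size", str(pick.tp_size))
    _set_flag(out, "--aic-moe-tp-size", str(pick.moe_tp_size))
    _set_flag(out, "--aic-moe-ep-size", str(pick.moe_ep_size))
    _set_flag(out, "--aic-attention-dp-size", str(pick.attention_dp_size))
    # Only vllm/sglang are valid engine types, so trtllm leaves --engine-type alone.
    if backend in ("vllm", "sglang"):
        _set_flag(out, "--engine-type", backend)
    return out
```

The system string passed in is whatever the AIC spec carries; any value used when trying this sketch is a placeholder, not a confirmed AIC system name.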

What's unchanged

Thorough-mode interpolation still runs real-GPU sweeps and writes NPZ (both for planner throughput scaling and mocker).
Mocker + no-planner still falls back to polynomial latency simulation.
Planner-rapid (non-mocker) continues to bootstrap AIC in-process via aic_interpolation.py from #8335 (feat(planner): own AIC interpolation; fix MoE-DEP bugs in rapid mode).
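One way to read the resulting split between NPZ profile data and AIC is the predicate below. It is a sketch under assumptions: the real needs_profile_data() takes the profiler's own config objects rather than three plain arguments, and the string sweep-mode values are placeholders.

```python
def needs_profile_data(
    sweep_mode: str, planner_throughput_scaling: bool, mocker_enabled: bool
) -> bool:
    """Profile artifacts (NPZ) are only required for thorough-mode deployments."""
    if sweep_mode != "thorough":
        # Rapid mode: consumers obtain latency data from AIC, not from NPZ files.
        return False
    # Thorough mode: NPZ is written for planner throughput scaling and for mocker.
    return planner_throughput_scaling or mocker_enabled
```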

Test plan

components/src/dynamo/profiler/tests/unit/ — 101 passed
components/src/dynamo/planner/tests/unit/ — 322 passed
components/src/dynamo/mocker/tests/unit/ — 9 passed
components/src/dynamo/profiler/tests/integration/test_profile_sla_dgdr.py — 20 passed (2m14s)
pre-commit hooks pass (sole failure Report pytest markers is a pre-existing unrelated kvbm.trtllm_integration import error on main)
Manual E2E: mocker + planner + rapid Qwen3-235B-A22B-FP8 DGDR → confirm mocker pods start with --aic-perf-model flags and no --planner-profile-data, and profiler does not emit the profile-data ConfigMap

Follow-up

Close #8444 (fix(profiler): pass MoE parallelism kwargs to AIC interpolation estimator): the rapid-AIC branches it was patching are now deleted.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added --aic-backend command-line option to specify AI Configurator backend selection for performance modeling (vllm, sglang, or trtllm).
Refactor
- Simplified profiling system to focus on thorough-mode real-GPU interpolation only.
- Removed rapid-mode estimator-based performance prediction paths and streamlined configuration handling for mocker-enabled workflows.

…ofiler AIC interpolation

The profiler's AIC-to-NPZ interpolation path only existed to seed mocker workers; mocker already supports `--aic-perf-model` to call AIConfigurator at runtime. Route mocker-rapid through the flag path and delete the profiler's dead AIC helpers (obsoletes #8444).

Mocker:
- new `--aic-backend` arg so AIC database lookups can target a backend other than `--engine-type` (needed for trtllm data + vllm simulation)

Profiler:
- `generate_mocker_config()` injects `--aic-perf-model`/`--aic-backend`/`--aic-system`/`--aic-{tp,moe-tp,moe-ep,attention-dp}-size` onto each mocker worker from the picks, matching `--engine-type` when AIC backend is vllm/sglang
- `needs_profile_data()` returns False for mocker-rapid (no NPZ round-trip)
- `build_aic_interpolation_spec()` now also fires for mocker-only rapid
- `run_interpolation()` short-circuits for any non-Thorough sweep; removed the AIC branches and `profile_{prefill,decode}_aiconfigurator` helpers
- deleted `profiler/utils/estimate_perf.py` shim (impl is in the planner)

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

coderabbitai · 2026-04-21T18:49:13Z

Walkthrough

This pull request integrates explicit AIC backend configuration into the mocker system and refactors the profiler to eliminate rapid-mode AIC estimator paths, focusing solely on thorough-mode real-GPU profiling. The changes decouple AIC backend selection from engine type and remove redundant AIC performance estimation functions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Mocker AIC Backend Configuration<br>`components/src/dynamo/mocker/args.py`, `components/src/dynamo/mocker/config.py`, `components/src/dynamo/mocker/tests/unit/test_config.py` | Added `--aic-backend` CLI argument (choices: vllm, sglang, trtllm) to control AIC perf database lookup, with fallback to `--engine-type` when unset. Updated mocker engine arg builder to prefer explicit AIC backend over engine type. Added test helper parameter and new test validating AIC backend override independence from engine type. |
| Profiler Interpolation Refactoring<br>`components/src/dynamo/profiler/interpolation.py`, `components/src/dynamo/profiler/profile_sla.py` | Removed rapid-mode generation and AIC estimator code paths from `run_interpolation`. Function now returns early unless sweep mode is Thorough. Simplified to unconditional real-GPU deployment + profiling for thorough-mode. Updated function signature removing unused parameters (`model`, `system`, `isl`, `osl`). Call site in `profile_sla.py` updated to not pass removed arguments. |
| AIC Estimator Function Cleanup<br>`components/src/dynamo/profiler/utils/profile_prefill.py`, `components/src/dynamo/profiler/utils/profile_decode.py`, `components/src/dynamo/profiler/utils/estimate_perf.py` | Removed `profile_prefill_aiconfigurator` and `profile_decode_aiconfigurator` functions and deleted re-export shim `estimate_perf.py`. Eliminated dependencies on `AIConfiguratorPerfEstimator` and custom AIC perf estimation closures. |
| DGD Generation and AIC Integration<br>`components/src/dynamo/profiler/utils/dgd_generation.py`, `components/src/dynamo/profiler/utils/profile_common.py` | Extended `generate_mocker_config` to accept optional `aic_spec` and inject `--aic-*` CLI flags (backend, system, parallelism sizes) into mocker worker args. Added helpers `_mocker_aic_worker_picks` and `_inject_mocker_aic_args`. Updated `build_aic_interpolation_spec` to return spec for mocker-only rapid sweeping and handle missing planner gracefully. Modified `needs_profile_data()` to require profile artifacts only for thorough-mode deployments, permitting rapid-mode to use in-process/AIC SDK paths. |
| Profiler Test Updates<br>`components/src/dynamo/profiler/tests/unit/test_dgd_generation_aic.py`, `components/src/dynamo/profiler/tests/unit/test_helpers_profile_sla.py` | Added comprehensive tests validating `build_aic_interpolation_spec` behavior for mocker-only rapid sweeping, AIC argument injection (including conditional engine-type override), and `needs_profile_data` expectations for mocker rapid/thorough modes. Replaced and extended profile SLA test cases for mocker + planner rapid/thorough scenarios with updated `add_profile_data_to_config` call expectations and adjusted argument index for `run_interpolation` regression test. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The PR description is comprehensive and well-structured, including summary, detailed changes, what's unchanged, test results, and follow-up actions. |
| Linked Issues check | ✅ Passed | PR `#8455` successfully addresses objectives from issue `#8444` by fixing MoE parameter passing to AIC estimators and removing the AIC interpolation path in favor of direct AIC flags in mocker rapid. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with the stated objectives: adding --aic-backend flag, injecting AIC flags into mocker workers, removing profiler AIC code, and updating related function signatures. |
| Title check | ✅ Passed | The title accurately captures the main changes: mocker now wires to direct AIC flags, and profiler's AIC interpolation path is removed. |


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

components/src/dynamo/profiler/interpolation.py (1)

100-194: Consider try/finally around the post-ready block (and DRY the two sides).

If profile_prefill (line 123) or profile_decode (line 181) raises, the deployment and its entry in deployment_clients remain live until the outer caller cleans up. Wrapping the ready-to-profile sequence in try/finally (delete + remove) keeps teardown local and symmetric with the TimeoutError branch.

The prefill and decode blocks also differ only by a few lines (max_kv_tokens derivation, profile callee). Extracting a small helper that creates the client, awaits readiness, and guarantees cleanup would halve this code and ensure both sides stay in sync when the profiling protocol evolves.

♻️ Sketch

async def _run_side_interpolation(
    config_dict: dict,
    work_dir: str,
    ops: ProfilerOperationalConfig,
    config_modifier,
    model_name: str,
    deployment_clients: list[DynamoDeploymentClient],
    run_profile,  # callable given (client) -> None
) -> bool:
    os.makedirs(work_dir, exist_ok=True)
    cfg_fn = f"{work_dir}/config.yaml"
    with open(cfg_fn, "w") as f:
        yaml.dump(config_dict, f)

    client = DynamoDeploymentClient(
        namespace=ops.k8s_namespace,
        base_log_dir=work_dir,
        model_name=model_name,
        frontend_port=config_modifier.get_port(config_dict),
        deployment_name=config_dict["metadata"]["name"],
    )
    deployment_clients.append(client)
    await client.create_deployment(cfg_fn)
    try:
        await client.wait_for_deployment_ready(timeout=ops.deployment_timeout)
    except TimeoutError:
        logger.error("Interpolation deployment timed out, skipping.")
        await client.delete_deployment()
        deployment_clients.remove(client)
        return False
    try:
        await client.get_deployment_logs()
        run_profile(client)
    finally:
        await client.delete_deployment()
        deployment_clients.remove(client)
    return True
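The cleanup guarantee this sketch relies on can be checked in isolation. The snippet below is a self-contained illustration using a fake client; `FakeClient`, `profile_with_cleanup`, and `demo` are hypothetical names, and unlike the sketch above it assumes an async profiling callable. It only shows that `finally` runs the teardown when profiling raises.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for DynamoDeploymentClient; records teardown."""

    def __init__(self) -> None:
        self.deleted = False

    async def delete_deployment(self) -> None:
        self.deleted = True


async def profile_with_cleanup(client, deployment_clients, run_profile) -> None:
    """Register the client, run profiling, and always tear down afterwards."""
    deployment_clients.append(client)
    try:
        await run_profile(client)
    finally:
        # Runs on success and on exception, so no deployment is left behind.
        await client.delete_deployment()
        deployment_clients.remove(client)


async def _failing_profile(client) -> None:
    raise RuntimeError("profiling failed")


def demo():
    client, clients = FakeClient(), []
    try:
        asyncio.run(profile_with_cleanup(client, clients, _failing_profile))
    except RuntimeError:
        pass  # the error still propagates to the caller after cleanup
    return client.deleted, clients
```

The exception is not swallowed: the caller still sees the failure, but the deployment is deleted and removed from the tracking list first.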

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/profiler/interpolation.py` around lines 100 - 194, The
prefill and decode interpolation sequences can leak deployments if
profile_prefill or profile_decode raise because the DynamoDeploymentClient and
its entry in deployment_clients are only removed in the non-exception path; wrap
the "ready-to-profile" section in a try/finally that always calls await
client.delete_deployment() and deployment_clients.remove(client), and refactor
the duplicated logic into a helper (e.g. _run_side_interpolation) that takes the
config dict, work_dir, ops, config_modifier, model_name, deployment_clients and
a run_profile callable (used for profile_prefill or profile_decode) so both
flows create the client, await client.wait_for_deployment_ready(...), call await
client.get_deployment_logs(), run the profiling callable, and guarantee cleanup
in finally; update prefill and decode flows to call this helper and move unique
steps (like computing max_kv_tokens via
config_modifier.get_kv_cache_size_from_dynamo_log and decoding service name via
get_service_name_by_type) into the arguments passed to the helper.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/src/dynamo/profiler/utils/dgd_generation.py`:
- Around line 511-534: Update the docstring to accurately reflect the condition
that returns None: the code checks planner_needs_aic (which requires
is_planner_enabled(dgdr) and planner.enable_throughput_scaling) and
is_mocker_enabled(dgdr), so change the bullet "neither planner nor mocker is
enabled" to something like "no AIC consumer needs AIC (planner
throughput-scaling disabled and mocker disabled)" or similar wording; reference
planner_needs_aic, is_mocker_enabled, is_planner_enabled, and
planner.enable_throughput_scaling to make the intent clear.
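The condition the docstring should describe can be written out as a small predicate. This is a sketch of the logic as stated in the comment above, with plain booleans standing in for `is_planner_enabled(dgdr)`, `planner.enable_throughput_scaling`, and `is_mocker_enabled(dgdr)`; the function name `aic_spec_needed` is hypothetical.

```python
def aic_spec_needed(
    planner_enabled: bool, throughput_scaling: bool, mocker_enabled: bool
) -> bool:
    """True when at least one AIC consumer exists; otherwise the spec is None."""
    # The planner only consumes AIC when throughput scaling is enabled.
    planner_needs_aic = planner_enabled and throughput_scaling
    return planner_needs_aic or mocker_enabled
```

A planner with throughput scaling disabled and no mocker therefore yields no spec, which is the case the original "neither planner nor mocker is enabled" wording missed.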


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 84b96728-fbd1-45da-b9fb-ec1ccb695a57

📥 Commits

Reviewing files that changed from the base of the PR and between df888f6 and bcc6d16.

📒 Files selected for processing (12)

components/src/dynamo/mocker/args.py
components/src/dynamo/mocker/config.py
components/src/dynamo/mocker/tests/unit/test_config.py
components/src/dynamo/profiler/interpolation.py
components/src/dynamo/profiler/profile_sla.py
components/src/dynamo/profiler/tests/unit/test_dgd_generation_aic.py
components/src/dynamo/profiler/tests/unit/test_helpers_profile_sla.py
components/src/dynamo/profiler/utils/dgd_generation.py
components/src/dynamo/profiler/utils/estimate_perf.py
components/src/dynamo/profiler/utils/profile_common.py
components/src/dynamo/profiler/utils/profile_decode.py
components/src/dynamo/profiler/utils/profile_prefill.py

💤 Files with no reviewable changes (2)

components/src/dynamo/profiler/utils/estimate_perf.py
components/src/dynamo/profiler/profile_sla.py

…dition

Addresses CodeRabbit review: the "neither planner nor mocker is enabled" bullet didn't match the code, which also returns None for a planner with throughput scaling disabled (when mocker is also off). Reword accordingly.

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

tedzhouhk requested review from a team as code owners April 21, 2026 18:43

pull-request-size Bot added the size/XL label Apr 21, 2026

github-actions Bot added the planner label Apr 21, 2026

coderabbitai Bot reviewed Apr 21, 2026

Comment thread components/src/dynamo/profiler/utils/dgd_generation.py

tedzhouhk changed the title ~~feat(profiler,mocker): wire mocker rapid to direct AIC flags; drop profiler AIC interpolation~~ feat(profiler): wire mocker-rapid to direct AIC flags, drop profiler AIC interp Apr 21, 2026

github-actions Bot added the feat label Apr 21, 2026

copy-pr-bot Bot temporarily deployed to GITLAB April 21, 2026 20:05 Inactive

PeaBrane approved these changes Apr 21, 2026

tedzhouhk enabled auto-merge (squash) April 21, 2026 20:26

copy-pr-bot Bot had a problem deploying to GITLAB April 21, 2026 20:33 Failure

tedzhouhk merged commit 55a949c into main Apr 21, 2026
84 of 85 checks passed

tedzhouhk deleted the hzhou/mocker-aic-flags branch April 21, 2026 20:53

tedzhouhk merged 2 commits into main from hzhou/mocker-aic-flags
