feat(profiler): wire mocker-rapid to direct AIC flags, drop profiler AIC interp #8455

Merged
tedzhouhk merged 2 commits into main from hzhou/mocker-aic-flags
Apr 21, 2026

Conversation

Contributor

@tedzhouhk tedzhouhk commented Apr 21, 2026

Summary

The profiler's AIC-to-NPZ interpolation path (in profiler/interpolation.py, profile_prefill_aiconfigurator, profile_decode_aiconfigurator, and estimate_perf.py) existed only to seed mocker workers with latency data. Mocker already supports --aic-perf-model to call AIConfigurator at runtime, the same bridge the planner has used since #8335. This PR routes mocker-rapid through those flags and deletes the now-dead profiler AIC code, which obsoletes #8444.

Changes

Mocker (components/src/dynamo/mocker/)

  • args.py: new --aic-backend flag (vllm|sglang|trtllm) so AIC database lookups can target a backend independent of --engine-type. Needed for trtllm AIC data + vllm simulation, since --engine-type is restricted to vllm/sglang.
  • config.py: prefers --aic-backend over --engine-type when AIC is on.
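
The flag and its fallback can be sketched roughly as follows. This is a minimal illustration, not the code in args.py or config.py: `build_parser` and `resolve_aic_backend` are hypothetical names, the `vllm` default for --engine-type is assumed, and --aic-perf-model is treated as a boolean switch, which is also an assumption.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser holding only the flags discussed above."""
    parser = argparse.ArgumentParser(prog="mocker-sketch")
    # --engine-type stays restricted to the engines mocker can simulate.
    parser.add_argument("--engine-type", choices=["vllm", "sglang"], default="vllm")
    # Assumption: --aic-perf-model is a plain on/off switch.
    parser.add_argument("--aic-perf-model", action="store_true")
    # New flag: the AIC database backend, which may also be trtllm.
    parser.add_argument(
        "--aic-backend", choices=["vllm", "sglang", "trtllm"], default=None
    )
    return parser


def resolve_aic_backend(args: argparse.Namespace) -> Optional[str]:
    """Prefer --aic-backend over --engine-type, but only when AIC is on."""
    if not args.aic_perf_model:
        return None
    return args.aic_backend or args.engine_type
```

With this shape, `--engine-type vllm --aic-backend trtllm` simulates vllm while reading trtllm AIC data, and omitting --aic-backend falls back to the engine type.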

Profiler (components/src/dynamo/profiler/)

  • utils/dgd_generation.py:
    • generate_mocker_config() now accepts aic_spec and injects --aic-perf-model, --aic-backend, --aic-system, --aic-tp-size, --aic-moe-tp-size, --aic-moe-ep-size, --aic-attention-dp-size onto each mocker worker from the per-role PickedParallelConfig. It also sets --engine-type to match the AIC backend when that backend is vllm or sglang.
    • build_aic_interpolation_spec() now also fires for mocker-only rapid deployments (previously required planner + throughput scaling).
  • utils/profile_common.py: needs_profile_data() returns False for mocker-rapid (NPZ no longer needed).
  • interpolation.py: run_interpolation() short-circuits for any non-Thorough sweep mode; all rapid AIC branches removed. Signature slimmed (dropped unused model/system/isl/osl).
  • utils/profile_prefill.py, utils/profile_decode.py: deleted profile_{prefill,decode}_aiconfigurator helpers.
  • utils/estimate_perf.py: deleted — was a shim; real impl is in planner/monitoring/aic_estimator.py.
  • profile_sla.py: updated run_interpolation call site for new signature.
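
The flag injection described for generate_mocker_config() might look roughly like the sketch below. `ParallelPick` is a hypothetical stand-in for PickedParallelConfig (its field names are assumed), `inject_aic_args` is not the real helper, and --aic-perf-model is again assumed to be a value-less switch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParallelPick:
    """Hypothetical stand-in for PickedParallelConfig; field names are assumed."""
    tp_size: int = 1
    moe_tp_size: int = 1
    moe_ep_size: int = 1
    attention_dp_size: int = 1


def inject_aic_args(
    worker_args: List[str], backend: str, system: str, pick: ParallelPick
) -> List[str]:
    """Return a copy of worker_args with the --aic-* flags appended."""
    args = list(worker_args)
    args += [
        "--aic-perf-model",  # assumption: boolean switch, no value
        "--aic-backend", backend,
        "--aic-system", system,
        "--aic-tp-size", str(pick.tp_size),
        "--aic-moe-tp-size", str(pick.moe_tp_size),
        "--aic-moe-ep-size", str(pick.moe_ep_size),
        "--aic-attention-dp-size", str(pick.attention_dp_size),
    ]
    # Align --engine-type only when mocker can simulate the AIC backend;
    # trtllm data leaves the existing engine type untouched.
    if backend in ("vllm", "sglang"):
        if "--engine-type" in args:
            idx = args.index("--engine-type")
            if idx + 1 < len(args):
                args[idx + 1] = backend
        else:
            args += ["--engine-type", backend]
    return args
```

Returning a new list rather than mutating the input keeps the per-role worker configs independent, which matters when prefill and decode workers receive different picks.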

What's unchanged

Test plan

  • components/src/dynamo/profiler/tests/unit/ — 101 passed
  • components/src/dynamo/planner/tests/unit/ — 322 passed
  • components/src/dynamo/mocker/tests/unit/ — 9 passed
  • components/src/dynamo/profiler/tests/integration/test_profile_sla_dgdr.py — 20 passed (2m14s)
  • pre-commit hooks pass (sole failure Report pytest markers is a pre-existing unrelated kvbm.trtllm_integration import error on main)
  • Manual E2E: mocker + planner + rapid Qwen3-235B-A22B-FP8 DGDR → confirm mocker pods start with --aic-perf-model flags and no --planner-profile-data, and profiler does not emit the profile-data ConfigMap

Follow-up

🤖 Generated with Claude Code


Summary by CodeRabbit

Release Notes

  • New Features

    • Added --aic-backend command-line option to specify AI Configurator backend selection for performance modeling (vllm, sglang, or trtllm).
  • Refactor

    • Simplified profiling system to focus on thorough-mode real-GPU interpolation only.
    • Removed rapid-mode estimator-based performance prediction paths and streamlined configuration handling for mocker-enabled workflows.

…ofiler AIC interpolation

The profiler's AIC-to-NPZ interpolation path only existed to seed mocker
workers; mocker already supports `--aic-perf-model` to call AIConfigurator
at runtime. Route mocker-rapid through the flag path and delete the
profiler's dead AIC helpers (obsoletes #8444).

Mocker:
- new `--aic-backend` arg so AIC database lookups can target a backend
  other than `--engine-type` (needed for trtllm data + vllm simulation)

Profiler:
- `generate_mocker_config()` injects `--aic-perf-model`/`--aic-backend`/
  `--aic-system`/`--aic-{tp,moe-tp,moe-ep,attention-dp}-size` onto each
  mocker worker from the picks, matching `--engine-type` when AIC backend
  is vllm/sglang
- `needs_profile_data()` returns False for mocker-rapid (no NPZ round-trip)
- `build_aic_interpolation_spec()` now also fires for mocker-only rapid
- `run_interpolation()` short-circuits for any non-Thorough sweep; removed
  the AIC branches and `profile_{prefill,decode}_aiconfigurator` helpers
- deleted `profiler/utils/estimate_perf.py` shim (impl is in the planner)

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
@tedzhouhk tedzhouhk requested review from a team as code owners April 21, 2026 18:43
Contributor

coderabbitai Bot commented Apr 21, 2026

Walkthrough

This pull request integrates explicit AIC backend configuration into the mocker system and refactors the profiler to eliminate rapid-mode AIC estimator paths, focusing solely on thorough-mode real-GPU profiling. The changes decouple AIC backend selection from engine type and remove redundant AIC performance estimation functions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Mocker AIC Backend Configuration: components/src/dynamo/mocker/args.py, components/src/dynamo/mocker/config.py, components/src/dynamo/mocker/tests/unit/test_config.py | Added --aic-backend CLI argument (choices: vllm, sglang, trtllm) to control AIC perf database lookup, with fallback to --engine-type when unset. Updated mocker engine arg builder to prefer explicit AIC backend over engine type. Added test helper parameter and new test validating AIC backend override independence from engine type. |
| Profiler Interpolation Refactoring: components/src/dynamo/profiler/interpolation.py, components/src/dynamo/profiler/profile_sla.py | Removed rapid-mode generation and AIC estimator code paths from run_interpolation. Function now returns early unless sweep mode is Thorough. Simplified to unconditional real-GPU deployment + profiling for thorough-mode. Updated function signature removing unused parameters (model, system, isl, osl). Call site in profile_sla.py updated to not pass removed arguments. |
| AIC Estimator Function Cleanup: components/src/dynamo/profiler/utils/profile_prefill.py, components/src/dynamo/profiler/utils/profile_decode.py, components/src/dynamo/profiler/utils/estimate_perf.py | Removed profile_prefill_aiconfigurator and profile_decode_aiconfigurator functions and deleted re-export shim estimate_perf.py. Eliminated dependencies on AIConfiguratorPerfEstimator and custom AIC perf estimation closures. |
| DGD Generation and AIC Integration: components/src/dynamo/profiler/utils/dgd_generation.py, components/src/dynamo/profiler/utils/profile_common.py | Extended generate_mocker_config to accept optional aic_spec and inject --aic-* CLI flags (backend, system, parallelism sizes) into mocker worker args. Added helpers _mocker_aic_worker_picks and _inject_mocker_aic_args. Updated build_aic_interpolation_spec to return spec for mocker-only rapid sweeping and handle missing planner gracefully. Modified needs_profile_data() to require profile artifacts only for thorough-mode deployments, permitting rapid-mode to use in-process/AIC SDK paths. |
| Profiler Test Updates: components/src/dynamo/profiler/tests/unit/test_dgd_generation_aic.py, components/src/dynamo/profiler/tests/unit/test_helpers_profile_sla.py | Added comprehensive tests validating build_aic_interpolation_spec behavior for mocker-only rapid sweeping, AIC argument injection (including conditional engine-type override), and needs_profile_data expectations for mocker rapid/thorough modes. Replaced and extended profile SLA test cases for mocker + planner rapid/thorough scenarios with updated add_profile_data_to_config call expectations and adjusted argument index for run_interpolation regression test. |
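
The sweep-mode gating summarized above can be illustrated with a small sketch. Everything here is hypothetical: `SweepMode`, `should_run_interpolation`, and the two-argument `needs_profile_data` are simplified stand-ins, and the real functions take different inputs.

```python
from enum import Enum


class SweepMode(Enum):
    """Hypothetical sweep-mode enum; the profiler's real type may differ."""
    RAPID = "rapid"
    THOROUGH = "thorough"


def should_run_interpolation(sweep_mode: SweepMode) -> bool:
    """Only Thorough sweeps deploy on real GPUs and profile them."""
    return sweep_mode is SweepMode.THOROUGH


def needs_profile_data(sweep_mode: SweepMode, has_consumer: bool) -> bool:
    """Assumption: profile artifacts are needed only in thorough mode, and
    only when something (planner or mocker) will read them. Rapid mode
    relies on the AIC flags or SDK instead of an NPZ round-trip."""
    return sweep_mode is SweepMode.THOROUGH and has_consumer
```

Under these assumptions a mocker-rapid deployment gets `needs_profile_data(...) == False`, which is consistent with the manual E2E check that no profile-data ConfigMap is emitted.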

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The PR description is comprehensive and well-structured, including summary, detailed changes, what's unchanged, test results, and follow-up actions. |
| Linked Issues check | ✅ Passed | PR #8455 successfully addresses objectives from issue #8444 by fixing MoE parameter passing to AIC estimators and removing the AIC interpolation path in favor of direct AIC flags in mocker rapid. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with the stated objectives: adding --aic-backend flag, injecting AIC flags into mocker workers, removing profiler AIC code, and updating related function signatures. |
| Title check | ✅ Passed | The title accurately captures the main changes: mocker now wires to direct AIC flags, and profiler's AIC interpolation path is removed. |


Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
components/src/dynamo/profiler/interpolation.py (1)

100-194: Consider try/finally around the post-ready block (and DRY the two sides).

If profile_prefill (line 123) or profile_decode (line 181) raises, the deployment and its entry in deployment_clients remain live until the outer caller cleans up. Wrapping the ready-to-profile sequence in try/finally (delete + remove) keeps teardown local and symmetric with the TimeoutError branch.

The prefill and decode blocks also differ only by a few lines (max_kv_tokens derivation, profile callee). Extracting a small helper that creates the client, awaits readiness, and guarantees cleanup would halve this code and ensure both sides stay in sync when the profiling protocol evolves.

♻️ Sketch
async def _run_side_interpolation(
    config_dict: dict,
    work_dir: str,
    ops: ProfilerOperationalConfig,
    config_modifier,
    model_name: str,
    deployment_clients: list[DynamoDeploymentClient],
    run_profile,  # callable invoked as run_profile(client)
) -> bool:
    os.makedirs(work_dir, exist_ok=True)
    cfg_fn = f"{work_dir}/config.yaml"
    with open(cfg_fn, "w") as f:
        yaml.dump(config_dict, f)

    client = DynamoDeploymentClient(
        namespace=ops.k8s_namespace,
        base_log_dir=work_dir,
        model_name=model_name,
        frontend_port=config_modifier.get_port(config_dict),
        deployment_name=config_dict["metadata"]["name"],
    )
    deployment_clients.append(client)
    await client.create_deployment(cfg_fn)
    try:
        await client.wait_for_deployment_ready(timeout=ops.deployment_timeout)
    except TimeoutError:
        logger.error("Interpolation deployment timed out, skipping.")
        await client.delete_deployment()
        deployment_clients.remove(client)
        return False
    try:
        await client.get_deployment_logs()
        run_profile(client)
    finally:
        await client.delete_deployment()
        deployment_clients.remove(client)
    return True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/profiler/interpolation.py` around lines 100 - 194, The
prefill and decode interpolation sequences can leak deployments if
profile_prefill or profile_decode raise because the DynamoDeploymentClient and
its entry in deployment_clients are only removed in the non-exception path; wrap
the "ready-to-profile" section in a try/finally that always calls await
client.delete_deployment() and deployment_clients.remove(client), and refactor
the duplicated logic into a helper (e.g. _run_side_interpolation) that takes the
config dict, work_dir, ops, config_modifier, model_name, deployment_clients and
a run_profile callable (used for profile_prefill or profile_decode) so both
flows create the client, await client.wait_for_deployment_ready(...), call await
client.get_deployment_logs(), run the profiling callable, and guarantee cleanup
in finally; update prefill and decode flows to call this helper and move unique
steps (like computing max_kv_tokens via
config_modifier.get_kv_cache_size_from_dynamo_log and decoding service name via
get_service_name_by_type) into the arguments passed to the helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/src/dynamo/profiler/utils/dgd_generation.py`:
- Around line 511-534: Update the docstring to accurately reflect the condition
that returns None: the code checks planner_needs_aic (which requires
is_planner_enabled(dgdr) and planner.enable_throughput_scaling) and
is_mocker_enabled(dgdr), so change the bullet "neither planner nor mocker is
enabled" to something like "no AIC consumer needs AIC (planner
throughput-scaling disabled and mocker disabled)" or similar wording; reference
planner_needs_aic, is_mocker_enabled, is_planner_enabled, and
planner.enable_throughput_scaling to make the intent clear.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 84b96728-fbd1-45da-b9fb-ec1ccb695a57

📥 Commits

Reviewing files that changed from the base of the PR and between df888f6 and bcc6d16.

📒 Files selected for processing (12)
  • components/src/dynamo/mocker/args.py
  • components/src/dynamo/mocker/config.py
  • components/src/dynamo/mocker/tests/unit/test_config.py
  • components/src/dynamo/profiler/interpolation.py
  • components/src/dynamo/profiler/profile_sla.py
  • components/src/dynamo/profiler/tests/unit/test_dgd_generation_aic.py
  • components/src/dynamo/profiler/tests/unit/test_helpers_profile_sla.py
  • components/src/dynamo/profiler/utils/dgd_generation.py
  • components/src/dynamo/profiler/utils/estimate_perf.py
  • components/src/dynamo/profiler/utils/profile_common.py
  • components/src/dynamo/profiler/utils/profile_decode.py
  • components/src/dynamo/profiler/utils/profile_prefill.py
💤 Files with no reviewable changes (2)
  • components/src/dynamo/profiler/utils/estimate_perf.py
  • components/src/dynamo/profiler/profile_sla.py

Comment thread components/src/dynamo/profiler/utils/dgd_generation.py
@tedzhouhk tedzhouhk changed the title from "feat(profiler,mocker): wire mocker rapid to direct AIC flags; drop profiler AIC interpolation" to "feat(profiler): wire mocker-rapid to direct AIC flags, drop profiler AIC interp" Apr 21, 2026
@github-actions github-actions Bot added the feat label Apr 21, 2026
…dition

Addresses CodeRabbit review: the "neither planner nor mocker is enabled"
bullet didn't match the code, which also returns None for a planner with
throughput scaling disabled (when mocker is also off). Reword accordingly.

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
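
The return-None condition that this commit rewords can be sketched as below. The dataclasses are hypothetical stand-ins for the DGDR spec, and `has_aic_consumer` is an illustrative name; only `planner_needs_aic`, `is_planner_enabled`, `is_mocker_enabled`, and `enable_throughput_scaling` come from the review comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Planner:
    """Hypothetical planner section of a DGDR."""
    enabled: bool = False
    enable_throughput_scaling: bool = False


@dataclass
class Dgdr:
    """Hypothetical, heavily simplified DGDR."""
    planner: Optional[Planner] = None
    mocker_enabled: bool = False


def is_planner_enabled(dgdr: Dgdr) -> bool:
    return dgdr.planner is not None and dgdr.planner.enabled


def is_mocker_enabled(dgdr: Dgdr) -> bool:
    return dgdr.mocker_enabled


def has_aic_consumer(dgdr: Dgdr) -> bool:
    """False corresponds to build_aic_interpolation_spec returning None."""
    # The planner consumes AIC only when throughput scaling is on.
    planner_needs_aic = (
        is_planner_enabled(dgdr) and dgdr.planner.enable_throughput_scaling
    )
    return planner_needs_aic or is_mocker_enabled(dgdr)
```

The case the original docstring missed is an enabled planner with throughput scaling off and no mocker: a planner exists, yet nothing consumes AIC data, so no spec is built.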
@tedzhouhk tedzhouhk enabled auto-merge (squash) April 21, 2026 20:26
@tedzhouhk tedzhouhk merged commit 55a949c into main Apr 21, 2026
84 of 85 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/mocker-aic-flags branch April 21, 2026 20:53