feat(profiler): wire mocker-rapid to direct AIC flags, drop profiler AIC interp#8455
Conversation
…ofiler AIC interpolation The profiler's AIC-to-NPZ interpolation path only existed to seed mocker workers; mocker already supports `--aic-perf-model` to call AIConfigurator at runtime. Route mocker-rapid through the flag path and delete the profiler's dead AIC helpers (obsoletes #8444). Mocker: - new `--aic-backend` arg so AIC database lookups can target a backend other than `--engine-type` (needed for trtllm data + vllm simulation) Profiler: - `generate_mocker_config()` injects `--aic-perf-model`/`--aic-backend`/ `--aic-system`/`--aic-{tp,moe-tp,moe-ep,attention-dp}-size` onto each mocker worker from the picks, matching `--engine-type` when AIC backend is vllm/sglang - `needs_profile_data()` returns False for mocker-rapid (no NPZ round-trip) - `build_aic_interpolation_spec()` now also fires for mocker-only rapid - `run_interpolation()` short-circuits for any non-Thorough sweep; removed the AIC branches and `profile_{prefill,decode}_aiconfigurator` helpers - deleted `profiler/utils/estimate_perf.py` shim (impl is in the planner) Signed-off-by: hongkuanz <hongkuanz@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
WalkthroughThis pull request integrates explicit AIC backend configuration into the mocker system and refactors the profiler to eliminate rapid-mode AIC estimator paths, focusing solely on thorough-mode real-GPU profiling. The changes decouple AIC backend selection from engine type and remove redundant AIC performance estimation functions. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~70 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
components/src/dynamo/profiler/interpolation.py (1)
100-194: Consider try/finally around the post-ready block (and DRY the two sides).If
profile_prefill(line 123) orprofile_decode(line 181) raises, the deployment and its entry indeployment_clientsremain live until the outer caller cleans up. Wrapping the ready-to-profile sequence intry/finally(delete + remove) keeps teardown local and symmetric with theTimeoutErrorbranch.The prefill and decode blocks also differ only by a few lines (
max_kv_tokensderivation, profile callee). Extracting a small helper that creates the client, awaits readiness, and guarantees cleanup would halve this code and ensure both sides stay in sync when the profiling protocol evolves.♻️ Sketch
async def _run_side_interpolation( config_dict: dict, work_dir: str, ops: ProfilerOperationalConfig, config_modifier, model_name: str, deployment_clients: list[DynamoDeploymentClient], run_profile, # callable given (base_url, client) -> None ) -> bool: os.makedirs(work_dir, exist_ok=True) cfg_fn = f"{work_dir}/config.yaml" with open(cfg_fn, "w") as f: yaml.dump(config_dict, f) client = DynamoDeploymentClient( namespace=ops.k8s_namespace, base_log_dir=work_dir, model_name=model_name, frontend_port=config_modifier.get_port(config_dict), deployment_name=config_dict["metadata"]["name"], ) deployment_clients.append(client) await client.create_deployment(cfg_fn) try: await client.wait_for_deployment_ready(timeout=ops.deployment_timeout) except TimeoutError: logger.error("Interpolation deployment timed out, skipping.") await client.delete_deployment() deployment_clients.remove(client) return False try: await client.get_deployment_logs() run_profile(client) finally: await client.delete_deployment() deployment_clients.remove(client) return True🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/profiler/interpolation.py` around lines 100 - 194, The prefill and decode interpolation sequences can leak deployments if profile_prefill or profile_decode raise because the DynamoDeploymentClient and its entry in deployment_clients are only removed in the non-exception path; wrap the "ready-to-profile" section in a try/finally that always calls await client.delete_deployment() and deployment_clients.remove(client), and refactor the duplicated logic into a helper (e.g. _run_side_interpolation) that takes the config dict, work_dir, ops, config_modifier, model_name, deployment_clients and a run_profile callable (used for profile_prefill or profile_decode) so both flows create the client, await client.wait_for_deployment_ready(...), call await client.get_deployment_logs(), run the profiling callable, and guarantee cleanup in finally; update prefill and decode flows to call this helper and move unique steps (like computing max_kv_tokens via config_modifier.get_kv_cache_size_from_dynamo_log and decoding service name via get_service_name_by_type) into the arguments passed to the helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/src/dynamo/profiler/utils/dgd_generation.py`:
- Around line 511-534: Update the docstring to accurately reflect the condition
that returns None: the code checks planner_needs_aic (which requires
is_planner_enabled(dgdr) and planner.enable_throughput_scaling) and
is_mocker_enabled(dgdr), so change the bullet "neither planner nor mocker is
enabled" to something like "no AIC consumer needs AIC (planner
throughput-scaling disabled and mocker disabled)" or similar wording; reference
planner_needs_aic, is_mocker_enabled, is_planner_enabled, and
planner.enable_throughput_scaling to make the intent clear.
---
Nitpick comments:
In `@components/src/dynamo/profiler/interpolation.py`:
- Around line 100-194: The prefill and decode interpolation sequences can leak
deployments if profile_prefill or profile_decode raise because the
DynamoDeploymentClient and its entry in deployment_clients are only removed in
the non-exception path; wrap the "ready-to-profile" section in a try/finally
that always calls await client.delete_deployment() and
deployment_clients.remove(client), and refactor the duplicated logic into a
helper (e.g. _run_side_interpolation) that takes the config dict, work_dir, ops,
config_modifier, model_name, deployment_clients and a run_profile callable (used
for profile_prefill or profile_decode) so both flows create the client, await
client.wait_for_deployment_ready(...), call await client.get_deployment_logs(),
run the profiling callable, and guarantee cleanup in finally; update prefill and
decode flows to call this helper and move unique steps (like computing
max_kv_tokens via config_modifier.get_kv_cache_size_from_dynamo_log and decoding
service name via get_service_name_by_type) into the arguments passed to the
helper.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 84b96728-fbd1-45da-b9fb-ec1ccb695a57
📒 Files selected for processing (12)
components/src/dynamo/mocker/args.pycomponents/src/dynamo/mocker/config.pycomponents/src/dynamo/mocker/tests/unit/test_config.pycomponents/src/dynamo/profiler/interpolation.pycomponents/src/dynamo/profiler/profile_sla.pycomponents/src/dynamo/profiler/tests/unit/test_dgd_generation_aic.pycomponents/src/dynamo/profiler/tests/unit/test_helpers_profile_sla.pycomponents/src/dynamo/profiler/utils/dgd_generation.pycomponents/src/dynamo/profiler/utils/estimate_perf.pycomponents/src/dynamo/profiler/utils/profile_common.pycomponents/src/dynamo/profiler/utils/profile_decode.pycomponents/src/dynamo/profiler/utils/profile_prefill.py
💤 Files with no reviewable changes (2)
- components/src/dynamo/profiler/utils/estimate_perf.py
- components/src/dynamo/profiler/profile_sla.py
…dition Addresses CodeRabbit review: the "neither planner nor mocker is enabled" bullet didn't match the code, which also returns None for a planner with throughput scaling disabled (when mocker is also off). Reword accordingly. Signed-off-by: hongkuanz <hongkuanz@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Summary
The profiler's AIC-to-NPZ interpolation path (in
profiler/interpolation.py,profile_prefill_aiconfigurator,profile_decode_aiconfigurator, andestimate_perf.py) only existed to seed mocker workers with latency data. Mocker already supports--aic-perf-modelto call AIConfigurator at runtime — same bridge the planner uses post-#8335. This PR routes mocker-rapid through those flags and deletes the dead profiler AIC code, which obsoletes #8444.Changes
Mocker (
components/src/dynamo/mocker/)args.py: new--aic-backendflag (vllm|sglang|trtllm) so AIC database lookups can target a backend independent of--engine-type. Needed for trtllm AIC data + vllm simulation, since--engine-typeis restricted to vllm/sglang.config.py: prefers--aic-backendover--engine-typewhen AIC is on.Profiler (
components/src/dynamo/profiler/)utils/dgd_generation.py:generate_mocker_config()now acceptsaic_specand injects--aic-perf-model,--aic-backend,--aic-system,--aic-tp-size,--aic-moe-tp-size,--aic-moe-ep-size,--aic-attention-dp-sizeonto each mocker worker from the per-rolePickedParallelConfig. Matches--engine-typewhen AIC backend is vllm/sglang.build_aic_interpolation_spec()now also fires for mocker-only rapid deployments (previously required planner + throughput scaling).utils/profile_common.py:needs_profile_data()returns False for mocker-rapid (NPZ no longer needed).interpolation.py:run_interpolation()short-circuits for any non-Thorough sweep mode; all rapid AIC branches removed. Signature slimmed (dropped unusedmodel/system/isl/osl).utils/profile_prefill.py,utils/profile_decode.py: deletedprofile_{prefill,decode}_aiconfiguratorhelpers.utils/estimate_perf.py: deleted — was a shim; real impl is inplanner/monitoring/aic_estimator.py.profile_sla.py: updatedrun_interpolationcall site for new signature.What's unchanged
aic_interpolation.py.Test plan
components/src/dynamo/profiler/tests/unit/— 101 passedcomponents/src/dynamo/planner/tests/unit/— 322 passedcomponents/src/dynamo/mocker/tests/unit/— 9 passedcomponents/src/dynamo/profiler/tests/integration/test_profile_sla_dgdr.py— 20 passed (2m14s)Report pytest markersis a pre-existing unrelatedkvbm.trtllm_integrationimport error on main)--aic-perf-modelflags and no--planner-profile-data, and profiler does not emit the profile-data ConfigMapFollow-up
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
--aic-backendcommand-line option to specify AI Configurator backend selection for performance modeling (vllm, sglang, or trtllm).Refactor