Merged
1 change: 1 addition & 0 deletions AGENTS.md
@@ -163,6 +163,7 @@
- Source shared utilities: `source benchmark_lib.sh`
- Functions: `check_env_vars()`, `wait_for_server_ready()`, `run_benchmark_serving()`, `run_eval()`, `append_lm_eval_summary()`
- Parameters passed via environment variables
- **MTP scripts MUST pass `--use-chat-template` to `run_benchmark_serving`. No exceptions.** EAGLE-style speculative decoding is trained against chat-formatted inputs, so benchmarking against raw prompts silently regresses acceptance rate and produces misleading numbers. This applies to every `*_mtp.sh` script regardless of model, precision, or runner.

Check warning on line 166 in AGENTS.md

Claude / Claude Code Review

MTP rule omits IS_MTP env-var pattern for multi-node AMD benchmarks

The new MTP rule only references the `*_mtp.sh` naming convention, but multi-node AMD MTP benchmarks use a different pattern: `server.sh` exports `IS_MTP=true` when `DECODE_MTP_SIZE > 0`, and `bench.sh` conditionally adds `--use-chat-template` based on that variable. A developer adding a new multi-node AMD MTP configuration (not named `*_mtp.sh`) would read this rule and likely conclude it doesn't apply. Consider expanding the rule to mention both patterns.

Extended reasoning...

Two distinct MTP patterns exist in the codebase, but the rule only documents one.

The newly added rule states: "This applies to every `*_mtp.sh` script regardless of model, precision, or runner." This accurately describes the single-node benchmark pattern (e.g., `dsr1_fp8_b200_mtp.sh`), where scripts are explicitly named with the `_mtp` suffix and must call `run_benchmark_serving` with `--use-chat-template` directly.

However, multi-node AMD MTP benchmarks follow a completely different mechanism. In `benchmarks/multi_node/amd_utils/server.sh` (lines 459–463), `IS_MTP=true` is exported when `DECODE_MTP_SIZE > 0`. Then in `bench.sh` (line 60), `--use-chat-template` is conditionally injected via `$( [ "$IS_MTP" = "true" ] && echo "--use-chat-template" )`. These scripts have names like `dsr1_fp8_mi355x_sglang-disagg.sh`, which do not match `*_mtp.sh`.
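The two-step dispatch chain can be sketched in shell as follows. This is a minimal sketch, not the contents of `server.sh` or `bench.sh`: the function names `set_mtp_flag` and `chat_template_flag` are hypothetical, and `DECODE_MTP_SIZE` is assumed to default to 0 when unset.

```bash
#!/usr/bin/env bash
# Minimal sketch of the IS_MTP dispatch chain (hypothetical function names;
# not the actual code in server.sh or bench.sh).

# server.sh side: export IS_MTP=true when the decode stage uses MTP.
# Assumption: DECODE_MTP_SIZE defaults to 0 when unset.
set_mtp_flag() {
  if [ "${DECODE_MTP_SIZE:-0}" -gt 0 ]; then
    export IS_MTP=true
  else
    export IS_MTP=false
  fi
}

# bench.sh side: print --use-chat-template only when IS_MTP=true.
# Same effect as $( [ "$IS_MTP" = "true" ] && echo "--use-chat-template" ).
chat_template_flag() {
  if [ "${IS_MTP:-false}" = "true" ]; then
    echo "--use-chat-template"
  fi
}

DECODE_MTP_SIZE=1
set_mtp_flag
echo "IS_MTP=$IS_MTP flag=$(chat_template_flag)"
# prints: IS_MTP=true flag=--use-chat-template
```

The `if` form returns 0 when `IS_MTP` is not `true`, so the function is also safe to call directly under `set -e`; when the substitution is expanded as a command argument, both forms behave the same. The substitution must stay unquoted at the call site: an unquoted empty expansion disappears, whereas `"$(...)"` would pass an empty-string argument.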

Why the counterargument is only partially valid: one could object that the existing `bench.sh` infrastructure already adds `--use-chat-template` automatically for multi-node AMD MTP configurations, so no current script is broken. That is true, but it only holds for scripts that reuse `bench.sh`. A developer who writes a new multi-node AMD MTP script that does not route through `bench.sh`, or who is setting up the `IS_MTP=true` dispatch chain for the first time, could read AGENTS.md, see that their script is not named `*_mtp.sh`, and not realize the requirement applies.

Concrete scenario: a developer adds a new multi-node AMD MTP configuration and writes a custom launch wrapper that calls `run_benchmark_serving` directly (bypassing `bench.sh`). They check AGENTS.md and see "This applies to every `*_mtp.sh` script"; their script is not named `*_mtp.sh`, so they move on without adding `--use-chat-template`. The acceptance rate silently collapses and the benchmark numbers are misleading, which is exactly the failure mode the rule intends to prevent.

Suggested fix: expand the final sentence to cover both patterns, e.g.: "This applies to every `*_mtp.sh` script AND to any script where `IS_MTP=true` is set (the multi-node AMD MTP pattern), regardless of model, precision, or runner."
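The expanded rule can be expressed as a small check. This is a hypothetical sketch, not existing tooling in the repository: `requires_chat_template` and `check_mtp_args` are illustrative names, and it assumes the script name and the value of `IS_MTP` are both known when the benchmark arguments are assembled.

```bash
#!/usr/bin/env bash
# Hypothetical check for the expanded MTP rule: a benchmark script must pass
# --use-chat-template if it is named *_mtp.sh OR runs with IS_MTP=true.

# Returns 0 if the rule applies to this script.
requires_chat_template() {
  local script_name="$1" is_mtp="${2:-false}"
  case "$script_name" in
    *_mtp.sh) return 0 ;;
  esac
  [ "$is_mtp" = "true" ]
}

# Usage: check_mtp_args SCRIPT_NAME IS_MTP [BENCHMARK_ARGS...]
# Returns 1 if the rule applies but --use-chat-template is missing.
check_mtp_args() {
  local script_name="$1" is_mtp="$2"
  shift 2
  requires_chat_template "$script_name" "$is_mtp" || return 0
  local arg
  for arg in "$@"; do
    if [ "$arg" = "--use-chat-template" ]; then
      return 0
    fi
  done
  echo "ERROR: $script_name is an MTP benchmark but omits --use-chat-template" >&2
  return 1
}

# The scenario above: a custom wrapper (not *_mtp.sh) run with IS_MTP=true.
check_mtp_args custom_amd_wrapper.sh true || echo "rule violation caught"
```

Checking `IS_MTP` as well as the file name is what catches the custom-wrapper scenario: `custom_amd_wrapper.sh` does not match `*_mtp.sh`, so a name-only check would let it through.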


### Git
