Larryli2/minimax m2.5 fp8 eagle3 #1234
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective company's CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
```shell
SPEC_DRAFT_MODEL=${SPEC_DRAFT_MODEL:-MiniMaxAI/MiniMax-M2.5-Eagle3}
SPEC_NUM_TOKENS=${SPEC_NUM_TOKENS:-3}
SPEC_DRAFT_TP=${SPEC_DRAFT_TP:-1}

hf download "$SPEC_DRAFT_MODEL"
```
🔴 The PR description points to the Eagle3 draft model at huggingface.co/thoughtworks/MiniMax-M2.5-Eagle3 (thoughtworks org), but the script defaults SPEC_DRAFT_MODEL to MiniMaxAI/MiniMax-M2.5-Eagle3 (MiniMaxAI org), and perf-changelog.yaml lists the same MiniMaxAI/ path — these can't both be correct. Since the script unconditionally runs hf download "$SPEC_DRAFT_MODEL" before vllm-serve, if the weights are actually only published under thoughtworks/ the benchmark will 404 at startup; please reconcile the PR description, the script default, and the changelog so they all reference the real HF repo.
Extended reasoning...
What is the bug?
The Eagle3 draft model identifier is inconsistent across three places that should agree:
- PR description: "Eagle3 draft model https://huggingface.co/thoughtworks/MiniMax-M2.5-Eagle3/tree/main" — i.e. the thoughtworks org.
- `benchmarks/single_node/minimaxm2.5_fp8_mi355x_eagle3.sh:22`: `SPEC_DRAFT_MODEL=${SPEC_DRAFT_MODEL:-MiniMaxAI/MiniMax-M2.5-Eagle3}` — the MiniMaxAI org.
- `perf-changelog.yaml` entry for `minimaxm2.5-fp8-mi355x-vllm-eagle3`: `"Draft model: MiniMaxAI/MiniMax-M2.5-Eagle3 from Hugging Face Hub"` — also the MiniMaxAI org.
Why the existing code does not save us
The benchmark script unconditionally executes `hf download "$SPEC_DRAFT_MODEL"` (line 26) before `vllm serve` is invoked, and `SPEC_DRAFT_MODEL` is also baked into the `--speculative-config` JSON passed to vLLM. The MI355X launcher (`runners/launch_mi355x-amds.sh`) does not export `SPEC_DRAFT_MODEL` anywhere, so the script default is what actually executes in CI. There is no override path that would substitute the correct namespace at runtime.
Step-by-step proof of failure (assuming the PR description is correct and only thoughtworks/MiniMax-M2.5-Eagle3 exists)
1. CI selects the new `minimaxm2.5-fp8-mi355x-vllm-eagle3` config from `amd-master.yaml`.
2. The launcher invokes `benchmarks/single_node/minimaxm2.5_fp8_mi355x_eagle3.sh` (resolved via the new `SPEC_SUFFIX='_eagle3'` case in `runners/launch_mi355x-amds.sh`).
3. Inside the script, `SPEC_DRAFT_MODEL` is unset by the caller, so it defaults to `MiniMaxAI/MiniMax-M2.5-Eagle3` (line 22).
4. Line 26 runs `hf download MiniMaxAI/MiniMax-M2.5-Eagle3`. Since (per the PR description) the weights live under `thoughtworks/`, this resolves to a non-existent repo and returns a 401/404 from Hugging Face Hub.
5. `hf download` exits non-zero; the script aborts before `vllm serve` is launched, killing the entire Eagle3 sweep.
The symmetric case (weights actually published under MiniMaxAI/) makes the PR description wrong and misleading for anyone trying to reproduce the recipe externally — still worth fixing, but lower stakes than a CI-fatal mismatch.
Impact
If the script default is wrong, the entire minimaxm2.5-fp8-mi355x-vllm-eagle3 sweep (3 ISL/OSL combos × 2 TP/EP points × multiple concurrency levels) fails before serving begins. If only the PR description is wrong, the documentation/changelog still misleads users about where to fetch the draft model.
How to fix
Confirm which Hugging Face repo actually hosts the Eagle3 weights and reconcile all three references:
- If the canonical repo is `thoughtworks/MiniMax-M2.5-Eagle3`: change the default on line 22 of `benchmarks/single_node/minimaxm2.5_fp8_mi355x_eagle3.sh` to `thoughtworks/MiniMax-M2.5-Eagle3`, and update the changelog `Draft model:` line in `perf-changelog.yaml` to match.
- If the canonical repo is `MiniMaxAI/MiniMax-M2.5-Eagle3`: edit the PR description to drop the `thoughtworks/` URL and replace it with the `MiniMaxAI/` path. (The script and changelog are already consistent in this case.)
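Whichever org turns out to be canonical, a caller can already override the script default without editing the file, because `${VAR:-default}` expansion yields an exported value when one exists. A minimal sketch of that pattern (variable names and the default copied from the script; the point is that the launcher would need to export `SPEC_DRAFT_MODEL` for the override path to fire in CI):

```shell
# ${VAR:-default} keeps a caller-exported value if present, otherwise
# falls back to the in-script default. The MI355X launcher exports
# nothing, so in CI only the default matters.
SPEC_DRAFT_MODEL=${SPEC_DRAFT_MODEL:-MiniMaxAI/MiniMax-M2.5-Eagle3}
SPEC_NUM_TOKENS=${SPEC_NUM_TOKENS:-3}

echo "draft model: $SPEC_DRAFT_MODEL (num_speculative_tokens=$SPEC_NUM_TOKENS)"
```

Running this with `SPEC_DRAFT_MODEL=thoughtworks/MiniMax-M2.5-Eagle3` in the environment picks the thoughtworks repo; with nothing exported, the MiniMaxAI default wins.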
```yaml
- config-keys:
    - minimaxm2.5-fp8-mi355x-vllm-eagle3
  description:
    - "Add MiniMax-M2.5 FP8 vLLM Eagle3 speculative decoding benchmark for MI355X"
    - "Model: MiniMaxAI/MiniMax-M2.5 from Hugging Face Hub"
    - "Draft model: MiniMaxAI/MiniMax-M2.5-Eagle3 from Hugging Face Hub"
    - "Recipe uses TP/EP={4/4, 8/8}, num_speculative_tokens=3, draft_tensor_parallel_size=1"
    - "Sweep covers ISL/OSL 1k/1k, 1k/8k, and 8k/1k with concurrency 4-64"
    - "Image pinned to vllm/vllm-openai-rocm:nightly-4eafc729285e459a5fc96efd6f7b313b155cad48 for Eagle3 support"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD
```
🟡 The new perf-changelog.yaml entry's pr-link still has the TBD placeholder (https://github.com/SemiAnalysisAI/InferenceX/pull/TBD) — please replace it with /pull/1234 before merge so the changelog stays a navigable record. Every other entry in this file uses a real PR number; this is the lone outlier.
Extended reasoning...
What the bug is
The new entry appended to perf-changelog.yaml (lines 2034-2044) for minimaxm2.5-fp8-mi355x-vllm-eagle3 ends with:
`pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD`

The `TBD` is a literal placeholder string that was presumably written before the PR number was known. The PR is now #1234 (per the PR metadata), so the link resolves to a non-existent URL.
How it manifests
https://github.com/SemiAnalysisAI/InferenceX/pull/TBD is not a valid GitHub pull URL — it 404s. Anyone reading the changelog and following the link to find the originating PR (the whole point of the field) cannot get back to this PR. Every other one of the ~150 entries in perf-changelog.yaml already uses a real PR number (/pull/95, /pull/110, …, /pull/1230), so this is a unique outlier.
Why existing validation doesn't catch it
`utils/matrix_logic/validation.py` defines `ChangelogEntry` with `pr_link: str = Field(alias="pr-link")` — there is no URL-format check, no regex constraint that the trailing path component be numeric, and no cross-check against the PR number. So the YAML still parses cleanly, and `extra='forbid'` doesn't fire (the field is present and is a string). The bug is purely a documentation/link-correctness issue, not a schema violation, which is why nothing fails CI.
Impact
Low. The only downstream effect is that the changelog field is wrong: it points at a non-existent URL. Nothing functional breaks (no benchmarks rely on pr-link), but the changelog is the canonical record for performance-affecting changes, and this entry will be permanently un-navigable once it lands on main. If any tooling parses pr-link and expects a numeric PR id at the tail, it will also break on TBD.
Step-by-step proof
1. Look at the diff hunk for `perf-changelog.yaml` — the appended entry's last line is literally `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD`.
2. Visit that URL: GitHub returns 404 because there is no PR "TBD".
3. Compare with any prior entry in the file (e.g., line 2031: `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1157`) — every one of them is a valid `/pull/<integer>` URL.
4. Per the PR metadata, this PR's number is `1234`, so the correct value is `https://github.com/SemiAnalysisAI/InferenceX/pull/1234`.
How to fix
One-line change at line 2044:
```diff
- pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD
+ pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1234
```

Severity
nit — trivially fixable, doesn't break validation or any benchmark, but should be fixed before merge so the changelog remains a clean, navigable record.
@larryli2-amd going to pause your sweep right now since the MI355 queue is incredibly long

Summary
Add a MiniMax-M2.5 FP8 vLLM Eagle3 speculative decoding benchmark for AMD MI355X.
This PR adds:

- New config `minimaxm2.5-fp8-mi355x-vllm-eagle3`
- ISL/OSL sweeps `1k/1k`, `1k/8k`, and `8k/1k`
- TP/EP `{4/4, 8/8}` with concurrency `4-64`
- `spec-decoding: eagle3` support in matrix validation and the MI355X launcher

The benchmark uses `num_speculative_tokens=3` and `draft_tensor_parallel_size=1`.

Test Plan

- Validated `amd-master.yaml` and `perf-changelog.yaml`
- Swept ISL/OSL `1k/1k`, `1k/8k`, and `8k/1k` with TP/EP `{4/4, 8/8}`
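For anyone reproducing outside CI, the review comments imply a `--speculative-config` JSON built from the three `SPEC_*` variables. The sketch below assembles and sanity-checks that JSON and prints the serve command it would feed to; the exact flag set, the base-model path, and the key names are assumptions based on the changelog text, not a copy of the benchmark script:

```shell
# Defaults mirror the script; override by exporting before running.
SPEC_DRAFT_MODEL=${SPEC_DRAFT_MODEL:-MiniMaxAI/MiniMax-M2.5-Eagle3}
SPEC_NUM_TOKENS=${SPEC_NUM_TOKENS:-3}
SPEC_DRAFT_TP=${SPEC_DRAFT_TP:-1}

# Build the speculative-decoding config vllm serve would receive.
SPEC_CONFIG=$(cat <<EOF
{"method": "eagle3",
 "model": "$SPEC_DRAFT_MODEL",
 "num_speculative_tokens": $SPEC_NUM_TOKENS,
 "draft_tensor_parallel_size": $SPEC_DRAFT_TP}
EOF
)

# Sanity-check the JSON before handing it to the server.
python3 -c "import json, sys; json.loads(sys.argv[1])" "$SPEC_CONFIG"

echo "vllm serve MiniMaxAI/MiniMax-M2.5 --speculative-config '$SPEC_CONFIG'"
```

Validating the JSON up front gives a clear local failure instead of a serve-time parse error deep in a sweep.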