
Larryli2/minimax m2.5 fp8 eagle3 #1234

Draft
larryli2-amd wants to merge 11 commits into main from larryli2/minimax-m2.5-fp8-eagle3

Conversation

Collaborator

@larryli2-amd larryli2-amd commented Apr 30, 2026

Summary

Add a MiniMax-M2.5 FP8 vLLM Eagle3 speculative decoding benchmark for AMD MI355X.

This PR adds:

  • A benchmark script (benchmarks/single_node/minimaxm2.5_fp8_mi355x_eagle3.sh) and a matching SPEC_SUFFIX='_eagle3' case in the MI355X launcher (runners/launch_mi355x-amds.sh)
  • A minimaxm2.5-fp8-mi355x-vllm-eagle3 config in amd-master.yaml and a corresponding perf-changelog.yaml entry

The benchmark uses num_speculative_tokens=3 and draft_tensor_parallel_size=1.
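For context, these two values typically end up inside the JSON passed to vLLM's --speculative-config flag. A minimal sketch of how that JSON might be assembled (key names are assumptions based on vLLM's speculative decoding options, not copied from this PR's script):

```python
import json

# Hypothetical helper mirroring how a benchmark script might build the
# --speculative-config JSON for `vllm serve`. The key names ("method",
# "model", "num_speculative_tokens", "draft_tensor_parallel_size") are
# assumed from vLLM's speculative decoding options and may differ from
# the actual script in this PR.
def build_speculative_config(draft_model: str,
                             num_speculative_tokens: int = 3,
                             draft_tensor_parallel_size: int = 1) -> str:
    return json.dumps({
        "method": "eagle3",
        "model": draft_model,
        "num_speculative_tokens": num_speculative_tokens,
        "draft_tensor_parallel_size": draft_tensor_parallel_size,
    })

print(build_speculative_config("MiniMaxAI/MiniMax-M2.5-Eagle3"))
```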

Test Plan

  • Ran shell syntax checks for the new benchmark script and MI355X launcher
  • Ran Python syntax checks for matrix validation files
  • Parsed amd-master.yaml and perf-changelog.yaml
  • Verified the new config expands to 1k/1k, 1k/8k, and 8k/1k with TP/EP {4/4, 8/8}
  • Checked IDE lints for modified files
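The expansion check in the test plan amounts to a small cross product: 3 ISL/OSL pairs × 2 TP/EP points = 6 sweep cells. An illustrative sketch (names are invented for the example; this is not the repository's matrix-validation code):

```python
from itertools import product

# Illustrative expansion check: 3 ISL/OSL pairs x 2 TP/EP points.
isl_osl = [(1024, 1024), (1024, 8192), (8192, 1024)]  # 1k/1k, 1k/8k, 8k/1k
tp_ep = [(4, 4), (8, 8)]

sweep = [
    {"isl": isl, "osl": osl, "tp": tp, "ep": ep}
    for (isl, osl), (tp, ep) in product(isl_osl, tp_ep)
]
print(len(sweep))  # 6 configurations
```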

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR there first before we can merge your PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Much of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get PR approval from the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

1 similar comment

@larryli2-amd larryli2-amd marked this pull request as draft April 30, 2026 08:09
Comment on lines +22 to +26
SPEC_DRAFT_MODEL=${SPEC_DRAFT_MODEL:-MiniMaxAI/MiniMax-M2.5-Eagle3}
SPEC_NUM_TOKENS=${SPEC_NUM_TOKENS:-3}
SPEC_DRAFT_TP=${SPEC_DRAFT_TP:-1}

hf download "$SPEC_DRAFT_MODEL"
Contributor

🔴 The PR description points to the Eagle3 draft model at huggingface.co/thoughtworks/MiniMax-M2.5-Eagle3 (thoughtworks org), but the script defaults SPEC_DRAFT_MODEL to MiniMaxAI/MiniMax-M2.5-Eagle3 (MiniMaxAI org), and perf-changelog.yaml lists the same MiniMaxAI/ path — these can't both be correct. Since the script unconditionally runs hf download "$SPEC_DRAFT_MODEL" before vllm-serve, if the weights are actually only published under thoughtworks/ the benchmark will 404 at startup; please reconcile the PR description, the script default, and the changelog so they all reference the real HF repo.

Extended reasoning...

What is the bug?

The Eagle3 draft model identifier is inconsistent across three places that should agree:

  1. PR description: "Eagle3 draft model https://huggingface.co/thoughtworks/MiniMax-M2.5-Eagle3/tree/main" — i.e. the thoughtworks org.
  2. benchmarks/single_node/minimaxm2.5_fp8_mi355x_eagle3.sh:22: SPEC_DRAFT_MODEL=${SPEC_DRAFT_MODEL:-MiniMaxAI/MiniMax-M2.5-Eagle3} — the MiniMaxAI org.
  3. perf-changelog.yaml entry for minimaxm2.5-fp8-mi355x-vllm-eagle3: "Draft model: MiniMaxAI/MiniMax-M2.5-Eagle3 from Hugging Face Hub" — also the MiniMaxAI org.

Why the existing code does not save us

The benchmark script unconditionally executes hf download "$SPEC_DRAFT_MODEL" (line 26) before vllm serve is invoked, and SPEC_DRAFT_MODEL is also baked into the --speculative-config JSON passed to vLLM. The MI355X launcher (runners/launch_mi355x-amds.sh) does not export SPEC_DRAFT_MODEL anywhere, so the script default is what actually executes in CI. There is no override path that would substitute the correct namespace at runtime.

Step-by-step proof of failure (assuming the PR description is correct and only thoughtworks/MiniMax-M2.5-Eagle3 exists)

  1. CI selects the new minimaxm2.5-fp8-mi355x-vllm-eagle3 config from amd-master.yaml.
  2. The launcher invokes benchmarks/single_node/minimaxm2.5_fp8_mi355x_eagle3.sh (resolved via the new SPEC_SUFFIX='_eagle3' case in runners/launch_mi355x-amds.sh).
  3. Inside the script, SPEC_DRAFT_MODEL is unset by the caller, so it defaults to MiniMaxAI/MiniMax-M2.5-Eagle3 (line 22).
  4. Line 26 runs hf download MiniMaxAI/MiniMax-M2.5-Eagle3. Since (per the PR description) the weights live under thoughtworks/, this resolves to a non-existent repo and returns a 401/404 from Hugging Face Hub.
  5. hf download exits non-zero; the script aborts before vllm serve is launched, killing the entire Eagle3 sweep.

The symmetric case (weights actually published under MiniMaxAI/) makes the PR description wrong and misleading for anyone trying to reproduce the recipe externally — still worth fixing, but lower stakes than a CI-fatal mismatch.
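The default-vs-override behavior described in steps 3-4 comes down to standard shell parameter expansion, which can be demonstrated standalone (repo names taken from the discussion above):

```shell
# ${VAR:-default}: use the caller's value when set and non-empty,
# otherwise fall back to the default. With no export from the launcher,
# the in-script default is what runs.
unset SPEC_DRAFT_MODEL
echo "${SPEC_DRAFT_MODEL:-MiniMaxAI/MiniMax-M2.5-Eagle3}"

# When the caller does export a value, it wins over the default.
SPEC_DRAFT_MODEL="thoughtworks/MiniMax-M2.5-Eagle3"
echo "${SPEC_DRAFT_MODEL:-MiniMaxAI/MiniMax-M2.5-Eagle3}"
```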

Impact

If the script default is wrong, the entire minimaxm2.5-fp8-mi355x-vllm-eagle3 sweep (3 ISL/OSL combos × 2 TP/EP points × multiple concurrency levels) fails before serving begins. If only the PR description is wrong, the documentation/changelog still misleads users about where to fetch the draft model.

How to fix

Confirm which Hugging Face repo actually hosts the Eagle3 weights and reconcile all three references:

  • If the canonical repo is thoughtworks/MiniMax-M2.5-Eagle3: change the default on line 22 of benchmarks/single_node/minimaxm2.5_fp8_mi355x_eagle3.sh to thoughtworks/MiniMax-M2.5-Eagle3, and update the changelog Draft model: line in perf-changelog.yaml to match.
  • If the canonical repo is MiniMaxAI/MiniMax-M2.5-Eagle3: edit the PR description to drop the thoughtworks/ URL and replace it with the MiniMaxAI/ path. (The script and changelog are already consistent in this case.)

Comment thread perf-changelog.yaml Outdated
Comment on lines +2034 to +2044

- config-keys:
    - minimaxm2.5-fp8-mi355x-vllm-eagle3
  description:
    - "Add MiniMax-M2.5 FP8 vLLM Eagle3 speculative decoding benchmark for MI355X"
    - "Model: MiniMaxAI/MiniMax-M2.5 from Hugging Face Hub"
    - "Draft model: MiniMaxAI/MiniMax-M2.5-Eagle3 from Hugging Face Hub"
    - "Recipe uses TP/EP={4/4, 8/8}, num_speculative_tokens=3, draft_tensor_parallel_size=1"
    - "Sweep covers ISL/OSL 1k/1k, 1k/8k, and 8k/1k with concurrency 4-64"
    - "Image pinned to vllm/vllm-openai-rocm:nightly-4eafc729285e459a5fc96efd6f7b313b155cad48 for Eagle3 support"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD
Contributor


🟡 The new perf-changelog.yaml entry's pr-link still has the TBD placeholder (https://github.com/SemiAnalysisAI/InferenceX/pull/TBD) — please replace it with /pull/1234 before merge so the changelog stays a navigable record. Every other entry in this file uses a real PR number; this is the lone outlier.

Extended reasoning...

What the bug is

The new entry appended to perf-changelog.yaml (lines 2034-2044) for minimaxm2.5-fp8-mi355x-vllm-eagle3 ends with:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD

The TBD is a literal placeholder string that was presumably written before the PR number was known. The PR is now #1234 (per the PR metadata), so the link resolves to a non-existent URL.

How it manifests

https://github.com/SemiAnalysisAI/InferenceX/pull/TBD is not a valid GitHub pull URL — it 404s. Anyone reading the changelog and following the link to find the originating PR (the whole point of the field) cannot get back to this PR. Every other one of the ~150 entries in perf-changelog.yaml already uses a real PR number (/pull/95, /pull/110, …, /pull/1230), so this is a unique outlier.

Why existing validation doesn't catch it

utils/matrix_logic/validation.py defines ChangelogEntry with pr_link: str = Field(alias="pr-link") — there is no URL-format check, no regex constraint that the trailing path component be numeric, and no cross-check against the PR number. So the YAML still parses cleanly and extra='forbid' doesn't fire (the field is present and is a string). The bug is purely a documentation/link-correctness issue, not a schema violation, which is why nothing fails CI.

Impact

Low. The only downstream effect is that the changelog field is wrong: it points at a non-existent URL. Nothing functional breaks (no benchmarks rely on pr-link), but the changelog is the canonical record for performance-affecting changes, and this entry will be permanently un-navigable once it lands on main. If any tooling parses pr-link and expects a numeric PR id at the tail, it will also break on TBD.

Step-by-step proof

  1. Look at the diff hunk for perf-changelog.yaml — the appended entry's last line is literally pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD.
  2. Visit that URL: GitHub returns 404 because there is no PR "TBD".
  3. Compare with any prior entry in the file (e.g., line 2031: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1157) — every one of them is a valid /pull/<integer> URL.
  4. Per the PR metadata, this PR's number is 1234, so the correct value is https://github.com/SemiAnalysisAI/InferenceX/pull/1234.

How to fix

One-line change at line 2044:

-  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1234

Severity

nit — trivially fixable, doesn't break validation or any benchmark, but should be fixed before merge so the changelog remains a clean, navigable record.

@functionstackx
Contributor

@larryli2-amd going to pause your sweep right now since the MI355 queue is incredibly long
