
Larryli2/minimax m2.5 fp8 eagle3 #1234

Draft
larryli2-amd wants to merge 11 commits into main from larryli2/minimax-m2.5-fp8-eagle3

Conversation

Collaborator

@larryli2-amd larryli2-amd commented Apr 30, 2026

Summary

Add a MiniMax-M2.5 FP8 vLLM Eagle3 speculative decoding benchmark for AMD MI355X.

This PR adds:

  • A benchmark script (benchmarks/single_node/minimaxm2.5_fp8_mi355x_eagle3.sh) and a matching SPEC_SUFFIX='_eagle3' case in the MI355X launcher (runners/launch_mi355x-amds.sh)
  • A minimaxm2.5-fp8-mi355x-vllm-eagle3 config in amd-master.yaml and a corresponding perf-changelog.yaml entry

The benchmark uses num_speculative_tokens=3 and draft_tensor_parallel_size=1.
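For context, these two values typically end up inside the JSON passed to vLLM's --speculative-config flag. A minimal sketch of how that JSON might be assembled (key names are assumptions based on vLLM's speculative decoding options, not copied from this PR's script):

```python
import json

# Hypothetical helper mirroring how a benchmark script might build the
# --speculative-config JSON for `vllm serve`. The key names ("method",
# "model", "num_speculative_tokens", "draft_tensor_parallel_size") are
# assumed from vLLM's speculative decoding options and may differ from
# the actual script in this PR.
def build_speculative_config(draft_model: str,
                             num_speculative_tokens: int = 3,
                             draft_tensor_parallel_size: int = 1) -> str:
    return json.dumps({
        "method": "eagle3",
        "model": draft_model,
        "num_speculative_tokens": num_speculative_tokens,
        "draft_tensor_parallel_size": draft_tensor_parallel_size,
    })

print(build_speculative_config("MiniMaxAI/MiniMax-M2.5-Eagle3"))
```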

Test Plan

  • Ran shell syntax checks for the new benchmark script and MI355X launcher
  • Ran Python syntax checks for matrix validation files
  • Parsed amd-master.yaml and perf-changelog.yaml
  • Verified the new config expands to 1k/1k, 1k/8k, and 8k/1k with TP/EP {4/4, 8/8}
  • Checked IDE lints for modified files
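The expansion check in the test plan amounts to a small cross product: 3 ISL/OSL pairs × 2 TP/EP points = 6 sweep cells. An illustrative sketch (names are invented for the example; this is not the repository's matrix-validation code):

```python
from itertools import product

# Illustrative expansion check: 3 ISL/OSL pairs x 2 TP/EP points.
isl_osl = [(1024, 1024), (1024, 8192), (8192, 1024)]  # 1k/1k, 1k/8k, 8k/1k
tp_ep = [(4, 4), (8, 8)]

sweep = [
    {"isl": isl, "osl": osl, "tp": tp, "ep": ep}
    for (isl, osl), (tp, ep) in product(isl_osl, tp_ep)
]
print(len(sweep))  # 6 configurations
```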

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR there first before we can merge your PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Much of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get PR approval from the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

1 similar comment

@larryli2-amd larryli2-amd marked this pull request as draft April 30, 2026 08:09
Comment on lines +22 to +26
SPEC_DRAFT_MODEL=${SPEC_DRAFT_MODEL:-MiniMaxAI/MiniMax-M2.5-Eagle3}
SPEC_NUM_TOKENS=${SPEC_NUM_TOKENS:-3}
SPEC_DRAFT_TP=${SPEC_DRAFT_TP:-1}

hf download "$SPEC_DRAFT_MODEL"
Contributor

🔴 The PR description points to the Eagle3 draft model at huggingface.co/thoughtworks/MiniMax-M2.5-Eagle3 (thoughtworks org), but the script defaults SPEC_DRAFT_MODEL to MiniMaxAI/MiniMax-M2.5-Eagle3 (MiniMaxAI org), and perf-changelog.yaml lists the same MiniMaxAI/ path — these can't both be correct. Since the script unconditionally runs hf download "$SPEC_DRAFT_MODEL" before vllm-serve, if the weights are actually only published under thoughtworks/ the benchmark will 404 at startup; please reconcile the PR description, the script default, and the changelog so they all reference the real HF repo.

Extended reasoning...

What is the bug?

The Eagle3 draft model identifier is inconsistent across three places that should agree:

  1. PR description: "Eagle3 draft model https://huggingface.co/thoughtworks/MiniMax-M2.5-Eagle3/tree/main" — i.e. the thoughtworks org.
  2. benchmarks/single_node/minimaxm2.5_fp8_mi355x_eagle3.sh:22: SPEC_DRAFT_MODEL=${SPEC_DRAFT_MODEL:-MiniMaxAI/MiniMax-M2.5-Eagle3} — the MiniMaxAI org.
  3. perf-changelog.yaml entry for minimaxm2.5-fp8-mi355x-vllm-eagle3: "Draft model: MiniMaxAI/MiniMax-M2.5-Eagle3 from Hugging Face Hub" — also the MiniMaxAI org.

Why the existing code does not save us

The benchmark script unconditionally executes hf download "$SPEC_DRAFT_MODEL" (line 26) before vllm serve is invoked, and SPEC_DRAFT_MODEL is also baked into the --speculative-config JSON passed to vLLM. The MI355X launcher (runners/launch_mi355x-amds.sh) does not export SPEC_DRAFT_MODEL anywhere, so the script default is what actually executes in CI. There is no override path that would substitute the correct namespace at runtime.

Step-by-step proof of failure (assuming the PR description is correct and only thoughtworks/MiniMax-M2.5-Eagle3 exists)

  1. CI selects the new minimaxm2.5-fp8-mi355x-vllm-eagle3 config from amd-master.yaml.
  2. The launcher invokes benchmarks/single_node/minimaxm2.5_fp8_mi355x_eagle3.sh (resolved via the new SPEC_SUFFIX='_eagle3' case in runners/launch_mi355x-amds.sh).
  3. Inside the script, SPEC_DRAFT_MODEL is unset by the caller, so it defaults to MiniMaxAI/MiniMax-M2.5-Eagle3 (line 22).
  4. Line 26 runs hf download MiniMaxAI/MiniMax-M2.5-Eagle3. Since (per the PR description) the weights live under thoughtworks/, this resolves to a non-existent repo and returns a 401/404 from Hugging Face Hub.
  5. hf download exits non-zero; the script aborts before vllm serve is launched, killing the entire Eagle3 sweep.

The symmetric case (weights actually published under MiniMaxAI/) makes the PR description wrong and misleading for anyone trying to reproduce the recipe externally — still worth fixing, but lower stakes than a CI-fatal mismatch.
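The default-vs-override behavior described in steps 3-4 comes down to standard shell parameter expansion, which can be demonstrated standalone (repo names taken from the discussion above):

```shell
# ${VAR:-default}: use the caller's value when set and non-empty,
# otherwise fall back to the default. With no export from the launcher,
# the in-script default is what runs.
unset SPEC_DRAFT_MODEL
echo "${SPEC_DRAFT_MODEL:-MiniMaxAI/MiniMax-M2.5-Eagle3}"

# When the caller does export a value, it wins over the default.
SPEC_DRAFT_MODEL="thoughtworks/MiniMax-M2.5-Eagle3"
echo "${SPEC_DRAFT_MODEL:-MiniMaxAI/MiniMax-M2.5-Eagle3}"
```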

Impact

If the script default is wrong, the entire minimaxm2.5-fp8-mi355x-vllm-eagle3 sweep (3 ISL/OSL combos × 2 TP/EP points × multiple concurrency levels) fails before serving begins. If only the PR description is wrong, the documentation/changelog still misleads users about where to fetch the draft model.

How to fix

Confirm which Hugging Face repo actually hosts the Eagle3 weights and reconcile all three references:

  • If the canonical repo is thoughtworks/MiniMax-M2.5-Eagle3: change the default on line 22 of benchmarks/single_node/minimaxm2.5_fp8_mi355x_eagle3.sh to thoughtworks/MiniMax-M2.5-Eagle3, and update the changelog Draft model: line in perf-changelog.yaml to match.
  • If the canonical repo is MiniMaxAI/MiniMax-M2.5-Eagle3: edit the PR description to drop the thoughtworks/ URL and replace it with the MiniMaxAI/ path. (The script and changelog are already consistent in this case.)

Comment thread perf-changelog.yaml Outdated
Comment on lines +2034 to +2044

- config-keys:
    - minimaxm2.5-fp8-mi355x-vllm-eagle3
  description:
    - "Add MiniMax-M2.5 FP8 vLLM Eagle3 speculative decoding benchmark for MI355X"
    - "Model: MiniMaxAI/MiniMax-M2.5 from Hugging Face Hub"
    - "Draft model: MiniMaxAI/MiniMax-M2.5-Eagle3 from Hugging Face Hub"
    - "Recipe uses TP/EP={4/4, 8/8}, num_speculative_tokens=3, draft_tensor_parallel_size=1"
    - "Sweep covers ISL/OSL 1k/1k, 1k/8k, and 8k/1k with concurrency 4-64"
    - "Image pinned to vllm/vllm-openai-rocm:nightly-4eafc729285e459a5fc96efd6f7b313b155cad48 for Eagle3 support"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD
Contributor


🟡 The new perf-changelog.yaml entry's pr-link still has the TBD placeholder (https://github.com/SemiAnalysisAI/InferenceX/pull/TBD) — please replace it with /pull/1234 before merge so the changelog stays a navigable record. Every other entry in this file uses a real PR number; this is the lone outlier.

Extended reasoning...

What the bug is

The new entry appended to perf-changelog.yaml (lines 2034-2044) for minimaxm2.5-fp8-mi355x-vllm-eagle3 ends with:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD

The TBD is a literal placeholder string that was presumably written before the PR number was known. The PR is now #1234 (per the PR metadata), so the link resolves to a non-existent URL.

How it manifests

https://github.com/SemiAnalysisAI/InferenceX/pull/TBD is not a valid GitHub pull URL — it 404s. Anyone reading the changelog and following the link to find the originating PR (the whole point of the field) cannot get back to this PR. Every other one of the ~150 entries in perf-changelog.yaml already uses a real PR number (/pull/95, /pull/110, …, /pull/1230), so this is a unique outlier.

Why existing validation doesn't catch it

utils/matrix_logic/validation.py defines ChangelogEntry with pr_link: str = Field(alias="pr-link") — there is no URL-format check, no regex constraint that the trailing path component be numeric, and no cross-check against the PR number. So the YAML still parses cleanly and extra='forbid' doesn't fire (the field is present and is a string). The bug is purely a documentation/link-correctness issue, not a schema violation, which is why nothing fails CI.

Impact

Low. The only downstream effect is that the changelog field is wrong: it points at a non-existent URL. Nothing functional breaks (no benchmarks rely on pr-link), but the changelog is the canonical record for performance-affecting changes, and this entry will be permanently un-navigable once it lands on main. If any tooling parses pr-link and expects a numeric PR id at the tail, it will also break on TBD.

Step-by-step proof

  1. Look at the diff hunk for perf-changelog.yaml — the appended entry's last line is literally pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD.
  2. Visit that URL: GitHub returns 404 because there is no PR "TBD".
  3. Compare with any prior entry in the file (e.g., line 2031: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1157) — every one of them is a valid /pull/<integer> URL.
  4. Per the PR metadata, this PR's number is 1234, so the correct value is https://github.com/SemiAnalysisAI/InferenceX/pull/1234.

How to fix

One-line change at line 2044:

-  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1234

Severity

nit — trivially fixable, doesn't break validation or any benchmark, but should be fixed before merge so the changelog remains a clean, navigable record.

@functionstackx
Contributor

@larryli2-amd going to pause your sweep right now since the MI355 queue is incredibly long
