Merged
3 changes: 1 addition & 2 deletions .github/configs/nvidia-master.yaml
@@ -3513,9 +3513,8 @@ minimaxm2.5-fp8-b200-vllm:
- isl: 1024
osl: 1024
search-space:
- { tp: 2, conc-start: 4, conc-end: 512 }
- { tp: 4, conc-start: 4, conc-end: 512 }
- { tp: 2, ep: 2, conc-start: 512, conc-end: 512 }
- { tp: 4, conc-start: 4, conc-end: 128 }
- { tp: 4, ep: 4, conc-start: 256, conc-end: 512 }
- isl: 8192
osl: 1024
2 changes: 1 addition & 1 deletion benchmarks/single_node/minimaxm2.5_fp8_b200.sh
@@ -24,7 +24,7 @@ hf download "$MODEL"
SERVER_LOG=/workspace/server.log
PORT=${PORT:-8888}

export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
export VLLM_FLOAT32_MATMUL_PRECISION=high
🔴 The B300 benchmark script (benchmarks/single_node/minimaxm2.5_fp8_b300.sh) was not updated to match the env var change made to the B200 script in this PR. The B300 script explicitly documents that it "reuses the existing MiniMax-M2.5 FP8 B200 vLLM recipe as-is", so it should be updated in the same PR to replace VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl with VLLM_FLOAT32_MATMUL_PRECISION=high.

Extended reasoning...

What the bug is and how it manifests

This PR updates benchmarks/single_node/minimaxm2.5_fp8_b200.sh (line 27) to replace export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl with export VLLM_FLOAT32_MATMUL_PRECISION=high. However, the companion B300 script (benchmarks/single_node/minimaxm2.5_fp8_b300.sh) was not updated and still exports VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl at line 31. After this PR merges, the two scripts will be out of sync with different environment configurations.

The specific code path that triggers it

The B300 script contains an explicit design comment at lines 3–5: "this script reuses the existing MiniMax-M2.5 FP8 B200 vLLM recipe as-is until B300-specific tuning is available." This is corroborated by the perf-changelog.yaml entry for minimaxm2.5-fp8-b300-vllm which also states it reuses the B200 recipe as-is. The deliberate design intent is for B300 to mirror B200 until independent tuning is done.

Why existing code doesn't prevent it

There is no automated mechanism to enforce parity between the B200 and B300 scripts. The only enforcement is the human convention expressed in the B300 comment, which was overlooked in this PR when only the B200 script was modified.
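One way to close that gap would be a small CI guard that compares the VLLM_* exports of the two scripts and fails on drift. The sketch below is hypothetical (no such check exists in the repo today); the `check_parity` helper name is invented for illustration.

```shell
#!/bin/sh
# Hypothetical parity guard: collect the `export VLLM_*` lines from each
# benchmark script and compare them, so drift between the B200 and B300
# recipes would fail CI instead of relying on a comment convention.
set -eu

# List a script's VLLM_* env exports, sorted for a stable comparison.
vllm_envs() {
    grep -E '^export VLLM_' "$1" | sort
}

# Prints "match" or "drift"; a CI job would exit non-zero on "drift".
check_parity() {
    a=$(vllm_envs "$1")
    b=$(vllm_envs "$2")
    if [ "$a" = "$b" ]; then
        echo match
    else
        echo drift
    fi
}
```

In CI this would be invoked with the two real script paths, e.g. `check_parity benchmarks/single_node/minimaxm2.5_fp8_b200.sh benchmarks/single_node/minimaxm2.5_fp8_b300.sh`.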

What the impact would be

After this PR merges, benchmarks on B300 will run with the old VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl setting while B200 runs with the new VLLM_FLOAT32_MATMUL_PRECISION=high setting. Since the env var was presumably changed on B200 for correctness or performance reasons (mnnvl may have been incorrect or suboptimal on this hardware), B300 will be benchmarked under a stale configuration, producing results that are neither comparable to B200 nor reflective of the intended recipe.

How to fix it

In benchmarks/single_node/minimaxm2.5_fp8_b300.sh at line 31, replace:

export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl

with:

export VLLM_FLOAT32_MATMUL_PRECISION=high
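The swap above can be scripted; this is a hypothetical helper (the function name is invented here), assuming the old export appears exactly once and GNU sed's in-place flag is available:

```shell
#!/bin/sh
# Hypothetical one-shot fix: replace the stale allreduce-backend export
# with the float32 matmul precision export, in place.
swap_allreduce_for_matmul_precision() {
    sed -i 's/^export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl$/export VLLM_FLOAT32_MATMUL_PRECISION=high/' "$1"
}

# In the repo this would be run as:
#   swap_allreduce_for_matmul_precision benchmarks/single_node/minimaxm2.5_fp8_b300.sh
```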

Step-by-step proof

  1. Before this PR, B200 script (line 27) had: export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
  2. Before this PR, B300 script (line 31) had: export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl — matching B200 as intended
  3. This PR changes B200 line 27 to: export VLLM_FLOAT32_MATMUL_PRECISION=high
  4. This PR does NOT change the B300 script
  5. After this PR: B200 uses VLLM_FLOAT32_MATMUL_PRECISION=high, B300 uses VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
  6. B300's own comment says it should mirror B200 — contradiction.


if [ "$EP_SIZE" -gt 1 ]; then
    EP=" --enable-expert-parallel"
6 changes: 6 additions & 0 deletions perf-changelog.yaml
@@ -1690,3 +1690,9 @@
description:
- "Add VLLM_FLOAT32_MATMUL_PRECISION=high, update search space concurrency ranges"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1107

- config-keys:
- minimaxm2.5-fp8-b200-vllm
description:
- "Add VLLM_FLOAT32_MATMUL_PRECISION=high, remove VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1068