[NV] update minimaxm2.5 fp4 b300 vllm by hshrivastava-droid · Pull Request #1107 · SemiAnalysisAI/InferenceX

hshrivastava-droid · 2026-04-21T02:13:06Z

Summary

Update MiniMax-M2.5 FP4 B300 vLLM benchmark configuration with optimized search space and precision settings.

Changes

Benchmark script (benchmarks/single_node/minimaxm2.5_fp4_b300.sh)

Add VLLM_FLOAT32_MATMUL_PRECISION=high environment variable to improve numerical precision during matmul operations
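
The PR hardcodes the value with a plain `export`. The variation below is a hypothetical bash sketch, not what the PR ships. It lets a caller override the value, using the same `${VAR:-default}` pattern the script already uses for `PORT`. It also rejects values outside `highest`, `high` and `medium`. That check assumes vLLM forwards the variable to PyTorch's `torch.set_float32_matmul_precision`, whose documented modes are those three. The helper name `validate_matmul_precision` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of benchmarks/single_node/minimaxm2.5_fp4_b300.sh).
# Accepts only the modes PyTorch documents for float32 matmul precision.
validate_matmul_precision() {
  case "$1" in
    highest|high|medium) return 0 ;;
    *) echo "unsupported VLLM_FLOAT32_MATMUL_PRECISION: $1" >&2; return 1 ;;
  esac
}

# Keep a caller-provided value, otherwise default to "high" (same pattern as PORT=${PORT:-8888}).
export VLLM_FLOAT32_MATMUL_PRECISION="${VLLM_FLOAT32_MATMUL_PRECISION:-high}"
validate_matmul_precision "$VLLM_FLOAT32_MATMUL_PRECISION" || exit 1
```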

Search space (nvidia-master.yaml — minimaxm2.5-fp4-b300-vllm)

ISL=1024, OSL=1024:

Expand TP=1 concurrency range from 4-4 → 4-8
Replace TP=2 non-EP configs with TP=2/EP=2 dp-attn config at conc 256-2048
Narrow TP=2/EP=2 non-dp-attn to conc 128 only
Narrow TP=4 non-EP from 4-512 → 8-8
Expand TP=8 concurrency from 4-4 → 4-8
Remove standalone TP=2 (no EP) and TP=2/EP=2/dp-attn at conc 512 entries

ISL=8192, OSL=1024:

Consolidate TP=1 range from two entries (4-32, 256-512) → 4-256 + 1024
Replace TP=2 and TP=2/EP=2 entries with TP=2/EP=2/dp-attn at conc 512
Narrow TP=4 from 4-512 → 4-8
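
The ranges above read as start-end concurrency pairs. The PR does not say how the sweep steps through a range. The sketch below assumes concurrency doubles at each step, and uses that to expand a few of the new ranges into the points they would cover. The function name `expand_conc` is hypothetical.

```shell
#!/usr/bin/env bash
# expand_conc START END: print concurrencies from START up to END, doubling each step.
# Assumption: the sweep uses powers-of-two steps between conc-start and conc-end.
expand_conc() {
  local start=$1 end=$2 c
  local -a out=()
  (( start >= 1 )) || return 1   # guard against an infinite loop when start is 0
  for (( c = start; c <= end; c *= 2 )); do
    out+=("$c")
  done
  echo "${out[*]}"
}

expand_conc 4 8        # TP=1, ISL=1024: 4 8
expand_conc 4 256      # TP=1, ISL=8192: 4 8 16 32 64 128 256
expand_conc 256 2048   # TP=2/EP=2 dp-attn, ISL=1024: 256 512 1024 2048
```

Under that assumption, the consolidated TP=1 entry at ISL=8192 (4-256 plus 1024) covers seven points from the range and one standalone point.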

Changelog (perf-changelog.yaml)

Add entry documenting the precision and search space changes

github-actions · 2026-04-21T02:13:14Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please first create a PR updating the official recipe; after that, we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-04-21T02:20:55Z

+    - minimaxm2.5-fp4-b300-vllm
+  description:
+    - "Add VLLM_FLOAT32_MATMUL_PRECISION=high, update search space concurrency ranges"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1107


🟡 The new perf-changelog.yaml entry uses an unresolved placeholder 'pull/XXX' in the pr-link field; since this PR has been assigned #1107, the link should be updated to 'pull/1107' before merging so the changelog entry can be traced back to this PR.

Extended reasoning...

The new changelog entry added by this PR at perf-changelog.yaml:1654 contains 'pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX'. The PR being reviewed is #1107, so the link should end in 'pull/1107'. As written, the changelog entry, which is meant to serve as a traceable record of what changed and when, cannot be linked back to this PR after merge.

The specific code path is simple: the perf-changelog.yaml file is a human-maintained audit trail mapping config-keys to the PR that introduced each change. The entry for 'minimaxm2.5-fp4-b300-vllm' documents 'Add VLLM_FLOAT32_MATMUL_PRECISION=high, update search space concurrency ranges' but the pr-link placeholder prevents future readers from finding the originating PR.

The refutation argument notes that XXX/XXXX placeholders appear in approximately 22 other entries in the file, making it an 'accepted pattern.' However, an accepted pattern is not the same as correct behavior: those entries are also missing proper PR links and represent the same documentation debt. The PR's [WIP] status explains why the link was left as a placeholder during development, but the placeholder should be resolved before merging, which can be done now because the PR number is already known.

The impact is limited to documentation quality. No runtime behavior, benchmark configurations, or benchmark scripts are affected. However, an XXX link in the merged changelog permanently obscures which PR introduced this change, making historical archaeology harder for maintainers. The changelog's value is precisely in its traceability.

The fix is a one-line change: replace 'pull/XXX' with 'pull/1107' in the last line of the new entry at perf-changelog.yaml:1654. This is a nit-level issue — it should be fixed before merging but does not block functionality.
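
The review also notes that roughly 22 other entries still carry XXX/XXXX placeholders. A grep can list them so each one can be filled in with its own PR number. This is a sketch only. It assumes every entry uses the exact `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/...` form shown in the diff, and the function name `find_placeholder_links` is hypothetical. It only reports matches and does not rewrite anything, because each placeholder belongs to a different PR.

```shell
#!/usr/bin/env bash
# find_placeholder_links FILE: print "line:content" for every pr-link still ending in X, XX, XXX, ...
# grep exits 1 when nothing matches, i.e. when the file has no unresolved placeholders.
find_placeholder_links() {
  grep -nE 'pr-link:[[:space:]]*https://github\.com/SemiAnalysisAI/InferenceX/pull/X+[[:space:]]*$' "$1"
}

# Example usage from the repository root:
#   find_placeholder_links perf-changelog.yaml
```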

hshrivastava-droid · 2026-04-21T18:24:08Z

@claude - add PR description

Klaud-Cold · 2026-04-21T18:24:39Z

Claude finished @hshrivastava-droid's task in 36s

Adding PR Description

Gather context on PR changes
Write and update PR description

PR description has been added with a detailed summary of all three changes:

VLLM_FLOAT32_MATMUL_PRECISION=high env var added to the benchmark script
Search space updates for both ISL=1024 and ISL=8192 configurations — concurrency ranges narrowed/expanded, some TP/EP combos consolidated
Changelog entry added to perf-changelog.yaml

functionstackx · 2026-04-21T18:34:23Z

 SERVER_LOG=/workspace/server.log
 PORT=${PORT:-8888}

+export VLLM_FLOAT32_MATMUL_PRECISION=high


can u update the vllm recipe?

hshrivastava-droid · 2026-04-21T20:23:19Z

vllm recipe: vllm-project/recipes#353

functionstackx

lgtm! thank you for this PR!

update vllm

fc652ef

hshrivastava-droid requested a review from a team April 21, 2026 02:13

hshrivastava-droid requested review from jgangani and kedarpotdar-nv as code owners April 21, 2026 02:13

github-project-automation Bot added this to InferenceMAX Board Apr 21, 2026

update Pr number

7f12a85

hshrivastava-droid added NVIDIA sweep-enabled labels Apr 21, 2026

claude Bot reviewed Apr 21, 2026


Merge branch 'main' into minimaxm2.5-fp4-b300-vllm-v2

14bf54d

hshrivastava-droid changed the title ~~[WIP][NV] update minimaxm2.5 fp4 b300 vllm~~ [NV] update minimaxm2.5 fp4 b300 vllm Apr 21, 2026

functionstackx requested changes Apr 21, 2026


faradawn mentioned this pull request Apr 21, 2026

feat(MiniMax-M2.5): add VLLM_FLOAT32_MATMUL_PRECISION=high for Blackwell (B200/B300 FP8+FP4) vllm-project/recipes#353

Open

hshrivastava-droid requested a review from functionstackx April 21, 2026 20:23

functionstackx approved these changes Apr 21, 2026


hshrivastava-droid added 2 commits April 21, 2026 13:53

Merge branch 'main' into minimaxm2.5-fp4-b300-vllm-v2

293143f

update conc

50a129c

jgangani approved these changes Apr 21, 2026


kedarpotdar-nv approved these changes Apr 21, 2026


Merge branch 'main' into minimaxm2.5-fp4-b300-vllm-v2

fdd33da

hshrivastava-droid merged commit 633fd23 into main Apr 21, 2026
4 checks passed

hshrivastava-droid deleted the minimaxm2.5-fp4-b300-vllm-v2 branch April 21, 2026 22:46

github-project-automation Bot moved this to Done in InferenceMAX Board Apr 21, 2026

claude Bot mentioned this pull request Apr 24, 2026

Add dsv4-fp8-h200-sglang single-node config #1136

Closed

2 tasks


hshrivastava-droid merged 6 commits into main from minimaxm2.5-fp4-b300-vllm-v2


6 participants
