
[NV] update minimaxm2.5 fp4 b300 vllm #1107

Merged
hshrivastava-droid merged 6 commits into main from minimaxm2.5-fp4-b300-vllm-v2
Apr 21, 2026

Conversation

@hshrivastava-droid
Collaborator

@hshrivastava-droid commented Apr 21, 2026

Summary

Update MiniMax-M2.5 FP4 B300 vLLM benchmark configuration with optimized search space and precision settings.

Changes

Benchmark script (benchmarks/single_node/minimaxm2.5_fp4_b300.sh)

  • Add VLLM_FLOAT32_MATMUL_PRECISION=high environment variable to improve numerical precision during matmul operations

Search space (nvidia-master.yaml, key minimaxm2.5-fp4-b300-vllm)

ISL=1024, OSL=1024:

  • Expand TP=1 concurrency range from 4-4 → 4-8
  • Replace TP=2 non-EP configs with TP=2/EP=2 dp-attn config at conc 256-2048
  • Narrow TP=2/EP=2 non-dp-attn to conc 128 only
  • Narrow TP=4 non-EP from 4-512 → 8-8
  • Expand TP=8 concurrency from 4-4 → 4-8
  • Remove standalone TP=2 (no EP) and TP=2/EP=2/dp-attn at conc 512 entries

ISL=8192, OSL=1024:

  • Consolidate TP=1 range from two entries (4-32, 256-512) → 4-256 + 1024
  • Replace TP=2 and TP=2/EP=2 entries with TP=2/EP=2/dp-attn at conc 512
  • Narrow TP=4 from 4-512 → 4-8
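The bullets above describe edits to concurrency sweep entries keyed by TP/EP/dp-attn. As a rough sketch only (the actual nvidia-master.yaml schema is not shown in this thread, so every field name below is a guess), such entries might look like:

```yaml
# Illustrative only: field names are hypothetical, not the real
# nvidia-master.yaml schema. Values are taken from the PR summary.
minimaxm2.5-fp4-b300-vllm:
  - isl: 1024
    osl: 1024
    tp: 2
    ep: 2
    dp_attn: true
    concurrency: [256, 512, 1024, 2048]   # dp-attn config added at conc 256-2048
  - isl: 8192
    osl: 1024
    tp: 4
    concurrency: [4, 8]                   # narrowed per the PR
```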

Changelog (perf-changelog.yaml)

  • Add entry documenting the precision and search space changes
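For reference, the benchmark-script change is small. A minimal sketch, assuming only what the PR states (the variable name and the script's PORT default); the actual server launch command is omitted:

```shell
# Sketch of the script change; only VLLM_FLOAT32_MATMUL_PRECISION and the
# PORT default come from the PR. The echo is just for illustration.
export VLLM_FLOAT32_MATMUL_PRECISION=high   # request high-precision fp32 matmuls
PORT=${PORT:-8888}                          # benchmark script's default port
echo "precision=${VLLM_FLOAT32_MATMUL_PRECISION} port=${PORT}"
```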

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Comment thread perf-changelog.yaml
Comment on lines +1651 to +1654
- minimaxm2.5-fp4-b300-vllm
description:
- "Add VLLM_FLOAT32_MATMUL_PRECISION=high, update search space concurrency ranges"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1107
Contributor


🟡 The new perf-changelog.yaml entry uses an unresolved placeholder 'pull/XXX' in the pr-link field; since this PR has been assigned #1107, the link should be updated to 'pull/1107' before merging so the changelog entry can be traced back to this PR.

Extended reasoning...

The new changelog entry added by this PR at perf-changelog.yaml:1654 contains 'pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX'. The PR being reviewed is #1107, so the correct link is 'pull/1107'. This means the changelog entry, which is meant to serve as a traceable record of what changed and when, cannot be linked back to this PR after merge.

The specific code path is simple: the perf-changelog.yaml file is a human-maintained audit trail mapping config-keys to the PR that introduced each change. The entry for 'minimaxm2.5-fp4-b300-vllm' documents 'Add VLLM_FLOAT32_MATMUL_PRECISION=high, update search space concurrency ranges' but the pr-link placeholder prevents future readers from finding the originating PR.

The refutation argument notes that XXX/XXXX placeholders appear in approximately 22 other entries in the file, making it an 'accepted pattern.' However, an accepted pattern is not correct behavior: those entries are also missing proper PR links and represent the same documentation debt. The PR having been marked [WIP] explains why the link was left as a placeholder during development, but the expectation is that it be resolved before merging (i.e., right now, since the PR number is already known).

The impact is limited to documentation quality. No runtime behavior, benchmark configurations, or benchmark scripts are affected. However, an XXX link in the merged changelog permanently obscures which PR introduced this change, making historical archaeology harder for maintainers. The changelog's value is precisely in its traceability.

The fix is a one-line change: replace 'pull/XXX' with 'pull/1107' in the last line of the new entry at perf-changelog.yaml:1654. This is a nit-level issue — it should be fixed before merging but does not block functionality.

@hshrivastava-droid
Collaborator Author

@claude - add PR description

@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 21, 2026

Claude finished @hshrivastava-droid's task in 36s.


Adding PR Description

  • Gather context on PR changes
  • Write and update PR description

PR description has been added with a detailed summary of all three changes:

  1. VLLM_FLOAT32_MATMUL_PRECISION=high env var added to the benchmark script
  2. Search space updates for both ISL=1024 and ISL=8192 configurations — concurrency ranges narrowed/expanded, some TP/EP combos consolidated
  3. Changelog entry added to perf-changelog.yaml

hshrivastava-droid changed the title from "[WIP][NV] update minimaxm2.5 fp4 b300 vllm" to "[NV] update minimaxm2.5 fp4 b300 vllm" on Apr 21, 2026
SERVER_LOG=/workspace/server.log
PORT=${PORT:-8888}

export VLLM_FLOAT32_MATMUL_PRECISION=high
Contributor


can u update the vllm recipe?

@hshrivastava-droid
Collaborator Author

hshrivastava-droid commented Apr 21, 2026

vllm recipe- vllm-project/recipes#353

Contributor

@functionstackx left a comment


lgtm! thank you for this PR!

hshrivastava-droid merged commit 633fd23 into main Apr 21, 2026
4 checks passed
hshrivastava-droid deleted the minimaxm2.5-fp4-b300-vllm-v2 branch April 21, 2026 22:46

6 participants