[NV] update minimaxm2.5 fp4 b300 vllm#1107
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR against them first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. Many failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.
- minimaxm2.5-fp4-b300-vllm
description:
- "Add VLLM_FLOAT32_MATMUL_PRECISION=high, update search space concurrency ranges"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1107
🟡 The new perf-changelog.yaml entry uses an unresolved placeholder 'pull/XXX' in the pr-link field; since this PR has been assigned #1107, the link should be updated to 'pull/1107' before merging so the changelog entry can be traced back to this PR.
Extended reasoning...
The new changelog entry added by this PR at perf-changelog.yaml:1654 contains 'pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX'. The PR being reviewed is #1107, so the link should end in 'pull/1107'. As written, the changelog entry, which is meant to serve as a traceable record of what changed and when, cannot be linked back to this PR after merge.
The specific code path is simple: the perf-changelog.yaml file is a human-maintained audit trail mapping config-keys to the PR that introduced each change. The entry for 'minimaxm2.5-fp4-b300-vllm' documents 'Add VLLM_FLOAT32_MATMUL_PRECISION=high, update search space concurrency ranges' but the pr-link placeholder prevents future readers from finding the originating PR.
The refutation argument notes that XXX/XXXX placeholders appear in approximately 22 other entries in the file, making them an 'accepted pattern.' However, an accepted pattern is not the same as correct behavior: those entries are also missing proper PR links and represent the same documentation debt. The PR's [WIP] status explains why the link was left as a placeholder during development, but the placeholder should still be resolved before merging (i.e., now, since the PR number is already known).
The impact is limited to documentation quality. No runtime behavior, benchmark configurations, or benchmark scripts are affected. However, an XXX link in the merged changelog permanently obscures which PR introduced this change, making historical archaeology harder for maintainers. The changelog's value is precisely in its traceability.
The fix is a one-line change: replace 'pull/XXX' with 'pull/1107' in the last line of the new entry at perf-changelog.yaml:1654. This is a nit-level issue — it should be fixed before merging but does not block functionality.
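The check described in this review can be automated. The following is a minimal sketch, assuming each changelog entry keeps its link on a single `pr-link:` line and that placeholders consist only of capital X's (as with `pull/XXX` and `pull/XXXX`). The function name `find_placeholder_pr_links` is hypothetical and not part of the InferenceX repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list changelog lines whose pr-link still ends in an
# unresolved placeholder such as pull/XXX or pull/XXXX.
# Assumption: every link sits on one "pr-link: <url>" line.
find_placeholder_pr_links() {
  local changelog="${1:-perf-changelog.yaml}"
  # -n prefixes each match with its line number; -E enables extended regex.
  # The final path segment must consist only of capital X's.
  grep -nE 'pr-link:[[:space:]]*https?://[^[:space:]]*/pull/X+[[:space:]]*$' "$changelog"
}
```

Usage: `find_placeholder_pr_links perf-changelog.yaml || echo "no placeholder links"`. Because `grep` exits with status 1 when nothing matches, the helper could also serve as a pre-merge gate in CI (inverting the exit status there) to catch both this entry and the roughly 22 older ones.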
@claude - add PR description
Claude finished @hshrivastava-droid's task in 36s: Adding PR Description
PR description has been added with a detailed summary of all three changes:
SERVER_LOG=/workspace/server.log
PORT=${PORT:-8888}

export VLLM_FLOAT32_MATMUL_PRECISION=high
Can you update the vLLM recipe?
vLLM recipe: vllm-project/recipes#353
functionstackx left a comment
lgtm! thank you for this PR!
Summary
Update MiniMax-M2.5 FP4 B300 vLLM benchmark configuration with optimized search space and precision settings.
Changes
- Benchmark script (`benchmarks/single_node/minimaxm2.5_fp4_b300.sh`): added the `VLLM_FLOAT32_MATMUL_PRECISION=high` environment variable, which sets the precision mode used for float32 matmul operations to `high`.
- Search space (`nvidia-master.yaml`: `minimaxm2.5-fp4-b300-vllm`):
  - ISL=1024, OSL=1024: 4-4→4-8conc 256-2048conc 128only4-512→8-84-4→4-8conc 512entries
  - ISL=8192, OSL=1024: 4-32,256-512) →4-256+1024conc 5124-512→4-8
- Changelog (`perf-changelog.yaml`)