Merged
21 changes: 9 additions & 12 deletions .github/configs/nvidia-master.yaml
@@ -3593,21 +3593,18 @@ minimaxm2.5-fp4-b300-vllm:
   - isl: 1024
     osl: 1024
     search-space:
-    - { tp: 1, conc-start: 4, conc-end: 4 }
-    - { tp: 2, conc-start: 4, conc-end: 512 }
-    - { tp: 2, ep: 2, conc-start: 128, conc-end: 256 }
-    - { tp: 2, ep: 2, dp-attn: true, conc-start: 512, conc-end: 512 }
-    - { tp: 4, conc-start: 4, conc-end: 512 }
-    - { tp: 4, ep: 4, conc-start: 32, conc-end: 128 }
-    - { tp: 8, conc-start: 4, conc-end: 4 }
+    - { tp: 1, conc-start: 4, conc-end: 8 }
+    - { tp: 2, ep: 2, conc-start: 128, conc-end: 128 }
+    - { tp: 2, ep: 2, dp-attn: true, conc-start: 256, conc-end: 2048 }
+    - { tp: 4, conc-start: 8, conc-end: 8 }
+    - { tp: 4, ep: 4, conc-start: 64, conc-end: 128 }
+    - { tp: 8, conc-start: 4, conc-end: 8 }
   - isl: 8192
     osl: 1024
     search-space:
-    - { tp: 1, conc-start: 4, conc-end: 32 }
-    - { tp: 1, conc-start: 256, conc-end: 512 }
-    - { tp: 2, conc-start: 4, conc-end: 512 }
-    - { tp: 2, ep: 2, conc-start: 128, conc-end: 512 }
-    - { tp: 4, conc-start: 4, conc-end: 512 }
+    - { tp: 1, conc-start: 4, conc-end: 256 }
+    - { tp: 2, ep: 2, dp-attn: true, conc-start: 512, conc-end: 512 }
+    - { tp: 4, conc-start: 4, conc-end: 8 }
     - { tp: 8, conc-start: 4, conc-end: 4 }

gptoss-fp4-h100-vllm:
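To make the search-space entries above concrete, here is a minimal sketch of how a `conc-start`/`conc-end` pair might expand into individual benchmark runs. It assumes the sweep doubles the concurrency at each step, which this diff does not confirm, and `expand_conc` is a hypothetical helper rather than anything in the repository.

```shell
# Hypothetical helper (not from the repository): list the concurrency values
# between conc-start and conc-end, ASSUMING the sweep doubles at each step.
# The function body runs in a subshell so its variables stay local.
expand_conc() (
  conc=$1
  end=$2
  out=""
  while [ "$conc" -le "$end" ]; do
    out="${out:+$out }$conc"   # append, space-separated
    conc=$((conc * 2))
  done
  printf '%s\n' "$out"
)

# New 1024/1024 entry: { tp: 2, ep: 2, dp-attn: true, conc-start: 256, conc-end: 2048 }
expand_conc 256 2048   # prints: 256 512 1024 2048

# Single-point entry: { tp: 4, conc-start: 8, conc-end: 8 }
expand_conc 8 8        # prints: 8
```

Under that assumption, narrowing `{ tp: 4, conc-start: 4, conc-end: 512 }` to `{ tp: 4, conc-start: 8, conc-end: 8 }` would cut eight runs down to one, which may be the point of trimming the ranges.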
2 changes: 2 additions & 0 deletions benchmarks/single_node/minimaxm2.5_fp4_b300.sh
@@ -29,6 +29,8 @@ hf download "$MODEL"
 SERVER_LOG=/workspace/server.log
 PORT=${PORT:-8888}
 
+export VLLM_FLOAT32_MATMUL_PRECISION=high
Contributor
Can you update the vLLM recipe?


+
 if [ "${DP_ATTENTION}" = "true" ]; then
     PARALLEL_ARGS="--tensor-parallel-size=1 --data-parallel-size=$TP --enable-expert-parallel"
 elif [ "$EP_SIZE" -gt 1 ]; then
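The hunk cuts off inside the parallelism selection, so the following is a sketch of how the three cases could map a search-space entry (`tp`, `ep`, `dp-attn`) onto vLLM flags. Only the `DP_ATTENTION` branch and the `elif` condition are taken from the diff. The bodies of the other two branches and the `build_parallel_args` wrapper are assumptions for illustration.

```shell
# Sketch only: the first branch mirrors the script; the other two branch
# bodies are ASSUMED, since the diff does not show them.
# The function body runs in a subshell so its variables stay local.
build_parallel_args() (
  TP=$1
  EP_SIZE=$2
  DP_ATTENTION=$3
  if [ "$DP_ATTENTION" = "true" ]; then
    # From the diff: TP is reused as the data-parallel size, tensor
    # parallelism drops to 1, and expert parallelism is enabled.
    echo "--tensor-parallel-size=1 --data-parallel-size=$TP --enable-expert-parallel"
  elif [ "$EP_SIZE" -gt 1 ]; then
    # Assumed: tensor parallelism plus expert parallelism.
    echo "--tensor-parallel-size=$TP --enable-expert-parallel"
  else
    # Assumed: plain tensor parallelism.
    echo "--tensor-parallel-size=$TP"
  fi
)

# { tp: 2, ep: 2, dp-attn: true, ... }
build_parallel_args 2 2 true
# { tp: 8, ... } with no ep key, taking EP_SIZE to default to 1 (assumption)
build_parallel_args 8 1 false
```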
6 changes: 6 additions & 0 deletions perf-changelog.yaml
@@ -1684,3 +1684,9 @@
   description:
     - "Add VLLM_FLOAT32_MATMUL_PRECISION=high, remove VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1106
+
+- config-keys:
+    - minimaxm2.5-fp4-b300-vllm
+  description:
+    - "Add VLLM_FLOAT32_MATMUL_PRECISION=high, update search space concurrency ranges"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1107
Comment on lines +1689 to +1692
Contributor
🟡 The new perf-changelog.yaml entry uses an unresolved placeholder 'pull/XXX' in the pr-link field; since this PR has been assigned #1107, the link should be updated to 'pull/1107' before merging so the changelog entry can be traced back to this PR.

Extended reasoning...

The new changelog entry added by this PR at perf-changelog.yaml:1654 contains 'pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX'. The PR under review is #1107, so the link should end in 'pull/1107'. Until it does, the changelog entry, which is meant to serve as a traceable record of what changed and when, cannot be linked back to this PR after merge.

The context is simple: perf-changelog.yaml is a human-maintained audit trail mapping config-keys to the PR that introduced each change. The entry for 'minimaxm2.5-fp4-b300-vllm' documents 'Add VLLM_FLOAT32_MATMUL_PRECISION=high, update search space concurrency ranges', but the placeholder in pr-link prevents future readers from finding the originating PR.

The refutation argument notes that XXX/XXXX placeholders appear in approximately 22 other entries in the file, making it an 'accepted pattern.' However, an accepted pattern is not necessarily correct: those entries are also missing proper PR links and represent the same documentation debt. The PR's [WIP] status explains why the link was left as a placeholder during development, but it should be resolved before merging (that is, now, since the PR number is already known).

The impact is limited to documentation quality. No runtime behavior, benchmark configurations, or benchmark scripts are affected. However, an XXX link in the merged changelog permanently obscures which PR introduced this change, making historical archaeology harder for maintainers. The changelog's value is precisely in its traceability.

The fix is a one-line change: replace 'pull/XXX' with 'pull/1107' in the last line of the new entry at perf-changelog.yaml:1654. This is a nit-level issue: it should be fixed before merging but does not block functionality.
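A check along the following lines could catch this class of issue before merge. It is a minimal sketch, not something that exists in the repository: `check_pr_links` is a hypothetical helper, and it assumes placeholders always take the form of a run of capital X's at the end of a `pr-link` line.

```shell
# Hypothetical pre-merge check (not part of the repository): print every
# pr-link line whose PR number is still a run of X's, such as pull/XXX,
# and return non-zero if any is found.
# The function body runs in a subshell so its variables stay local.
check_pr_links() (
  file=$1
  # grep -n prints matching lines with their line numbers and exits 0 on a match.
  if grep -nE 'pr-link:.*/pull/X+[[:space:]]*$' "$file"; then
    echo "unresolved pr-link placeholder in $file" >&2
    return 1
  fi
  return 0
)
```

Running `check_pr_links perf-changelog.yaml` from the repository root would list the offending lines, including the roughly 22 older entries the comment mentions, so it would need an allowlist or a diff-only mode before it could gate merges.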
