Add B300 config: minimaxm2.5-fp4-vllm #1055
Merged
```shell
#!/usr/bin/env bash

# NOTE: At the time of submission, https://docs.vllm.ai/projects/recipes/en/latest/MiniMax/MiniMax-M2.html
# does not have a B300-specific recipe, so this script reuses the existing
# MiniMax-M2.5 FP4 B200 vLLM recipe as-is until B300-specific tuning is available.

source "$(dirname "$0")/../benchmark_lib.sh"

check_env_vars \
    MODEL \
    TP \
    EP_SIZE \
    DP_ATTENTION \
    CONC \
    ISL \
    OSL \
    MAX_MODEL_LEN \
    RANDOM_RANGE_RATIO \
    RESULT_FILENAME

if [[ -n "$SLURM_JOB_ID" ]]; then
    echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
fi

nvidia-smi

hf download "$MODEL"

SERVER_LOG=/workspace/server.log
PORT=${PORT:-8888}

if [ "${DP_ATTENTION}" = "true" ]; then
    PARALLEL_ARGS="--tensor-parallel-size=1 --data-parallel-size=$TP --enable-expert-parallel"
elif [ "$EP_SIZE" -gt 1 ]; then
    PARALLEL_ARGS="--tensor-parallel-size=$TP --enable-expert-parallel"
else
    PARALLEL_ARGS="--tensor-parallel-size=$TP"
fi

if [ "${EVAL_ONLY}" = "true" ]; then
    setup_eval_context
    MAX_MODEL_LEN="$EVAL_MAX_MODEL_LEN"
fi

# Start GPU monitoring (power, temperature, clocks every second)
start_gpu_monitor

set -x
vllm serve $MODEL --port $PORT \
    $PARALLEL_ARGS \
    --gpu-memory-utilization 0.90 \
    --max-model-len $MAX_MODEL_LEN \
    --kv-cache-dtype fp8 \
    --max-cudagraph-capture-size 2048 \
    --max-num-batched-tokens "$((ISL * 2))" \
    --stream-interval 20 --no-enable-prefix-caching \
    --trust-remote-code > $SERVER_LOG 2>&1 &

SERVER_PID=$!

# Wait for server to be ready
wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID"

run_benchmark_serving \
    --model "$MODEL" \
    --port "$PORT" \
    --backend vllm \
    --input-len "$ISL" \
    --output-len "$OSL" \
    --random-range-ratio "$RANDOM_RANGE_RATIO" \
    --num-prompts "$((CONC * 10))" \
    --max-concurrency "$CONC" \
    --result-filename "$RESULT_FILENAME" \
    --result-dir /workspace/ \
    --trust-remote-code

# After throughput, run evaluation only if RUN_EVAL is true
if [ "${RUN_EVAL}" = "true" ]; then
    run_eval --framework lm-eval --port "$PORT"
    append_lm_eval_summary
fi

# Stop GPU monitoring
stop_gpu_monitor
set +x
```
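The parallelism flags in the script branch three ways on `DP_ATTENTION` and `EP_SIZE`. The following self-contained sketch reproduces that selection logic so it can be checked in isolation; the helper function name and the sample `TP`/`EP_SIZE` values are illustrative, not from the PR.

```shell
#!/usr/bin/env bash
# Sketch of the PARALLEL_ARGS selection above, wrapped in a function
# (select_parallel_args is a hypothetical name, not part of benchmark_lib.sh).
select_parallel_args() {
    local tp="$1" ep_size="$2" dp_attention="$3"
    if [ "$dp_attention" = "true" ]; then
        # DP attention: replicate attention across data-parallel ranks,
        # shard experts instead of using tensor parallelism.
        echo "--tensor-parallel-size=1 --data-parallel-size=$tp --enable-expert-parallel"
    elif [ "$ep_size" -gt 1 ]; then
        # Expert parallelism layered on top of tensor parallelism.
        echo "--tensor-parallel-size=$tp --enable-expert-parallel"
    else
        # Plain tensor parallelism.
        echo "--tensor-parallel-size=$tp"
    fi
}

select_parallel_args 8 1 false   # plain TP
select_parallel_args 8 8 false   # TP + expert parallel
select_parallel_args 8 8 true    # DP attention + expert parallel
```

Note that when `DP_ATTENTION=true`, the script repurposes `$TP` as the data-parallel size and forces tensor parallelism to 1, so the same GPU count is assumed either way.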
🟡 The new `minimaxm2.5-fp4-b300-vllm` entry in `perf-changelog.yaml` has a placeholder `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX` that was never replaced with the actual PR number. Since this PR is #1055, the link should be updated to `pull/1055`.

Bug description: The `perf-changelog.yaml` entry added by this PR for the `minimaxm2.5-fp4-b300-vllm` config (lines 1413-1414) contains a template placeholder `XXXX` in the `pr-link` field rather than the actual PR number, resulting in a broken link: `https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX`.

Code path: The diff clearly shows the new entry at the bottom of `perf-changelog.yaml` includes `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX`. The PR number (1055) is known at submission time; it is literally the number of this pull request.

Why existing code doesn't prevent it: There is no automated validation step that checks `perf-changelog.yaml` entries for placeholder patterns like `XXXX`. The surrounding entries demonstrate the correct pattern: the immediately preceding entry for `dsr1-fp4-b300-sglang` (PR #1049) correctly uses `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1049`, and similarly for `qwen3.5-fp8-b300-sglang` (PR #1048).

Impact: The broken link makes it impossible to trace the changelog entry back to the originating PR. Anyone reading the changelog to understand when or why the `minimaxm2.5-fp4-b300-vllm` config was added will be unable to follow the reference. There are already 7 pre-existing `XXX` placeholder entries in the file (lines 12, 19, 315, 790, 818, 855, 872), and this PR introduces another instance, continuing a pattern that reduces changelog utility.

Fix: Replace `pull/XXXX` with `pull/1055` on the `pr-link` line of the new `minimaxm2.5-fp4-b300-vllm` entry.

Step-by-step proof: (1) This PR is numbered #1055, visible in the PR metadata. (2) The diff adds a new `perf-changelog.yaml` entry with `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX`. (3) At merge time, clicking that link leads to a 404 (or to an unrelated PR, if one ever happens to match). (4) The correct value `https://github.com/SemiAnalysisAI/InferenceX/pull/1055` would link directly to this PR. The fix is a one-word substitution.

Addressing the duplicate refutation: A separate report, bug_003, describes the identical issue. Whether or not bug_003 is processed, this report independently identifies the same real defect; the fix remains the same regardless of which report is acted upon.