sglang dsv4 MTP #1166
/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
@Qiaolin-Yu Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24948369055
/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
@Qiaolin-Yu Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24948810957
/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
@Qiaolin-Yu Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24949162299
Force-pushed from fffb295 to f64505b
/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
@yhyang201 Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24953792156
The tokenizer for DSv4-Pro has no chat_template set, so --use-chat-template causes benchmark_serving.py to crash with ValueError. Remove it to align with dsv4_fp4_b300_sglang.sh. Also add a floor of 8 to --max-running-requests to match the base script and avoid too-low values at low concurrency.
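The floor on --max-running-requests described above can be sketched in shell as follows. This is a minimal illustration, not the script's actual code: the function name floor_max_running and the variables CONC and MAX_RUNNING_REQUESTS are hypothetical, and it assumes the value is otherwise taken directly from the benchmark concurrency.

```shell
# Hypothetical helper: clamp a concurrency-derived value to a minimum of 8.
# Names are illustrative and not taken from the actual benchmark script.
floor_max_running() {
  conc="$1"
  floor=8
  if [ "$conc" -lt "$floor" ]; then
    echo "$floor"
  else
    echo "$conc"
  fi
}

# Assumed input: the benchmark concurrency, defaulting to 4 for this sketch.
CONC="${CONC:-4}"
MAX_RUNNING_REQUESTS="$(floor_max_running "$CONC")"
echo "--max-running-requests ${MAX_RUNNING_REQUESTS}"
```

With this shape, any concurrency below 8 yields 8, and larger values pass through unchanged.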
Rebase perf-changelog.yaml on latest main (preserving SemiAnalysisAI#1173 and SemiAnalysisAI#1174 entries) and append the MTP config entry for PR SemiAnalysisAI#1166.
/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
@yhyang201 Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24956420257
/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
@yhyang201 Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24957535378
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
@yhyang201 Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24957880039
Force-pushed from e2ae658 to c1d65ee
…(4,1,5)"
This reverts the EAGLE spec params back to (3, 1, 4):
--speculative-num-steps 3
--speculative-eagle-topk 1
--speculative-num-draft-tokens 4
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
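As an illustration of the reverted settings, the three flags can be collected into a single argument string. This is only a sketch: the variable names are hypothetical, and only the flag names and the values (3, 1, 4) come from the commit message. In both tuples mentioned there, (4, 1, 5) and (3, 1, 4), the draft-token count equals the step count plus one; the check below records that observation for these values and is not a general rule.

```shell
# Values from the commit message: (num-steps, eagle-topk, num-draft-tokens) = (3, 1, 4).
SPEC_NUM_STEPS=3
SPEC_EAGLE_TOPK=1
SPEC_NUM_DRAFT_TOKENS=4

# Observation for these values only: draft tokens = steps + 1 when topk is 1.
if [ "$((SPEC_NUM_STEPS + 1))" -ne "$SPEC_NUM_DRAFT_TOKENS" ]; then
  echo "warning: draft-token count differs from steps + 1" >&2
fi

# Hypothetical variable collecting the flags for a server launch command.
SPEC_ARGS="--speculative-num-steps ${SPEC_NUM_STEPS} --speculative-eagle-topk ${SPEC_EAGLE_TOPK} --speculative-num-draft-tokens ${SPEC_NUM_DRAFT_TOKENS}"
echo "$SPEC_ARGS"
```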
* dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline
  Reverts the matrix expansion (#1132), script edits (#1158, #1173, #1174), and changelog retriggers (#1178) on top of the original #1143 entry. Restores the script and config block to their #1143 state and clears all prior dsv4-fp4-b300-sglang changelog entries to start fresh. The dsv4-fp4-b300-sglang-mtp config (#1166) is untouched.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf-changelog: add pr-link for #1184
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf-changelog: keep only the original #1143 entry, drop new entry
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add MTP recipe for DSv4 FP4 on B300 with SGLang.