Tweak 70b configs to get some more performance#68

Merged
functionstackx merged 1 commit into main from qcolombet-70b-tweak-0927
Sep 28, 2025

Conversation

@qcolombet
Contributor

No description provided.

@qcolombet qcolombet marked this pull request as ready for review September 28, 2025 18:52
@functionstackx functionstackx merged commit d25f4f6 into main Sep 28, 2025
@functionstackx functionstackx deleted the qcolombet-70b-tweak-0927 branch September 28, 2025 19:30
Oseltamivir added a commit that referenced this pull request Apr 25, 2026
Mirrors the NVIDIA-official TEP recipe for very low concurrency:

  https://github.com/NVIDIA/srt-slurm/blob/aflowers/gb200-dsv4-recipes/
    recipes/vllm/deepseek-v4-pro/8k1k/disagg-gb200-1p1d-dep8-tep8.yaml

Topology: 1 prefill (DP=8) + 1 decode (TP=8) — 4 nodes. Adds 1k/1k
sibling (no upstream equivalent) by shrinking max-model-len to 3072.

Local deviations from upstream (documented in recipe headers):
  * model.path renamed deepseekv4-fp4 -> deepseek-v4-pro to match our
    launch script's SRT_SLURM_MODEL_PREFIX.
  * Stripped CPU/DRAM offload knobs and numa-bind (our pinned
    NVIDIA/srt-slurm@sa-submission-q2-2026 clone doesn't ship the
    vllm_numa_bind_hash_fix.py patch upstream uses).
  * benchmark.use_chat_template: false (no PR #68 sa-bench changes in
    our srtctl); benchmark.tokenizer_mode dropped for the same reason.
  * Container kept on the floating tag; health_check + slurm.time_limit
    added for cold-cache Lustre loads.
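
For reference, the deviations above might correspond to a recipe fragment like the one below. This is a hedged sketch only: the key names (model.path, benchmark.use_chat_template, health_check, slurm.time_limit) come from the notes above, but the nesting and all values are illustrative assumptions, not the actual recipe contents.

```yaml
# Sketch of the local deviations; values are illustrative, not actual.
model:
  path: deepseek-v4-pro        # renamed from deepseekv4-fp4 to match
                               # SRT_SLURM_MODEL_PREFIX in the launch script
benchmark:
  use_chat_template: false     # no PR #68 sa-bench changes in our srtctl
  # tokenizer_mode intentionally dropped for the same reason

health_check: true             # added for cold-cache Lustre loads
slurm:
  time_limit: "02:00:00"       # hypothetical limit, same motivation
```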

Replaces the 1p4d-dep8-dep8 low-conc entries (10-node, 4 decode workers)
with this 4-node TEP topology in both 1k/1k (active) and 8k/1k (still
commented). Deletes the now-unused 1p4d-dep8-dep8 recipe files.

Active 1k/1k sweep: 3 entries / 14 benchmark points.
