Tweak 70b configs to get some more performance#68

Merged
functionstackx merged 1 commit into main from qcolombet-70b-tweak-0927
Sep 28, 2025

Conversation

@qcolombet
Contributor

No description provided.

@qcolombet qcolombet marked this pull request as ready for review September 28, 2025 18:52
@functionstackx functionstackx merged commit d25f4f6 into main Sep 28, 2025
@functionstackx functionstackx deleted the qcolombet-70b-tweak-0927 branch September 28, 2025 19:30
Oseltamivir added a commit that referenced this pull request Apr 25, 2026
Mirrors the NVIDIA-official TEP recipe for very low concurrency:

  https://github.com/NVIDIA/srt-slurm/blob/aflowers/gb200-dsv4-recipes/
    recipes/vllm/deepseek-v4-pro/8k1k/disagg-gb200-1p1d-dep8-tep8.yaml

Topology: 1 prefill (DP=8) + 1 decode (TP=8) — 4 nodes. Adds 1k/1k
sibling (no upstream equivalent) by shrinking max-model-len to 3072.

Local deviations from upstream (documented in recipe headers):
  * model.path renamed deepseekv4-fp4 -> deepseek-v4-pro to match our
    launch script's SRT_SLURM_MODEL_PREFIX.
  * Stripped CPU/DRAM offload knobs and numa-bind (our pinned
    NVIDIA/srt-slurm@sa-submission-q2-2026 clone doesn't ship the
    vllm_numa_bind_hash_fix.py patch upstream uses).
  * benchmark.use_chat_template: false (no PR #68 sa-bench changes in
    our srtctl); benchmark.tokenizer_mode dropped for the same reason.
  * Container kept on the floating tag; health_check + slurm.time_limit
    added for cold-cache Lustre loads.
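
For reference, the deviations above might correspond to a recipe fragment like the one below. This is a hedged sketch only: the key names (model.path, benchmark.use_chat_template, health_check, slurm.time_limit) come from the notes above, but the nesting and all values are illustrative assumptions, not the actual recipe contents.

```yaml
# Sketch of the local deviations; values are illustrative, not actual.
model:
  path: deepseek-v4-pro        # renamed from deepseekv4-fp4 to match
                               # SRT_SLURM_MODEL_PREFIX in the launch script
benchmark:
  use_chat_template: false     # no PR #68 sa-bench changes in our srtctl
  # tokenizer_mode intentionally dropped for the same reason

health_check: true             # added for cold-cache Lustre loads
slurm:
  time_limit: "02:00:00"       # hypothetical limit, same motivation
```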

Replaces the 1p4d-dep8-dep8 low-conc entries (10-node, 4 decode workers)
with this 4-node TEP topology in both 1k/1k (active) and 8k/1k (still
commented). Deletes the now-unused 1p4d-dep8-dep8 recipe files.

Active 1k/1k sweep: 3 entries / 14 benchmark points.
