
[NVIDIA] port b200 from docker to slurm due to change of cluster #71

Merged

functionstackx merged 1 commit into main from b200-vllm-sglang-docker-to-slurm on Sep 28, 2025
Conversation

@functionstackx
Contributor

No description provided.

@functionstackx functionstackx marked this pull request as ready for review September 28, 2025 23:06
@functionstackx functionstackx merged commit c88a4c3 into main Sep 28, 2025
@functionstackx functionstackx deleted the b200-vllm-sglang-docker-to-slurm branch September 28, 2025 23:06
@cquil11 cquil11 added the NVIDIA label Apr 8, 2026
@cquil11 cquil11 changed the title from "port b200 from docker to slurm due to change of cluster" to "[NVIDIA] port b200 from docker to slurm due to change of cluster" Apr 8, 2026
Oseltamivir added a commit that referenced this pull request Apr 24, 2026
Replaces our hand-rolled 8k/1k DSV4-Pro vLLM disagg recipes with the
four topologies from NVIDIA/srt-slurm PR #71 (source fork:
alec-flowers/srt-slurm, branch aflowers/dsv4-pr67-pr68, pinned at
commit d60e3f1c). PR #71 supersedes PR #67, which our original 8k/1k
recipes were based on, and adds more topologies, a wider concurrency
sweep per recipe, new env vars, an explicit tokenizer-mode, and
CPU/DRAM expert offload.

We take everything except offload:

  * launch_gb200-nv.sh clones alec-flowers/srt-slurm for dsv4 instead
    of NVIDIA/srt-slurm.
  * Runtime post-clone patch strips `offload-group-size`,
    `offload-num-in-group`, `offload-prefetch-step`, and the commented
    `# offload-params` line from all four 8k/1k recipes.
  * Same post-clone patch injects our `slurm.time_limit: 8:00:00` and
    `health_check: {max_attempts: 1440, interval_seconds: 10}` (4 h
    budget) so the recipes match our cold-cache Lustre load budget.
  * Model-path alias changed from `deepseek-v4-pro` to `deepseekv4-fp4`
    to match PR #71 recipes' `model.path` field; 1k/1k local recipes
    updated to the same alias.
  * nvidia-master.yaml 8k/1k block rewritten: 4 search-space entries
    (1p1d-dep8-dep8, 3p1d-dep8-dep8, 3p1d-dep8-dep16, 6p1d-dep8-dep16),
    each running conc list [4, 8, 16, 32, 64, 256, 512, 1024] — 32 total
    8k/1k benchmark points across 4 cluster startups.
  * Obsolete local 8k/1k recipes under srt-slurm-recipes/vllm/deepseek-v4/8k1k/
    removed (superseded by the PR #71 upstream files).

1k/1k sweep is unchanged otherwise (2 matrix entries, 9 benchmark
points using the hand-rolled recipes — no PR #71 equivalent at 1k/1k).
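
Roughly, the post-clone patch amounts to the following shell step. This is a hedged sketch only: file paths, sed invocations, and the appended YAML layout are illustrative, and it assumes the appended top-level keys are not already present in the upstream recipes.

```bash
# Illustrative only: strip the offload knobs and append our Slurm time limit
# and health-check budget to each mirrored 8k/1k recipe.
for f in recipes/vllm/deepseek-v4/8k1k/*.yaml; do
  sed -i -e '/offload-group-size/d' \
         -e '/offload-num-in-group/d' \
         -e '/offload-prefetch-step/d' \
         -e '/# offload-params/d' "$f"
  cat >> "$f" <<'YAML'
slurm:
  time_limit: "8:00:00"
health_check:
  max_attempts: 1440      # 1440 attempts x 10 s = 4 h budget
  interval_seconds: 10
YAML
done
```
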
Oseltamivir added a commit that referenced this pull request Apr 24, 2026
Oseltamivir added a commit that referenced this pull request Apr 25, 2026
* runners/launch_gb200-nv.sh: switch the recipe overlay step from
  `cp -r src dst` to `cp -rT src dst` (with explicit `mkdir -p dst`
  first). Addresses the bot review nit at line 144 — `cp -r src dst`
  works only because the upstream sa-submission-q2-2026 branch has no
  `recipes/vllm/deepseek-v4/` directory today; if upstream ever ships
  one, `cp -r` would nest as `recipes/vllm/deepseek-v4/deepseek-v4/...`
  and CONFIG_FILE in nvidia-master.yaml would silently resolve to the
  upstream stub. `-T` overlays unconditionally.

* perf-changelog.yaml: refresh the dsv4-fp4-gb200-dynamo-vllm entry's
  description. The previous wording referenced "8k1k, 7p1d-dep8-dep16"
  and "Mirrors NVIDIA/srt-slurm PR #67" which is stale after the move
  to a 1k/1k sweep with TEP low-conc (mirrored from PR #71) plus two
  hand-rolled mid/high topologies. Also fixes the directory reference
  (recipes moved to benchmarks/multi_node/srt-slurm-recipes/ during
  the cleanup pass).
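
A minimal sketch of the fixed overlay step (variable names illustrative; the actual launch script uses its own paths):

```bash
# Overlay local recipe files onto the cloned upstream tree.
SRC=benchmarks/multi_node/srt-slurm-recipes/vllm/deepseek-v4
DST=srt-slurm/recipes/vllm/deepseek-v4
mkdir -p "$DST"
# -T treats DST as the destination tree itself, so local files land directly
# in it even if upstream already ships a recipes/vllm/deepseek-v4/ directory;
# plain `cp -r "$SRC" "$DST"` would nest SRC inside DST in that case.
cp -rT "$SRC" "$DST"
```
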
Oseltamivir added a commit that referenced this pull request Apr 29, 2026
* Re-submit dsv4-fp4-gb200-dynamo-vllm against srt-slurm aflowers/gb200-dsv4-recipes (PR #77)

Repoint launch_gb200-nv.sh to NVIDIA/srt-slurm@aflowers/gb200-dsv4-recipes,
which supersedes #71 and ships the vllm_numa_bind_hash_fix.py patch and
sa-bench DSV4 tokenizer support — so numa-bind, benchmark.use_chat_template,
and benchmark.tokenizer_mode no longer have to be stripped from recipes.

8k/1k search-space expanded from 3 topologies to 8: adds 1p4d/1p8d pure-TP
decode (offload), 1p1d/2p1d/3p1d DEP-8 decode, and a 3p1d-dep16-40 wide
decode shape. 1k/1k topologies unchanged (no upstream 1k/1k counterpart);
1k/1k tep8 also re-enables numa-bind + chat template to stay consistent.

Local recipe deltas vs upstream are limited to: model.path alias rename
deepseekv4-fp4 -> deepseek-v4-pro (matches SRT_SLURM_MODEL_PREFIX), container
kept on the floating :deepseekv4-cu130 tag, slurm.time_limit added, and
health_check.max_attempts bumped 360 -> 1440 for cold-cache loads.
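
The mechanical part of those local deltas can be sketched as a post-mirror touch-up (paths and sed patterns illustrative; the slurm.time_limit insertion is elided because it depends on each recipe's layout):

```bash
for f in recipes/vllm/deepseek-v4/8k1k/*.yaml; do
  # model.path alias rename so it matches SRT_SLURM_MODEL_PREFIX
  sed -i 's/deepseekv4-fp4/deepseek-v4-pro/g' "$f"
  # stretch the health-check budget for cold-cache Lustre loads
  sed -i 's/max_attempts: 360/max_attempts: 1440/' "$f"
done
```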

* Revert 1k/1k tep8 recipe changes; leave 1k/1k untouched

The 1k/1k tep8 numa-bind + chat-template re-enabling is rolled back —
1k/1k stays at the previous local-extrapolation tuning. Updates the
perf-changelog entry to reflect that.

* Comment out VLLM_RANDOMIZE_DP_DUMMY_INPUTS / VLLM_MOE_ROUTING_SIMULATION_STRATEGY

These were upstream's knobs for measuring best-case engine perf via
randomized routing; disable them so the benchmark exercises the real
expert-routing path. Applied to every recipe that had them (all 8 new
8k/1k recipes plus the 1k/1k tep8 recipe).
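
A hedged sketch of that comment-out pass (it assumes the variables appear as YAML keys in an env-style block; recipe globs are illustrative):

```bash
for f in recipes/vllm/deepseek-v4/8k1k/*.yaml recipes/vllm/deepseek-v4/1k1k/*tep8*.yaml; do
  sed -i -E 's/^([[:space:]]*)(VLLM_RANDOMIZE_DP_DUMMY_INPUTS|VLLM_MOE_ROUTING_SIMULATION_STRATEGY)(.*)$/\1# \2\3/' "$f"
done
```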

* Switch to deepseek-v4-pro-sa SA-curated subset; drop 1k/1k

Re-mirror from NVIDIA/srt-slurm aflowers/gb200-dsv4-recipes branch under
recipes/vllm/deepseek-v4-pro-sa/ — the SemiAnalysis-curated subset of
PR #77. 1k/1k recipes are removed (only 8k/1k is in scope now).

Topology changes vs the previous mirror:
* drop 1p1d-tep8, 2p1d-c256-c512-c1024, 3p1d-c2048, 3p1d-dep16-40, 7p1d
* keep 1p1d-dep8-dep8-16 (concurrencies bumped to 64x128x256x512x1024),
  1p4d-tp8, 1p8d-tp8
* add new c4096-offload variants: 2p1d-dep8-dep8, 3p1d-dep8-dep8,
  3p1d-dep8-dep16

Other consistency fixes:
* dynamo.install: false uniformly (matches -sa/ — assumes pre-installed
  dynamo in the container)
* dynamo.hash 6a159fed... uniformly
* model.container set to vllm/vllm-openai:deepseekv4-cu130-dynamo across
  all 6 recipes so the recipe lookup matches the alias key the launch
  script registers in srtslurm.yaml from nvidia-master.yaml's image:
  field
* slurm.time_limit + health_check inserted right after setup_script: in
  a consistent position
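
A quick way to confirm that uniformity across the six mirrored recipes (field names taken from this list; paths illustrative):

```bash
for f in recipes/vllm/deepseek-v4-pro-sa/*.yaml; do
  echo "== $f"
  grep -nE 'install:|hash:|container:|time_limit:|max_attempts:' "$f"
done
# Expect install: false, the same dynamo hash (6a159fed...), and
# model.container: vllm/vllm-openai:deepseekv4-cu130-dynamo in every file.
```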

* Update perf-changelog.yaml

* Switch to vLLM 0.20.0 + dynamo wheel pin; rebase recipes on aflowers/vllm-gb200-v0.20.0

Bump container image to vllm/vllm-openai:v0.20.0-ubuntu2404@sha256:46da022c...
in nvidia-master.yaml and across all 6 recipes (keeps the recipe
model.container in lockstep with the alias key the launch script registers
in srtslurm.yaml).

Repoint launch_gb200-nv.sh from aflowers/gb200-dsv4-recipes to
aflowers/vllm-gb200-v0.20.0 — the 0.20.0 branch.

Per-recipe changes:
* Replace dynamo.hash + dynamo.install: false with dynamo.install: true
  + wheel: "1.2.0.dev20260426". The new container is vanilla vLLM 0.20.0
  without dynamo pre-installed, so srtctl installs from the pinned wheel.
* Add benchmark.custom_tokenizer:
  "sa_bench_tokenizers.vllm_deepseek_v4.VLLMDeepseekV4Tokenizer"
* Add identity: block at the bottom of every recipe — model repo+revision,
  container image (sha256), and dynamo+vllm framework versions for
  reproducibility tracking.
* 1p8d recipe: add conc 1 (concurrencies "1x8x16x32x64x128x256x512") and
  rename to disagg-gb200-1p8d-dep8-tp8-c1-c8-c16-c32-c64-c128-c256-offload.yaml.
  CONFIG_FILE reference in nvidia-master.yaml updated; conc-list extended
  to [1, 8, 16, 32, 64, 128, 256, 512].
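
An illustrative recipe fragment after this commit; the key nesting below is assumed from the field names in this list, not copied from the actual files:

```bash
cat <<'YAML'
dynamo:
  install: true
  wheel: "1.2.0.dev20260426"
benchmark:
  custom_tokenizer: "sa_bench_tokenizers.vllm_deepseek_v4.VLLMDeepseekV4Tokenizer"
identity:
  container:
    image: vllm/vllm-openai:v0.20.0-ubuntu2404   # sha256 pin still present here, dropped two commits later
  frameworks:
    vllm: v0.20.0
    dynamo: 1.2.0.dev20260426
YAML
```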

* Drop benchmark.tokenizer_mode from all 6 recipes

custom_tokenizer (added in the previous commit) covers sa-bench's
DSV4 tokenization; the redundant tokenizer_mode field is no longer
needed. The vllm_config.{prefill,decode}.tokenizer-mode worker-side
setting is unchanged.

* Strip sha256 pin from vllm container references

Use just the tag (vllm/vllm-openai:v0.20.0-ubuntu2404) in
nvidia-master.yaml image:, every recipe's model.container, every
recipe's identity.container.image, and the recipe header comment
lines.
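
In the spirit of this change, the pin-strip is a one-line substitution (file list illustrative):

```bash
sed -i 's/@sha256:[0-9a-f]\{64\}//g' \
  runners/nvidia-master.yaml \
  recipes/vllm/deepseek-v4-pro-sa/*.yaml
```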

* Drop identity.model from all 6 recipes

The /mnt/numa1/models/deepseek-v4-pro/ stage doesn't carry HF revision
metadata (no .huggingface/refs/main, no .cache/huggingface/download/
metadata), so identity.model.revision verification would fail every
job with "no HF revision found at /model". Drop the block until the
stage is re-populated via huggingface_hub.snapshot_download or the
ref marker is planted manually. identity.container and identity.frameworks
are preserved.
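
Planting the ref marker manually could look roughly like this (the `.huggingface/refs/main` layout is what the verification looks for per this message; the revision value is a placeholder):

```bash
STAGE=/mnt/numa1/models/deepseek-v4-pro
mkdir -p "$STAGE/.huggingface/refs"
# write the pinned HF revision so identity.model.revision verification has
# something to compare against
echo "<pinned-hf-revision-sha>" > "$STAGE/.huggingface/refs/main"
```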

* Switch dsv4-fp4 MODEL_PATH from /mnt/numa1 to /mnt/lustre01

The compute-node-local NVMe path is not visible to the GHA runner host,
so srtctl preflight (which runs there) failed with "model path
unavailable". Use the Lustre copy instead so preflight resolves the
alias to a path the runner can stat.
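
The constraint, sketched (paths as given in the surrounding commits; the Lustre subpath and variable name are illustrative):

```bash
# srtctl preflight runs on the GHA runner host, so the alias must resolve to a
# path that host can stat. The compute-node-local NVMe copy fails this check.
MODEL_PATH=/mnt/lustre01/models/deepseek-v4-pro   # was /mnt/numa1/models/deepseek-v4-pro
[ -d "$MODEL_PATH" ] || { echo "model path unavailable: $MODEL_PATH" >&2; exit 1; }
```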

* Trim DSv4 GB200 dynamo-vLLM configs

* Fix perf changelog entry formatting

* Restore dynamic GB200 container import

---------

Co-authored-by: Oseltamivir <bryansg2013@gmail.com>
Co-authored-by: Bryan Shan <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: Alec Flowers <aflowers@nvidia.com>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>