
chore: upstream srt-slurm recipes + first-class recipe field + custom-bench wrapper #1211

Open
cquil11 wants to merge 17 commits into main from chore/upstream-srt-slurm

Conversation

Collaborator

cquil11 commented Apr 28, 2026

Summary

Pull every srt-slurm recipe we depend on into this repo and run multi-node throughput benchmarks against our own bench client instead of upstream srt-slurm's bundled sa-bench.

What changes

  • Recipes upstreamed. ~370 recipe yamls copied from NVIDIA/srt-slurm@sa-submission-q2-2026 into benchmarks/multi_node/srt-slurm-recipes/. Layout: <model>/<framework>/<hw>-<precision>/<isl><osl>/<agg|disagg>/<stp|mtp>/<recipe>.yaml so a sweep contributor can find every recipe in a given cell without grepping.
  • First-class recipe: field on master-yaml search-space entries. Replaces the prefill.additional-settings: ["CONFIG_FILE=recipes/..."] env-var hack. Schema validates the file exists on disk.
  • Multi-node bench client lives in this repo. Recipes use benchmark.type: custom + command: bash /infmax-workspace/benchmarks/multi_node/srt_bench.sh, which is a thin loop on top of the existing run_benchmark_serving() shim that single-node scripts use. Removes the dual-maintain burden on bench_serving.py.
  • srt-slurm pin. All clones go to a single pinned commit on NVIDIA/srt-slurm@main (set in benchmark_lib.sh::clone_and_install_srtctl). Drops the legacy ishandhanani/srt-slurm@sa-submission-q1-2026 fallback for gb200 dsr1 sglang/trt.
  • Launcher refactor. Per-launcher copies of the clone+uv install + image-name sanitization are factored into helpers (clone_and_install_srtctl, sanitize_image_filename) in benchmarks/benchmark_lib.sh.

How env reaches the bench container

benchmark-multinode-tmpl.yml already exports MODEL/ISL/OSL/CONC_LIST/DISAGG/RANDOM_RANGE_RATIO at the workflow step. These propagate via bash launcher.sh → srtctl apply → subprocess.Popen(env=None) → srun (default --export=ALL) → pyxis → bench container. So recipes only carry per-recipe knobs in benchmark.env:

container_mounts:
  "$INFMAX_WORKSPACE": "/infmax-workspace"
benchmark:
  type: "custom"
  command: "bash /infmax-workspace/benchmarks/multi_node/srt_bench.sh"
  env:
    PREFILL_GPUS: "4"
    DECODE_GPUS: "8"
    TOTAL_GPUS: "20"
    # MODEL_NAME: "..."        # only when server's served-model-name diverges from $MODEL
    # USE_CHAT_TEMPLATE: "false" # only when overriding default
    # DSV4: "true"             # only when the recipe needs --dsv4 tokenizer

--tokenizer /model is hardcoded in the wrapper because srt-slurm's RuntimeContext.create auto-mounts the model directory at /model inside every container. Avoids HF Hub egress and works for HF-id and alias-only model: values alike.

Validation

  • ✅ Schema validates 82 master-yaml entries
  • ✅ 149/149 matrix-logic tests pass
  • ✅ End-to-end real-cluster run on B200 / sglang / dsr1-fp4 / 1k/1k @ conc=4×8×32 (run 25079718885)

Out of scope (follow-ups)

  • 3 sglang multi-override base files (dsr1/sglang/b200-fp{4,8}/.../1k1k.yaml, 8k1k.yaml) keep their :override[N] mechanism and stay on sa-bench. Their 26 master-yaml refs continue to dispatch via the bundled sa-bench until they're split into per-variant standalone files.
  • Cluster-tuning sed patches in launchers (max_attempts: 720, h100 dist-timeout: 1800, h200 health_check rewrite) are still in-place; can move into recipes proper as a separate cleanup.

Upstream srt-slurm contributions made along the way

  • NVIDIA/srt-slurm#111: demote per-srun command line from INFO to DEBUG (5KB fingerprint heredoc was dominating logs). Merged.
  • NVIDIA/srt-slurm#110 (Ishan): make nginx ulimit + worker_rlimit_nofile opt-in via frontend.nginx_raise_ulimit field. Unblocks our cluster (RLIMIT_NOFILE hard cap < 1M). Merged.

cquil11 and others added 3 commits April 28, 2026 09:50
Recipes referenced from NVIDIA/srt-slurm@sa-submission-q2-2026 are now tracked
under benchmarks/multi_node/srt-slurm-recipes/, mirroring the upstream
`recipes/` layout. The master-yaml plumbing for selecting one is hoisted out
of `prefill.additional-settings: ["CONFIG_FILE=recipes/..."]` into a
first-class `recipe:` field on the search-space entry, validated against
on-disk paths so unknown recipes fail fast at sweep generation. The benchmark
template resolves it to an absolute scratch-copy path passed to launchers as
CONFIG_FILE, so launcher behavior is unchanged otherwise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six launchers each carried a ~22-line copy of the same git-clone, uv-install,
venv-create, srtctl-install sequence. Lift it into clone_and_install_srtctl()
in benchmarks/benchmark_lib.sh, parameterized by SRT_REPO_URL/SRT_BRANCH and
UV_INSTALL_DIR/UV_VENV_DIR env vars so each launcher can keep its workspace-
vs-NFS-vs-default-HOME placement decisions explicit at the call site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lift the `echo "$IMAGE" | sed 's/[/:@#]/_/g'` slug used to name squash files
out of 13 launchers and into sanitize_image_filename() in benchmark_lib.sh.
Cluster-specific separator (h100/h200-dgxc-slurm use '+' instead of '_') is
expressed as the second arg, and the nvcr.io/-prefix-strip variant becomes
`sanitize_image_filename "${IMAGE#nvcr.io/}" +` rather than a sed pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
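The slug transform this commit describes can be sketched as a small helper. This is a hypothetical reconstruction from the description above, not the exact code in benchmark_lib.sh:

```shell
# Replace every /, :, @, # in an image reference with a separator so the
# result is safe to use as a squash filename. The second arg overrides
# the default '_' separator (h100/h200-dgxc-slurm pass '+').
sanitize_image_filename() {
  local image="$1" sep="${2:-_}"
  printf '%s\n' "$image" | sed "s|[/:@#]|${sep}|g"
}
```

With this shape, the nvcr.io/-prefix-strip variant is just `sanitize_image_filename "${IMAGE#nvcr.io/}" +` as described above.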
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

cquil11 and others added 13 commits April 28, 2026 10:54
Restructure benchmarks/multi_node/srt-slurm-recipes/ from the upstream's
heterogeneous layout into a uniform tree:

  <model>/<framework>/<hw>-<precision>/<isl>/<osl>/<agg|disagg>/<stp|mtp>/<recipe>.yaml

so a sweep contributor can navigate to dsr1/trtllm/b200-fp4/1k/1k/disagg/mtp/
and immediately see every recipe that fits that cell. The 3 sglang
multi-override files that span both stp and mtp are parked one level
shallower (no trailing stp|mtp/), since the override section selects the speculative-decoding mode.

365 files moved, 388 active + 5 commented recipe references rewritten,
schema validation + tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per request, drop the awkward `1k/1k/` two-segment intermediate in the
recipe tree in favor of `1k1k/`. New shape:

  <model>/<framework>/<hw>-<precision>/<isl><osl>/<agg|disagg>/<stp|mtp>/<recipe>.yaml

370 files renamed, 393 recipe references in nvidia-master.yaml rewritten,
schema validation + tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the ishandhanani/srt-slurm@sa-submission-q1-2026 fallback in
launch_gb200-nv.sh — every launcher now clones NVIDIA/srt-slurm at the
pinned commit 52e697d (nginx fd-limit fix on origin/main, Apr 2026).
Pinning to a SHA instead of a moving branch keeps benchmark runs
reproducible across upstream churn.

Rename the helper's SRT_BRANCH env var to SRT_REF for accuracy
(it accepts any git ref, not just a branch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop env-var override for SRT_REPO_URL / SRT_REF — every benchmark run
must use the same pinned srtctl, no ad-hoc overrides at the call site.
Bumping the pin is now a one-line edit to benchmark_lib.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…of-life)

Stop relying on srt-slurm's bundled `benchmark.type: sa-bench` (which ships
its own copy of bench_serving.py inside the upstream repo) and instead use
`benchmark.type: custom` to run *this* repo's utils/bench_serving against the
already-ready frontend. Avoids dual-maintaining the bench client.

Plumbing:
- benchmarks/multi_node/srt_bench.sh: thin wrapper that mirrors sa-bench's
  per-conc warmup-then-bench loop, writes results to the same
  /logs/sa-bench_isl_<ISL>_osl_<OSL>/results_concurrency_<N>_gpus_<TOT>_ctx_<P>_gen_<D>.json
  layout the launcher result-harvesters already grep, with conc list parsed
  from x-separated env (e.g. "128x256x1024").
- Recipe shape: add `container_mounts: { $INFMAX_WORKSPACE: /infmax-workspace }`
  + replace `benchmark: { type: sa-bench, ... }` with
  `benchmark: { type: custom, command: "bash /infmax-workspace/...", env: {...} }`.
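The x-separated concurrency parsing mentioned above can be sketched as follows (hypothetical helper name; the real wrapper may structure this differently):

```shell
# Turn "128x256x1024" into one value per line so the caller can loop
# over concurrencies with a plain for-loop or while read.
parse_conc_list() {
  printf '%s' "$1" | tr 'x' '\n'
}
```

Usage in a warmup-then-bench loop would look like `for conc in $(parse_conc_list "$CONC_LIST"); do … ; done`.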

Migrated as proof-of-life:
- dsr1/trtllm/b200-fp4/1k1k mtp ctx1_gen2_dep8_batch64_eplb0_mtp2  (TRT-LLM)
- dsr1/sglang/gb200-fp4/1k1k stp low-latency                       (SGLang)
- dsv4/vllm/gb200-fp4/1k1k stp disagg-gb200-1p1d-dep8-tep8          (vLLM)

Remaining ~360 recipes still use sa-bench; they migrate in a follow-up once
this triplet runs end-to-end on a real cluster.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two pieces, one commit:

1. benchmark_lib.sh's run_benchmark_serving() gains optional pass-throughs
   for --tokenizer / --endpoint / --dataset-name / --dataset-path so the
   multi-node srt_bench.sh wrapper can reuse it instead of forking its own
   command-build. (--request-rate stays hardcoded "inf" — no recipe-level
   override.) ~50 lines of duplicated shell deleted from srt_bench.sh.

2. Recipe `benchmark.env` blocks lose every variable that is already
   exported by .github/workflows/benchmark-multinode-tmpl.yml at the
   workflow step level: MODEL, ISL, OSL, CONC_LIST, DISAGG, RANDOM_RANGE_RATIO.
   Those propagate down through srtctl → srun (default --export=ALL) → pyxis
   into the bench container, so srt_bench.sh reads them directly. Recipes
   now only carry per-recipe topology knobs (PREFILL_GPUS / DECODE_GPUS /
   TOTAL_GPUS — used in the result filename) plus the rare overrides.
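A minimal sketch of how a wrapper like srt_bench.sh can fail fast on workflow-exported variables while leaving per-recipe knobs optional (the helper name and behavior are assumed, not taken from the actual script):

```shell
# Required vars arrive from the workflow step via srun --export=ALL;
# a missing one aborts before any benchmark traffic is sent.
require_env() {
  eval "val=\"\${$1:-}\""
  if [ -z "$val" ]; then
    echo "missing required env: $1" >&2
    return 1
  fi
  printf '%s\n' "$val"
}
```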

Tokenizer is hardcoded to /model — srtctl's RuntimeContext.create
unconditionally bind-mounts the local model dir at that path in every
container, so AutoTokenizer.from_pretrained("/model") loads from the same
files the engine is serving. No HF Hub egress, works for HF-id and alias-
only `model:` values alike, no `TOKENIZER_PATH` knob in recipes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er/--endpoint

Walked back the --dataset-name / --dataset-path additions to
run_benchmark_serving — both default cleanly (random / unset) for every
multi-node throughput sweep we run, so the pass-throughs were dead
weight. srt_bench.sh stops setting DATASET_NAME / DATASET_PATH from env.

Kept --tokenizer (srt_bench points it at /model since --model is the
served-model-name alias, not a HF id) and --endpoint (recipes may need
/v1/chat/completions for chat-template-only request paths).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same behavior, fewer lines: collapse the two-step suffix split into a
single ${RECIPE#"${RECIPE%%:*}"} parameter expansion. 12 active lines
become 5. No semantic change — verified parsing for plain paths,
:override, and :zip_override_<name>[N] forms.
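The collapsed expansion works because ${RECIPE%%:*} is everything before the first colon, so stripping it as a prefix leaves exactly the colon-suffix. A standalone sketch (hypothetical function name):

```shell
# Returns ":override[2]"-style suffixes, or "" for plain recipe paths.
recipe_suffix() {
  local RECIPE="$1"
  printf '%s\n' "${RECIPE#"${RECIPE%%:*}"}"
}
```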

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ression

Upstream commit 52e697d (#108 "fix(nginx): raise file descriptor limit for
nginx workers") prepends `ulimit -n 1048576 &&` to the nginx srun command.
On clusters whose container inherits a sub-1M RLIMIT_NOFILE hard limit
from slurmd/PAM, the bash builtin's setrlimit fails with EPERM (raising
the hard rlimit needs CAP_SYS_RESOURCE in the init user namespace, which
pyxis --container-remap-root does not grant). The `&&` short-circuits and
nginx never starts — caught when re-running dsr1-fp4-gb200-dynamo-sglang.

Pin back to 698590e ("feat(config): cluster-wide default_bash_preamble for
ulimits and the like (#104)"), the immediately prior commit, where nginx
runs without the chained ulimit. Bump forward once upstream softens the
ulimit to `|| true` or makes it opt-in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the temporary rollback (698590e) with the upstream fix branch.
425b486 is the tip of NVIDIA/srt-slurm's `ishan-rework-nginx`, which makes
the nginx ulimit + nginx.conf `worker_rlimit_nofile` directive opt-in via
a new `frontend.nginx_raise_ulimit` field (default false). Without us
opting in, nginx runs without the EPERM-prone bump that #108 introduced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…letions

Upstream sa-bench used `--backend dynamo --endpoint /v1/completions`, but
this repo's benchmark_serving.py doesn't have a `dynamo` backend choice
(it has tgi/vllm/lmdeploy/deepspeed-mii/openai/openai-chat/tensorrt-llm/
scalellm/sglang). The dynamo frontend exposes a generic OpenAI-compatible
API regardless of the underlying engine, so `openai` is the right canonical
default. Recipes that need /v1/chat/completions can override via ENDPOINT.

Also drop the unconditional `--endpoint /v1/completions` — bench_serving.py
already defaults to that, and we now only pass --endpoint when ENDPOINT is
non-empty (matches single-node bench scripts that don't pass it at all).
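The only-when-non-empty flag handling can be sketched like this (hypothetical helper; the real wrapper builds the whole command line):

```shell
# Emit "--endpoint <path>" only when ENDPOINT is set and non-empty,
# so benchmark_serving.py keeps its own /v1/completions default.
endpoint_args() {
  if [ -n "${ENDPOINT:-}" ]; then
    printf -- '--endpoint %s\n' "$ENDPOINT"
  fi
}
```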

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both fixes we wanted are now on origin/main:
  * #110 nginx-rework-ulimit (Ishan): gates the 1M nofile bump behind opt-in
    frontend.nginx_raise_ulimit. Default off, fixes clusters whose container
    RLIMIT_NOFILE hard cap < 1M.
  * #111 (cam): demotes the per-srun command logger.info to logger.debug so
    the 5KB fingerprint heredoc stops dominating orchestrator logs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that the proof-of-life recipe (dsr1-fp4-gb200-dynamo-sglang low-latency,
conc 4/8/32) ran clean end-to-end on a real cluster, sweep the rest of the
tree onto the new shape so all multi-node throughput sweeps drive
utils/bench_serving/benchmark_serving.py via benchmarks/multi_node/srt_bench.sh
instead of srt-slurm's bundled sa-bench client.

Each migrated recipe replaces:

  benchmark:
    type: "sa-bench"
    isl: …
    osl: …
    concurrencies: …
    req_rate: …
    [use_chat_template: false]

with:

  container_mounts:
    "$INFMAX_WORKSPACE": "/infmax-workspace"

  benchmark:
    type: "custom"
    command: "bash /infmax-workspace/benchmarks/multi_node/srt_bench.sh"
    env:
      [MODEL_NAME: "..."]      # only when server's served-model-name diverges
                               # from the master-yaml `model:` value
      PREFILL_GPUS: "..."      # per prefill worker (filename component)
      DECODE_GPUS: "..."       # per decode worker (filename component)
      TOTAL_GPUS: "..."        # sum across all workers (filename component)
      [USE_CHAT_TEMPLATE: "false"]   # only carried over when set in original

GPU counts derived from each recipe's `resources:` block — uses
gpus_per_prefill / gpus_per_decode when set, else falls back to
nodes * gpus_per_node / workers. MODEL_NAME override added on the 59
sglang recipes whose backend.sglang_config.served-model-name is
"deepseek-ai/DeepSeek-R1" while master-yaml `model:` is the more specific
"deepseek-ai/DeepSeek-R1-0528" / "nvidia/DeepSeek-R1-0528-NVFP4-v2"
revision tag.
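The fallback arithmetic for the GPU counts can be sketched as follows (hypothetical helper mirroring the description; the real derivation was a one-time migration step, not runtime code):

```shell
# Prefer an explicit gpus_per_prefill/gpus_per_decode value when set;
# otherwise derive it from nodes * gpus_per_node / workers.
derive_gpus_per_worker() {
  local explicit="$1" nodes="$2" gpus_per_node="$3" workers="$4"
  if [ -n "$explicit" ]; then
    printf '%s\n' "$explicit"
  else
    printf '%s\n' "$(( nodes * gpus_per_node / workers ))"
  fi
}
```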

Skipped:
  - 3 sglang multi-override base files (1k1k.yaml / 8k1k.yaml under
    dsr1/sglang/b200-fp{4,8}/) — their `benchmark:` lives nested under
    `base:` and gets sa-bench-style overrides per `:override[N]` reference.
    Migrating them needs a separate pass that handles the override-merge
    semantics; their 26 master-yaml refs continue to dispatch via srt-slurm's
    bundled sa-bench until then. Tracked as follow-up.

Validation: schema accepts all 81 master-yaml entries, 149/149 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cquil11 cquil11 marked this pull request as ready for review April 29, 2026 17:20
@cquil11 cquil11 requested a review from a team April 29, 2026 17:20
# Conflicts:
#	.github/configs/nvidia-master.yaml
#	benchmarks/multi_node/srt-slurm-recipes/dsv4/vllm/gb200-fp4/1k1k/disagg/stp/disagg-gb200-1p1d-dep8-dep16.yaml
#	benchmarks/multi_node/srt-slurm-recipes/dsv4/vllm/gb200-fp4/1k1k/disagg/stp/disagg-gb200-1p1d-dep8-tep8.yaml
#	benchmarks/multi_node/srt-slurm-recipes/dsv4/vllm/gb200-fp4/1k1k/disagg/stp/disagg-gb200-3p1d-dep8-dep16.yaml
#	benchmarks/multi_node/srt-slurm-recipes/dsv4/vllm/gb200-fp4/8k1k/disagg/stp/disagg-gb200-1p1d-dep8-tep8.yaml
#	benchmarks/multi_node/srt-slurm-recipes/dsv4/vllm/gb200-fp4/8k1k/disagg/stp/disagg-gb200-2p1d-dep8-dep8-c4096-offload.yaml
#	benchmarks/multi_node/srt-slurm-recipes/dsv4/vllm/gb200-fp4/8k1k/disagg/stp/disagg-gb200-3p1d-dep8-dep16-c4096-offload.yaml
#	benchmarks/multi_node/srt-slurm-recipes/dsv4/vllm/gb200-fp4/8k1k/disagg/stp/disagg-gb200-3p1d-dep8-dep16.yaml
#	benchmarks/multi_node/srt-slurm-recipes/dsv4/vllm/gb200-fp4/8k1k/disagg/stp/disagg-gb200-7p1d-dep8-dep16.yaml
#	benchmarks/multi_node/srt-slurm-recipes/dsv4/vllm/gb200-fp4/8k1k/disagg/stp/disagg-gb200-low-latency.yaml
#	benchmarks/multi_node/srt-slurm-recipes/dsv4/vllm/gb200-fp4/8k1k/disagg/stp/disagg-gb200-low-middle-curve.yaml
#	benchmarks/multi_node/srt-slurm-recipes/dsv4/vllm/gb200-fp4/8k1k/disagg/stp/disagg-gb200-max-tpt-megamoe.yaml
#	benchmarks/multi_node/srt-slurm-recipes/dsv4/vllm/gb200-fp4/8k1k/disagg/stp/disagg-gb200-max-tpt.yaml
#	benchmarks/multi_node/srt-slurm-recipes/dsv4/vllm/gb200-fp4/8k1k/disagg/stp/disagg-gb200-mid-curve.yaml
#	benchmarks/multi_node/srt-slurm-recipes/vllm/deepseek-v4/1k1k/disagg-gb200-1p1d-dep8-tep8.yaml
#	benchmarks/multi_node/srt-slurm-recipes/vllm/deepseek-v4/8k1k/disagg-gb200-1p1d-dep8-tep8.yaml
#	benchmarks/multi_node/srt-slurm-recipes/vllm/deepseek-v4/8k1k/disagg-gb200-3p1d-dep8-dep16.yaml
#	runners/launch_gb200-nv.sh
cquil11 changed the title from "Chore/upstream srt slurm" to "chore: upstream srt-slurm recipes + first-class recipe field + custom-bench wrapper" on Apr 29, 2026
- No extra fields besides the ones listed may be specified, or else the benchmarks will fail to run.
- Setting the fields above, particularly `ep` and `dp-attn`, only guarantees that the respective values will be passed as environment variables to the benchmark scripts! Actually using those environment variables is an implementation detail at the level of the benchmark Bash script.

## Multi-node srt-slurm recipes
Contributor

🔴 All 12 trtllm/b300-fp8/*/disagg/mtp/*.yaml recipes declare model.precision: "fp4" on line 6 even though they live under b300-fp8/ with model.path: "dsr1-fp8" (FP8 weights). The 13 sibling STP recipes in the same b300-fp8 tree correctly declare "fp8", so this is a uniform copy-paste error from the b300-fp4 MTP templates. Fix: change precision: "fp4" → "fp8" on line 6 of every file under benchmarks/multi_node/srt-slurm-recipes/dsr1/trtllm/b300-fp8/{1k1k,8k1k}/disagg/mtp/.

Extended reasoning...

What the bug is

Every recipe under benchmarks/multi_node/srt-slurm-recipes/dsr1/trtllm/b300-fp8/*/disagg/mtp/ (12 files total — 6 in 1k1k/, 6 in 8k1k/) declares model.precision: "fp4" on line 6, while at the same time setting model.path: "dsr1-fp8" (FP8 weights) and being placed under the directory b300-fp8/. Their 13 sibling STP recipes in the same b300-fp8/ tree all correctly declare precision: "fp8", and the corresponding b300-fp4/ MTP recipes correctly declare "fp4". Pattern-wise, this is a uniform copy-paste error from the b300-fp4 MTP templates that was missed when adapting them for FP8.

How it manifests

grep -rn 'precision:' benchmarks/multi_node/srt-slurm-recipes/dsr1/trtllm/b300-fp8/ shows 12/12 MTP recipes with fp4 and 13/13 STP recipes with fp8. For example, b300-fp8/1k1k/disagg/mtp/ctx1_gen1_dp8_batch256_eplb0_mtp1_3072.yaml:

model:
  path: "dsr1-fp8"          # FP8 weights
  container: "dynamo-trtllm"
  precision: "fp4"          # ← wrong, should be fp8

Step-by-step proof

  1. The recipe filename hashing in benchmark-multinode-tmpl.yml includes ${{ env.PRECISION }} (RESULT_FILENAME=…_${PRECISION}_${FRAMEWORK}_…).
  2. PRECISION is sourced from the precision field of the recipe / master-yaml entry, per CONFIGS.md.
  3. For an entry that points at trtllm/b300-fp8/1k1k/disagg/mtp/ctx1_gen1_dp8_batch256_eplb0_mtp1_3072.yaml, the recipe-side precision reads "fp4", so the result filename and any tag forwarded to srtctl will say fp4 even though the container actually loads dsr1-fp8 weights.
  4. Every dashboard / aggregator that filters or groups by precision (which is the entire point of the field per CONFIGS.md) will then mix these FP8 b300 MTP runs into FP4 buckets.

Why existing code does not prevent it

The schema validator only verifies that the recipe file path resolves on disk; it does not cross-check that model.path (e.g. dsr1-fp8) is consistent with model.precision (fp4) or with the directory the recipe lives in. Nothing else in the diff (workflow templates, bench script) inspects precision for sanity — it is purely a metadata/tagging field.
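A cross-check along the lines the reviewer suggests could compare the declared precision against the -fpN suffix of the hardware directory segment. The sketch below assumes the `<model>/<framework>/<hw>-<precision>/…` layout this PR introduces and is not part of the actual validator:

```shell
# Succeeds when the declared precision matches the fpN suffix of the
# <hw>-<precision> path segment, e.g. b300-fp8 => fp8.
check_precision_matches_dir() {
  local path="$1" declared="$2" seg dir_prec="" IFS=/
  for seg in $path; do
    case "$seg" in
      *-fp*) dir_prec="${seg##*-}" ;;
    esac
  done
  [ "$dir_prec" = "$declared" ]
}
```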

Impact

Result mis-classification across an entire framework-precision-mode subtree (12 recipes). Anything that consumes the precision tag — dashboards, sweep selection, result aggregation — will treat these b300 FP8 MTP runs as FP4. Not a hard runtime breakage, but every result generated from these recipes is mislabeled, which is more than cosmetic for a benchmarking repo whose primary output is comparable numbers across precisions.

Fix

For each of the 12 affected files, change line 6 from precision: "fp4" to precision: "fp8". A single sed -i 's/precision: "fp4"/precision: "fp8"/' benchmarks/multi_node/srt-slurm-recipes/dsr1/trtllm/b300-fp8/{1k1k,8k1k}/disagg/mtp/*.yaml would do it. Files:

  • b300-fp8/1k1k/disagg/mtp/ctx1_gen1_dp8_batch256_eplb0_mtp1_3072.yaml
  • b300-fp8/1k1k/disagg/mtp/ctx1_gen2_dep8_batch128_eplb0_mtp1_2560.yaml
  • b300-fp8/1k1k/disagg/mtp/ctx1_gen5_dep8_batch16_eplb0_mtp2_720.yaml
  • b300-fp8/1k1k/disagg/mtp/ctx1_gen8_tp8_batch16_eplb0_mtp3_160.yaml
  • b300-fp8/1k1k/disagg/mtp/ctx1_gen8_tp8_batch1_eplb0_mtp3_10.yaml
  • b300-fp8/1k1k/disagg/mtp/ctx3_gen2_dp8_batch512_eplb0_mtp1_11264.yaml
  • b300-fp8/8k1k/disagg/mtp/ctx1_gen1_dp8_batch8_eplb0_mtp3_72.yaml
  • b300-fp8/8k1k/disagg/mtp/ctx1_gen2_tp8_batch16_eplb0_mtp3_40.yaml
  • b300-fp8/8k1k/disagg/mtp/ctx1_gen4_tp8_batch1_eplb0_mtp3_8.yaml
  • b300-fp8/8k1k/disagg/mtp/ctx1_gen4_tp8_batch4_eplb0_mtp3_20.yaml
  • b300-fp8/8k1k/disagg/mtp/ctx2_gen1_dp8_batch16_eplb0_mtp3_144.yaml
  • b300-fp8/8k1k/disagg/mtp/ctx4_gen1_dp8_batch64_eplb0_mtp2_512.yaml

(Comment placed on CONFIGS.md because the individual recipe files exceed the diff-comment threshold for line-anchored review comments.)

Comment on lines +918 to +928
clone_and_install_srtctl() {
local repo_url="https://github.com/NVIDIA/srt-slurm.git"
# Pinned to NVIDIA/srt-slurm@main — currently 1372a10. Includes:
# * #110 nginx-rework-ulimit: gates `ulimit -n 1048576` + worker_rlimit_nofile
# behind opt-in `frontend.nginx_raise_ulimit` (we don't opt in).
# * #111 srun command line log demoted INFO -> DEBUG (5KB fingerprint
# heredoc no longer dominates orchestrator log).
local ref="1372a10c493e3fd757f342d8516a5a91c30fe6ce"
local repo_dir="${SRT_REPO_DIR:-srt-slurm}"
local uv_install_dir="${UV_INSTALL_DIR:-${HOME}/.local/bin}"
local uv_venv_dir="${UV_VENV_DIR:-.venv}"
Contributor

🔴 The new clone_and_install_srtctl helper declares local repo_dir="${SRT_REPO_DIR:-srt-slurm}" (benchmark_lib.sh:926) and never exports the resolved value, but four launchers (launch_b200-dgxc.sh:52, launch_b300-nv.sh:53, launch_h100-dgxc-slurm.sh:67, launch_h200-dgxc-slurm.sh:60) still read $SRT_REPO_DIR after the helper returns to compute SRTCTL_ROOT="${GITHUB_WORKSPACE}/${SRT_REPO_DIR}". With SRT_REPO_DIR unset in the caller, that becomes ${GITHUB_WORKSPACE}/ (trailing slash, no subdir) and gets heredoc-baked into srtslurm.yaml as srtctl_root:, pointing srtctl at the workspace root rather than the cloned srt-slurm/. Easiest fix is export SRT_REPO_DIR="$repo_dir" inside the helper; alternatives are dropping local or hardcoding the literal in each launcher (gb200/gb300 already do).

Extended reasoning...

What the bug is

Commit 89bf3e3 (runners: factor srt-slurm clone+srtctl install into benchmark_lib helper) extracts the per-launcher SRT_REPO_DIR="srt-slurm" assignment and git clone block into a new clone_and_install_srtctl helper in benchmarks/benchmark_lib.sh. Inside the helper, the resolved value is captured as a function-local:

clone_and_install_srtctl() {
    ...
    local repo_dir="${SRT_REPO_DIR:-srt-slurm}"   # benchmark_lib.sh:926
    ...
}

local confines the binding to the function scope, and the helper never exports SRT_REPO_DIR back. Pre-89bf3e3, every launcher set SRT_REPO_DIR="srt-slurm" inline at the top before cloning — that assignment is what the launchers below were reading. A repo-wide grep for SRT_REPO_DIR shows it is set nowhere outside the helper, so every reader after the call sees an empty value.

The dangling readers

Four launchers still have the old ${SRT_REPO_DIR} read after switching to the helper:

  • runners/launch_b200-dgxc.sh:52SRTCTL_ROOT="${GITHUB_WORKSPACE}/${SRT_REPO_DIR}"
  • runners/launch_b300-nv.sh:53 — same
  • runners/launch_h100-dgxc-slurm.sh:67 — same
  • runners/launch_h200-dgxc-slurm.sh:60 — same

SRTCTL_ROOT is then heredoc-substituted into the generated srtslurm.yaml:

# Path to srtctl repo root (where the configs live)
srtctl_root: "${SRTCTL_ROOT}"

The two gb-prefixed launchers (launch_gb200-nv.sh:147, launch_gb300-nv.sh:48) hardcode ${GITHUB_WORKSPACE}/srt-slurm and so escape the SRTCTL_ROOT regression — they only have a cosmetic empty-path in their echo "Configs available at: $SRT_REPO_DIR/" (gb200:144, gb300:45, also h100:47, h200:40).

Step-by-step proof for launch_b200-dgxc.sh

  1. Script enters the IS_MULTINODE branch. SRT_REPO_DIR is unset in the caller env.
  2. Line 36: UV_INSTALL_DIR="..." UV_VENV_DIR="..." clone_and_install_srtctl || exit 1. The helper does local repo_dir="${SRT_REPO_DIR:-srt-slurm}" → repo_dir="srt-slurm" inside the function only. Helper clones into ./srt-slurm, cd's into it, sets up the venv. Returns 0.
  3. Back in the caller, SRT_REPO_DIR is still unset (the local binding evaporated; nothing was exported).
  4. Line 52: SRTCTL_ROOT="${GITHUB_WORKSPACE}/${SRT_REPO_DIR}" expands to SRTCTL_ROOT="/github/workspace/" (or wherever $GITHUB_WORKSPACE points), with empty subdir and trailing slash.
  5. Lines 54–75: heredoc writes srtslurm.yaml with literal srtctl_root: "/github/workspace/".
  6. srtctl apply -f $CONFIG_FILE reads srtctl_root from this YAML to locate the srt-slurm repo (configs/recipes). It now points at the workspace root, not ${GITHUB_WORKSPACE}/srt-slurm/. cwd happens to be correct because the helper cd'd into the cloned repo, but the YAML pointer that downstream machinery uses is wrong.

Impact

Affects every multi-node benchmark run on b200-dgxc, b300-nv, h100-dgxc-slurm, and h200-dgxc-slurm runners — the four srt-slurm launchers that this PR refactored. gb200/gb300 are unaffected for SRTCTL_ROOT (cosmetic echo only).

Fix

Cleanest fix is in the helper itself — change local repo_dir=... to export SRT_REPO_DIR="${SRT_REPO_DIR:-srt-slurm}" (and use $SRT_REPO_DIR in the rest of the function). That preserves the helper API, restores the pre-PR contract callers depend on, and keeps the gb200/gb300 hardcoded paths working unchanged. Alternatives: have each affected launcher set SRT_REPO_DIR="srt-slurm" inline before calling the helper, or hardcode the literal in the four SRTCTL_ROOT= lines (consistent with gb200/gb300).
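The scoping difference at the heart of this bug is easy to demonstrate in isolation (toy functions, not the actual helper):

```shell
# `local` confines the assignment to the function body; nothing reaches
# the caller. `export` (or a plain global assignment) survives the return.
helper_with_local() {
  local SRT_REPO_DIR="srt-slurm"
}
helper_with_export() {
  export SRT_REPO_DIR="${SRT_REPO_DIR:-srt-slurm}"
}
```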

@@ -0,0 +1,148 @@
name: b200-fp8-mtp-low-latency-tep8-1p-1d
Contributor

🟡 The MTP recipe name fields in three files don't match their actual resource topology, looking like copy-paste artifacts from sa-bench templates that weren't updated when topologies changed. Specifically: 8k1k_mtp_lowlat_0.yaml says ...-1p-1d but is 1p-3d (TOTAL_GPUS=32); 8k1k_mtp_maxtpt_0.yaml says ...-1p-1d but is 1p-2d (TOTAL_GPUS=24); 8k1k_mtp_maxtpt_1.yaml says ...-1p-2d but is 1p-1d (TOTAL_GPUS=16). The peer STP files in the sibling stp/ directory are correctly named for the same topologies, so the fix is just to align the name field with the resources block — purely a labeling/observability bug since the runtime topology is driven by the resources block.

Extended reasoning...

What's wrong

Three new MTP recipe yamls in benchmarks/multi_node/srt-slurm-recipes/dsr1/sglang/b200-fp8/8k1k/disagg/mtp/ carry top-level name: fields that are inconsistent with the topology declared in the resources: block of the same file:

| File | name says | resources/TOTAL_GPUS imply | Mismatch |
| --- | --- | --- | --- |
| 8k1k_mtp_lowlat_0.yaml | b200-fp8-mtp-low-latency-tep8-1p-1d | prefill=1, decode=3, TOTAL_GPUS=32 → 1p-3d | name claims 1d, resources have 3d |
| 8k1k_mtp_maxtpt_0.yaml | b200-fp8-mtp-max-tpt-dep8-1p-1d | prefill=1, decode=2, TOTAL_GPUS=24 → 1p-2d | name claims 1d, resources have 2d |
| 8k1k_mtp_maxtpt_1.yaml | b200-fp8-mtp-max-tpt-dep8-1p-2d | prefill=1, decode=1, TOTAL_GPUS=16 → 1p-1d | name claims 2d, resources have 1d |

Why the resources block is the source of truth

The peer STP recipes in the sibling stp/ directory (8k1k_stp_lowlat_0.yaml, 8k1k_stp_maxtpt_0.yaml, 8k1k_stp_maxtpt_1.yaml) carry correctly matched names for the identical topologies — ...-1p-3d, ...-1p-2d, ...-1p-1d respectively. This isolates the issue to the MTP side and rules out an alternative naming convention. Cross-checking against the parent 1k1k recipe (b200-fp8/1k1k/disagg/1k1k.yaml's zip_override_mtp_lowlat/zip_override_mtp_maxtpt sections) further confirms the convention: 1p-Nd always tracks the number of decode workers.

Step-by-step proof for 8k1k_mtp_lowlat_0.yaml

  1. Line 1: name: b200-fp8-mtp-low-latency-tep8-1p-1d — name suffix claims 1 decode node.
  2. resources.prefill_nodes: 1, resources.prefill_workers: 1 → 1 prefill worker × 8 GPUs/node = 8 prefill GPUs.
  3. resources.decode_nodes: 3, resources.decode_workers: 3 → 3 decode workers × 8 GPUs/node = 24 decode GPUs.
  4. Total: 8 + 24 = 32 GPUs, matching TOTAL_GPUS: "32" in benchmark.env.
  5. The peer STP file at the same scale (8k1k_stp_lowlat_0.yaml) has identical decode_nodes/decode_workers: 3 and TOTAL_GPUS: "32", with name b200-fp8-stp-low-latency-tp8-1p-3d — clearly the right convention.
  6. Therefore the MTP file's -1p-1d suffix is a typo for -1p-3d. The same logic flips for maxtpt_0 (should be -1p-2d) and maxtpt_1 (should be -1p-1d) — the two maxtpt names look swapped.

Impact

The name field is used by srtctl to generate the SLURM job name and by downstream observability/result-mapping tooling (logs, dashboards). Benchmark execution itself is unaffected because the actual topology comes from resources.* and the result filename uses PREFILL_GPUS/DECODE_GPUS/TOTAL_GPUS, all of which are correct. The damage is scoped to humans interpreting SLURM/log output: runs labeled ...-1p-1d will silently appear as 1p-2d or 1p-3d in dashboards, complicating result attribution across the maxtpt scale sweep where the 0/1 names are also swapped relative to each other.

Fix

Update the three name: fields to match the resource topology:

  • 8k1k_mtp_lowlat_0.yaml line 1 → name: b200-fp8-mtp-low-latency-tep8-1p-3d
  • 8k1k_mtp_maxtpt_0.yaml line 1 → name: b200-fp8-mtp-max-tpt-dep8-1p-2d
  • 8k1k_mtp_maxtpt_1.yaml line 1 → name: b200-fp8-mtp-max-tpt-dep8-1p-1d

No other changes are needed; the resources, TOTAL_GPUS, and benchmark.env blocks are already self-consistent.
