Changes from all commits (17 commits):
- `80d9808` srt-slurm: upstream recipes and add first-class `recipe:` field (cquil11, Apr 28, 2026)
- `89bf3e3` runners: factor srt-slurm clone+srtctl install into benchmark_lib helper (cquil11, Apr 28, 2026)
- `d29d066` runners: factor image-filename sanitization into benchmark_lib helper (cquil11, Apr 28, 2026)
- `77de857` srt-slurm: reorganize recipes by model/framework/hw/seq-len/topology (cquil11, Apr 28, 2026)
- `aa430b5` srt-slurm: collapse split `<isl>/<osl>/` recipe dirs into `<isl><osl>/` (cquil11, Apr 28, 2026)
- `1ca0696` runners: pin all srt-slurm clones to NVIDIA/srt-slurm@52e697d5 (cquil11, Apr 28, 2026)
- `0f755d2` runners: hardcode srt-slurm pin in benchmark_lib helper (cquil11, Apr 28, 2026)
- `6f99d48` srt-slurm: wire custom-script bench, drop sa-bench dependency (proof-… (cquil11, Apr 28, 2026)
- `290fcb6` srt-slurm: simplify custom-bench plumbing — drop redundant recipe env (cquil11, Apr 28, 2026)
- `adf8a11` srt-slurm: keep run_benchmark_serving pass-throughs to just --tokeniz… (cquil11, Apr 28, 2026)
- `baf8e28` srt-slurm: compress recipe-resolution block in benchmark template (cquil11, Apr 28, 2026)
- `d3e9b93` runners: roll srt-slurm pin back one commit to dodge nginx ulimit reg… (cquil11, Apr 28, 2026)
- `1241086` runners: bump srt-slurm pin to ishan-rework-nginx (425b486) (cquil11, Apr 28, 2026)
- `fecd2de` srt-slurm: default bench backend to `openai`, drop hardcoded /v1/comp… (cquil11, Apr 28, 2026)
- `24d118f` runners: bump srt-slurm pin to NVIDIA/main@1372a10 (cquil11, Apr 28, 2026)
- `792d8aa` srt-slurm: migrate remaining 364 recipes from sa-bench → custom (cquil11, Apr 28, 2026)
- `d7dc72f` Merge remote-tracking branch 'origin/main' into chore/upstream-srt-slurm (cquil11, Apr 29, 2026)
52 changes: 52 additions & 0 deletions .github/configs/CONFIGS.md
@@ -47,6 +47,58 @@ Notes:
- No extra fields besides the ones listed may be specified, or else the benchmarks will fail to run.
- Setting the fields above, particularly `ep` and `dp-attn`, only guarantees that the respective values will be passed as environment variables to the benchmark scripts! Actually using those environment variables is an implementation detail at the level of the benchmark Bash script.

## Multi-node srt-slurm recipes

🔴 All 12 `trtllm/b300-fp8/*/disagg/mtp/*.yaml` recipes declare `model.precision: "fp4"` on line 6 even though they live under `b300-fp8/` with `model.path: "dsr1-fp8"` (FP8 weights). The 13 sibling STP recipes in the same `b300-fp8` tree correctly declare `"fp8"`, so this is a uniform copy-paste error from the `b300-fp4` MTP templates. Fix: change `precision: "fp4"` → `precision: "fp8"` on line 6 of every file under `benchmarks/multi_node/srt-slurm-recipes/dsr1/trtllm/b300-fp8/{1k1k,8k1k}/disagg/mtp/`.

Extended reasoning...

What the bug is

Every recipe under benchmarks/multi_node/srt-slurm-recipes/dsr1/trtllm/b300-fp8/*/disagg/mtp/ (12 files total — 6 in 1k1k/, 6 in 8k1k/) declares model.precision: "fp4" on line 6, while at the same time setting model.path: "dsr1-fp8" (FP8 weights) and being placed under the directory b300-fp8/. Their 13 sibling STP recipes in the same b300-fp8/ tree all correctly declare precision: "fp8", and the corresponding b300-fp4/ MTP recipes correctly declare "fp4". Pattern-wise, this is a uniform copy-paste error from the b300-fp4 MTP templates that was missed when adapting them for FP8.

How it manifests

grep -rn 'precision:' benchmarks/multi_node/srt-slurm-recipes/dsr1/trtllm/b300-fp8/ shows 12/12 MTP recipes with fp4 and 13/13 STP recipes with fp8. For example, b300-fp8/1k1k/disagg/mtp/ctx1_gen1_dp8_batch256_eplb0_mtp1_3072.yaml:

model:
  path: "dsr1-fp8"          # FP8 weights
  container: "dynamo-trtllm"
  precision: "fp4"          # ← wrong, should be fp8

Step-by-step proof

  1. The recipe filename hashing in benchmark-multinode-tmpl.yml includes ${{ env.PRECISION }} (RESULT_FILENAME=…_${PRECISION}_${FRAMEWORK}_…).
  2. PRECISION is sourced from the precision field of the recipe / master-yaml entry, per CONFIGS.md.
  3. For an entry that points at trtllm/b300-fp8/1k1k/disagg/mtp/ctx1_gen1_dp8_batch256_eplb0_mtp1_3072.yaml, the recipe-side precision reads "fp4", so the result filename and any tag forwarded to srtctl will say fp4 even though the container actually loads dsr1-fp8 weights.
  4. Every dashboard / aggregator that filters or groups by precision (which is the entire point of the field per CONFIGS.md) will then mix these FP8 b300 MTP runs into FP4 buckets.

Why existing code does not prevent it

The schema validator only verifies that the recipe file path resolves on disk; it does not cross-check that model.path (e.g. dsr1-fp8) is consistent with model.precision (fp4) or with the directory the recipe lives in. Nothing else in the diff (workflow templates, bench script) inspects precision for sanity — it is purely a metadata/tagging field.
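The missing cross-check is small. A hypothetical validator pass (not in the repo — the function name and the regex-based field extraction are illustrative assumptions, not the schema validator's actual code) could catch this whole class of error by comparing each recipe's declared precision against the `<hw>-<precision>` directory it lives in:

```python
import pathlib
import re

def precision_mismatches(recipe_root):
    """Flag recipes whose declared model.precision disagrees with the
    <hw>-<precision> segment of their directory path
    (layout: <model>/<framework>/<hw>-<precision>/...)."""
    root = pathlib.Path(recipe_root)
    bad = []
    for path in root.rglob("*.yaml"):
        rel = path.relative_to(root)
        if len(rel.parts) < 3 or "-" not in rel.parts[2]:
            continue  # not in the <model>/<framework>/<hw>-<precision>/ layout
        dir_precision = rel.parts[2].split("-", 1)[1]      # "b300-fp8" -> "fp8"
        m = re.search(r'precision:\s*"(\w+)"', path.read_text())
        if m and m.group(1) != dir_precision:
            bad.append((str(rel), m.group(1), dir_precision))
    return bad
```

Run over `benchmarks/multi_node/srt-slurm-recipes/`, this would have reported all 12 mismatching MTP recipes in one pass.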

Impact

Result mis-classification across an entire framework-precision-mode subtree (12 recipes). Anything that consumes the precision tag — dashboards, sweep selection, result aggregation — will treat these b300 FP8 MTP runs as FP4. Not a hard runtime breakage, but every result generated from these recipes is mislabeled, which is more than cosmetic for a benchmarking repo whose primary output is comparable numbers across precisions.

Fix

For each of the 12 affected files, change line 6 from precision: "fp4" to precision: "fp8". A single sed -i 's/precision: "fp4"/precision: "fp8"/' benchmarks/multi_node/srt-slurm-recipes/dsr1/trtllm/b300-fp8/{1k1k,8k1k}/disagg/mtp/*.yaml would do it. Files:

  • b300-fp8/1k1k/disagg/mtp/ctx1_gen1_dp8_batch256_eplb0_mtp1_3072.yaml
  • b300-fp8/1k1k/disagg/mtp/ctx1_gen2_dep8_batch128_eplb0_mtp1_2560.yaml
  • b300-fp8/1k1k/disagg/mtp/ctx1_gen5_dep8_batch16_eplb0_mtp2_720.yaml
  • b300-fp8/1k1k/disagg/mtp/ctx1_gen8_tp8_batch16_eplb0_mtp3_160.yaml
  • b300-fp8/1k1k/disagg/mtp/ctx1_gen8_tp8_batch1_eplb0_mtp3_10.yaml
  • b300-fp8/1k1k/disagg/mtp/ctx3_gen2_dp8_batch512_eplb0_mtp1_11264.yaml
  • b300-fp8/8k1k/disagg/mtp/ctx1_gen1_dp8_batch8_eplb0_mtp3_72.yaml
  • b300-fp8/8k1k/disagg/mtp/ctx1_gen2_tp8_batch16_eplb0_mtp3_40.yaml
  • b300-fp8/8k1k/disagg/mtp/ctx1_gen4_tp8_batch1_eplb0_mtp3_8.yaml
  • b300-fp8/8k1k/disagg/mtp/ctx1_gen4_tp8_batch4_eplb0_mtp3_20.yaml
  • b300-fp8/8k1k/disagg/mtp/ctx2_gen1_dp8_batch16_eplb0_mtp3_144.yaml
  • b300-fp8/8k1k/disagg/mtp/ctx4_gen1_dp8_batch64_eplb0_mtp2_512.yaml

(Comment placed on CONFIGS.md because the individual recipe files exceed the diff-comment threshold for line-anchored review comments.)


Multi-node configs that dispatch via `srt-slurm` (i.e. `srtctl apply -f …`) reference their recipe as a first-class field on the search-space entry:

```yaml
search-space:
- spec-decoding: "mtp"
  conc-list: [1214]
  recipe: "trtllm/b200-fp4/1k1k/mtp/ctx1_gen2_dep8_batch64_eplb0_mtp2.yaml"
  prefill:
    num-worker: 1
    tp: 4
    ep: 4
    dp-attn: true
  decode:
    num-worker: 2
    tp: 8
    ep: 8
    dp-attn: true
```

- `recipe` is a path **relative to `benchmarks/multi_node/srt-slurm-recipes/`** in this repo. The schema validator rejects entries whose recipe file does not exist on disk, so adding a new entry requires upstreaming the recipe yaml here first.
- The path may carry an `:override[N]` / `:override_<name>` suffix to select a named override section inside an sglang-style recipe yaml (e.g. `"dsr1/sglang/b200-fp4/1k1k/disagg/1k1k.yaml:zip_override_mtp_lowlat[0]"`). The launcher strips this suffix before reading the file but passes the full string to `srtctl`.
- `recipe` is optional: multi-node entries that do *not* go through srt-slurm (e.g. dynamo-sglang aggregated topologies that drive their own bash) leave it unset.
- Recipes live under `benchmarks/multi_node/srt-slurm-recipes/` organized as `<model>/<framework>/<hw>-<precision>/<isl><osl>/<agg|disagg>/<stp|mtp>/<recipe-name>.yaml` — e.g. `dsr1/trtllm/b200-fp4/1k1k/disagg/mtp/ctx1_gen2_dep8_batch64_eplb0_mtp2.yaml`. A handful of sglang-style files that carry override sections spanning both stp and mtp are parked one level shallower (the trailing `<stp|mtp>/` segment is omitted). The benchmark template resolves `recipe` to an absolute path and passes it to the launcher as `CONFIG_FILE`, so launchers do not see the relative form.
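The suffix handling in the second bullet amounts to a one-line split. A minimal sketch (the function name is an assumption for illustration, not the launcher's actual helper):

```python
def split_recipe_ref(ref):
    """Separate the on-disk recipe path from an optional ':override...' selector.

    The part before the first ':' is what gets existence-checked and read from
    disk; the untouched full string is what gets forwarded to srtctl."""
    return ref.split(":", 1)[0], ref

# With a selector suffix:
path, full = split_recipe_ref(
    "dsr1/sglang/b200-fp4/1k1k/disagg/1k1k.yaml:zip_override_mtp_lowlat[0]")
# path -> "dsr1/sglang/b200-fp4/1k1k/disagg/1k1k.yaml"

# Without one, both halves are the plain path:
path2, full2 = split_recipe_ref("trtllm/b200-fp4/1k1k/mtp/x.yaml")
```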

### Custom-script benchmarking

Recipes are migrating from srt-slurm's bundled `benchmark.type: sa-bench` to `benchmark.type: custom` so the benchmark client lives in this repo (`utils/bench_serving/benchmark_serving.py`) instead of being maintained twice. New shape:

```yaml
container_mounts:
  "$INFMAX_WORKSPACE": "/infmax-workspace"

benchmark:
  type: "custom"
  command: "bash /infmax-workspace/benchmarks/multi_node/srt_bench.sh"
  env:
    PREFILL_GPUS: "4"            # per prefill worker (filename component)
    DECODE_GPUS: "8"             # per decode worker (filename component)
    TOTAL_GPUS: "20"             # sum across workers (filename component)
    # MODEL_NAME: "..."          # only when the server's served-model-name
    #                            # differs from master-yaml's `model:`
    # USE_CHAT_TEMPLATE: "false" # only when overriding the default (true)
```

`MODEL`, `ISL`, `OSL`, `CONC_LIST`, `DISAGG`, `RANDOM_RANGE_RATIO` are exported by `benchmark-multinode-tmpl.yml` at the workflow step and propagate through the launcher → `srtctl` → `srun` (default `--export=ALL`) → pyxis into the benchmark container, so they don't need to be re-declared in `benchmark.env`. The recipe only carries per-recipe topology knobs (`PREFILL_GPUS`/`DECODE_GPUS`/`TOTAL_GPUS`, used in the result filename) plus the rare overrides (`MODEL_NAME` when the server's served-model-name diverges from `model:`, `USE_CHAT_TEMPLATE: false` for tokenizers that have no chat template, etc.).
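The propagation chain is plain environment inheritance, which a rough model makes concrete (the helper is hypothetical — in reality srun's `--export=ALL` does the forwarding, not Python code):

```python
import os

def effective_bench_env(recipe_env):
    """Approximate the env the benchmark container sees: workflow-exported
    variables ride along via srun --export=ALL, and the recipe's
    benchmark.env entries are layered on top."""
    env = dict(os.environ)   # includes MODEL, ISL, OSL, CONC_LIST, DISAGG, ...
    env.update(recipe_env)   # per-recipe knobs: PREFILL_GPUS / DECODE_GPUS / TOTAL_GPUS
    return env
```

This is why the recipe only needs to declare what the workflow step cannot know: per-recipe topology counts and the rare server-side overrides.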

`benchmarks/multi_node/srt_bench.sh` is a thin wrapper around `run_benchmark_serving()` in `benchmarks/benchmark_lib.sh` (the same shim every single-node bench script uses). It loops once per concurrency in `$CONC_LIST` and writes results to `/logs/sa-bench_isl_<ISL>_osl_<OSL>/results_concurrency_<N>_gpus_<TOT>_ctx_<P>_gen_<D>.json`, so existing launcher result-harvesters pick them up unchanged. The tokenizer is loaded from `/model`: `srtctl`'s `RuntimeContext.create` auto-mounts the model dir at that path in every container, so no HF Hub egress is needed.
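The result-file scheme above can be pinned down with a small formatter (illustrative only — `srt_bench.sh` assembles this path in bash, not via this function):

```python
def result_path(isl, osl, conc, total_gpus, ctx_gpus, gen_gpus):
    """Mirror the filename scheme srt_bench.sh writes, so the existing
    launcher result-harvesters keep matching the files."""
    return (f"/logs/sa-bench_isl_{isl}_osl_{osl}/"
            f"results_concurrency_{conc}_gpus_{total_gpus}"
            f"_ctx_{ctx_gpus}_gen_{gen_gpus}.json")
```

For example, a 20-GPU disagg run (4 ctx, 8 gen GPUs per worker tier) at concurrency 8 on 1k/1k lands at `/logs/sa-bench_isl_1024_osl_1024/results_concurrency_8_gpus_20_ctx_4_gen_8.json`.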

The `container_mounts` block bind-mounts the host-side `$INFMAX_WORKSPACE` (set by the launcher to `$GITHUB_WORKSPACE`) at `/infmax-workspace` inside srt-slurm's benchmark container, so the wrapper and bench client are reachable at known paths. `srtctl` resolves `$INFMAX_WORKSPACE` via `os.path.expandvars` at submission time.
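`os.path.expandvars` substitutes variables that are set and leaves unknown ones untouched, which is all the mount-key resolution needs. A minimal reproduction (the workspace path is a made-up example value):

```python
import os

# In practice the launcher exports this before invoking srtctl.
os.environ["INFMAX_WORKSPACE"] = "/home/runner/work/infmax"

mounts = {"$INFMAX_WORKSPACE": "/infmax-workspace"}

# Resolve host-side keys at submission time; container-side values pass through.
resolved = {os.path.expandvars(host): container
            for host, container in mounts.items()}
# resolved == {"/home/runner/work/infmax": "/infmax-workspace"}
```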

## Runners

The `runners.yaml` config lists the runners available to the repository. The keys are the runner *types* (i.e., the GPU kinds plus some specific combinations like `b200-trt`), and each value is a list of *runner nodes*. This config is used to validate the master configs.