
[NV] H100 FP8 Disagg DSR1 1k1k, 8k1k (STP + MTP) #651

Merged
kedarpotdar-nv merged 12 commits into main from nv/h100-disagg-trtllm-fp8 on Feb 7, 2026

Conversation

@nlevin-ui (Collaborator)

Update H100 section with verified Pareto-optimal points:

1k1k ISL/OSL:

  • MTP: 9 configs (conc: 6, 9, 30, 60, 117, 231, 462, 615, 1229)
  • STP: 9 configs (conc: 6, 9, 30, 60, 231, 462, 924, 1845, 4916)

8k1k ISL/OSL (new):

  • MTP: 6 configs (conc: 6, 9, 30, 77, 78, 154)
  • STP: 5 configs (conc: 6, 9, 30, 154, 308)

All configs use TP=16, EP=16 across 2 nodes (16 GPUs total).
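
For readers who have not opened nvidia-master.yaml, a minimal sketch of how one of these points might be expressed is shown below. This is illustrative only: apart from tp, multinode, disagg, and the prefill/decode worker fields discussed later in this thread, the key names (scenarios, isl/osl, conc, ep) are assumptions, not the repo's actual schema.

dsr1-fp8-h100-dynamo-trt:          # new config key added by this PR
  multinode: true
  disagg: true
  scenarios:                       # "scenarios", "isl"/"osl", "conc", and "ep" key names are hypothetical
    - isl: 1024                    # a 1k1k ISL/OSL scenario (token counts shown here are illustrative)
      osl: 1024
      conc: 6                      # lowest 1k1k MTP Pareto point from the list above
      tp: 16
      ep: 16                       # EP=16 per the description; the actual EP key name is not shown in this thread
      prefill:
        num-worker: 1              # low-concurrency points use 1 prefill + 3 decode workers per the review notes below
      decode:
        num-worker: 3
        dp-attn: false             # TEP topology at low concurrency; DEP (dp-attn: true) at higher concurrency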

@functionstackx (Contributor)

@claude review

@Klaud-Cold (Collaborator) commented Feb 5, 2026

Claude finished @functionstackx's task in 2m 51s


Review: PR #651 — H100 FP8 Disagg DSR1 Configs

  • Gather context and read changed files
  • Review YAML structure and consistency
  • Check configuration values and parameters
  • Post review feedback

Summary

This PR adds a new dsr1-fp8-h100-dynamo-trt config key with 29 Pareto-optimal disaggregated inference configurations for H100 multinode (TP=16, EP=16 across 2 nodes). The structure mirrors the existing dsr1-fp8-h200-dynamo-trt config.


Findings

Issue: Concurrency ordering is non-monotonic in 1k1k MTP section

The 1k1k MTP configs have conc=615 before conc=462 (lines 1664-1693), breaking the ascending order used everywhere else in this file and in the H200 reference config.

Current order: 6, 9, 30, 60, 117, 231, 615, 462, 1229
Expected order: 6, 9, 30, 60, 117, 231, 462, 615, 1229

The PR description lists them in the correct sorted order (462, 615), so this appears to be a copy/paste ordering issue in the YAML. While it may not affect functionality (configs are likely processed independently), it hurts readability and deviates from the consistent ascending pattern used in every other section.

Fix this →

Note: Missing perf-changelog.yaml entry

This PR adds an entirely new config key (dsr1-fp8-h100-dynamo-trt) but does not add a corresponding entry to perf-changelog.yaml. Per repo conventions, new config keys that affect performance benchmarks should be documented there. An entry like this would be appropriate:

- config-keys:
    - dsr1-fp8-h100-dynamo-trt
  description:
    - "Add H100 FP8 Disagg DSR1 Dynamo TRT-LLM multinode configurations"
    - "Image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post1"
    - "1k1k: 18 scenarios (9 MTP, 9 STP) with verified Pareto-optimal points"
    - "8k1k: 11 scenarios (6 MTP, 5 STP) for long context workloads"
    - "TP=16, EP=16 across 2 nodes (16 GPUs total)"
  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/651

Fix this →


Cross-validation Results (all pass)

All of the following checks passed:

  • YAML validity
  • Config count matches PR description (9+9+6+5 = 29)
  • CONFIG_FILE paths match ISL/OSL (1k1k/8k1k)
  • CONFIG_FILE paths match MTP/STP (mtp/stp subdirectories, mtp3 vs mtp0)
  • ctx{N} in filenames matches prefill.num-worker
  • gen{N} in filenames matches decode.num-worker
  • tep/dep in filenames matches decode.dp-attn
  • TP size in filenames matches config tp: 16
  • Image tag consistent with H200 config (0.8.1.post1)
  • multinode: true and disagg: true present
  • Runner set to h100-multinode-slurm
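
Put differently, each scenario's CONFIG_FILE name redundantly encodes the topology, and the checks above confirm that the encoded values agree with the YAML fields. A hypothetical example follows (the path layout and key nesting are assumed for illustration, not copied from the diff):

CONFIG_FILE: 1k1k/mtp/ctx1_gen3_tep.yaml   # hypothetical path: 1k1k ISL/OSL, mtp subdirectory, 1 ctx worker, 3 gen workers, TEP
prefill:
  num-worker: 1      # must equal the N in ctx{N}
decode:
  num-worker: 3      # must equal the N in gen{N}
  dp-attn: false     # tep in the filename implies dp-attn: false; dep implies dp-attn: true
tp: 16               # must match the TP size encoded in the filename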

Architecture observations

  • Worker ratios vary sensibly with concurrency: low-conc uses 1P+3D (prefill-light), high-conc shifts to 2P+1D or 1P+1D (decode-light)
  • dp-attn transitions from false to true on decode workers as concurrency increases and topology shifts from TEP to DEP — consistent with the recipe naming (see the sketch below)
  • 8k1k MTP conc=77 and conc=78 are distinct Pareto points with different topologies (1P+1D DEP vs 1P+2D TEP) despite near-identical concurrency values — this is correct
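
To make the dp-attn transition concrete, the decode block of a low-concurrency and a high-concurrency scenario might differ roughly as follows. A sketch only, using the worker ratios from the first bullet above; surrounding key names are assumed.

# Low-concurrency scenario (e.g. conc=6): prefill-light 1P+3D, TEP decode
decode:
  num-worker: 3
  dp-attn: false

# High-concurrency scenario (e.g. conc=1229): 1P+1D, DEP decode
decode:
  num-worker: 1
  dp-attn: true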

Branch: nv/h100-disagg-trtllm-fp8

@functionstackx (Contributor)

hi @nlevin-ui @kedarpotdar-nv

I am removing sweep-enabled since the PR isn't fixed yet. Feel free to re-add the tag when it is ready.

@functionstackx (Contributor)

@pr-claude review plz

@claude (bot) commented Feb 5, 2026

Claude finished @functionstackx's task in 2m 34s


Review: PR #651 — H100 FP8 Disagg DSR1 Configs

  • Gather context and read changed files
  • Validate YAML structure and config keys
  • Validate container images and launch script
  • Check perf-changelog.yaml
  • Post review feedback

Summary

Found two blocking bugs in the launch script that will prevent any benchmark from running, plus a missing changelog entry and two warnings (one carried over from the previous review).

Blocking Issues

1. 🔴 BLOCKING (runners/launch_h100-dgxc-slurm.sh:44): MODEL_PREFIX check compares against "DeepSeek-R1-0528" but the workflow passes "dsr1" (from model-prefix: dsr1). Every job will exit immediately. All other launch scripts check the short prefix.
2. 🔴 BLOCKING (runners/launch_h100-dgxc-slurm.sh:39): Hardcoded squash file path (dynamo-trtllm-mnnvl-fix.sqsh) — not derived from IMAGE and no enroot import docker:// step. Not reproducible outside the specific cluster.
3. 🔴 BLOCKING (perf-changelog.yaml): nvidia-master.yaml adds a new config key (dsr1-fp8-h100-dynamo-trt) but perf-changelog.yaml was not updated. Per repo conventions, master config changes require a changelog entry.

Warnings

4. 🟡 WARNING (runners/launch_h100-dgxc-slurm.sh:14): Checks out trtllm-h100 branch but all YAML recipe comments reference sa-submission-q1-2026 branch — mismatch needs clarification.
5. 🟡 WARNING (.github/configs/nvidia-master.yaml:1664-1693): 1k1k MTP concurrency ordering is non-monotonic (615 before 462) — already flagged in previous review.

What looks good

  • Config key dsr1-fp8-h100-dynamo-trt uses valid model prefix (dsr1)
  • Image nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post1 is publicly accessible on NGC
  • Config count matches PR description (9+9+6+5 = 29)
  • Worker ratios and topology transitions are sensible across concurrency levels
  • YAML structure mirrors the existing H200 config correctly

Branch: nv/h100-disagg-trtllm-fp8

nlevin-ui and others added 9 commits February 6, 2026 22:21
  • Update H100 section with verified Pareto-optimal points:
    1k1k ISL/OSL:
    - MTP: 9 configs (conc: 6, 9, 30, 60, 117, 231, 462, 615, 1229)
    - STP: 9 configs (conc: 6, 9, 30, 60, 231, 462, 924, 1845, 4916)
    8k1k ISL/OSL (new):
    - MTP: 6 configs (conc: 6, 9, 30, 77, 78, 154)
    - STP: 5 configs (conc: 6, 9, 30, 154, 308)
    All configs use TP=16, EP=16 across 2 nodes (16 GPUs total).
    Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

  • Use consistent sed-based path derivation for all container images.
    Co-authored-by: Cursor <cursoragent@cursor.com>
    Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

  • Only dsr1-fp8-h100-dynamo-trt should use 0.8.1.post3. Revert B200, B300, and H200 configs back to 0.8.1.post1.
    Co-authored-by: Cursor <cursoragent@cursor.com>

  • Documents the new H100 FP8 disaggregated TRT-LLM configurations using tensorrtllm-runtime:0.8.1.post3 container.
    Co-authored-by: Cursor <cursoragent@cursor.com>
@nlevin-ui force-pushed the nv/h100-disagg-trtllm-fp8 branch from b7a3532 to 3911118 on February 6, 2026 22:22
@nlevin-ui marked this pull request as ready for review on February 6, 2026 22:31
@nlevin-ui requested a review from a team as a code owner on February 6, 2026 22:31
@claude (bot) commented Feb 6, 2026

Claude finished @nlevin-ui's task in 5m 7s


Re-review: PR #651 — H100 FP8 Disagg DSR1 Configs

  • Gather context and read changed files
  • Check previous review findings against current code
  • Validate launch script and configs
  • Post review feedback

Previous issues resolved

  • Concurrency ordering (462/615) — Fixed. Now in correct ascending order.
  • Hardcoded squash file — Fixed. SQUASH_FILE is now derived from IMAGE.
  • Branch mismatch — Fixed. H100 script now checks out sa-submission-q1-2026, matching YAML comments.
  • Missing perf-changelog entry — Fixed. Entry added for dsr1-fp8-h100-dynamo-trt.

Remaining issues

1. 🔴 BLOCKING (nvidia-master.yaml:1577): model-prefix: DeepSeek-R1-0528 — invalid model prefix. Every other DSR1 config uses dsr1. The frontend only supports dsr1 and gptoss prefixes. Results will not display.
2. 🔴 BLOCKING (launch_h100-dgxc-slurm.sh:40): MODEL_PREFIX check matches "DeepSeek-R1-0528" — must be updated to "dsr1" after fixing issue #1, otherwise all jobs exit immediately.
3. 🟡 WARNING (launch_h100-dgxc-slurm.sh:38): No enroot import docker://$IMAGE step. The B200 SLURM script includes this; the H200 SLURM script also omits it. May be a cluster-specific difference but reduces reproducibility.

Issues #1 and #2 are coupled — the YAML model-prefix and launch script check must agree. The fix is: change model-prefix to dsr1 in the YAML, and change the launch script check to "dsr1".


Branch: nv/h100-disagg-trtllm-fp8

dsr1-fp8-h100-dynamo-trt:
  image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post3
  model: deepseek-ai/DeepSeek-R1-0528
  model-prefix: DeepSeek-R1-0528

🔴 BLOCKING: Invalid model prefix DeepSeek-R1-0528. The frontend only supports dsr1 and gptoss as model prefixes. Every other DeepSeek config in this file uses model-prefix: dsr1 (e.g., dsr1-fp8-h200-dynamo-trt at line 1092). Using DeepSeek-R1-0528 will prevent the frontend from displaying benchmark results for this config.

Why it matters: Results will be collected but won't appear on the frontend dashboard, making this config effectively invisible to users.

Fix: Change the model-prefix to dsr1 (matching H200 and all other DSR1 configs), and update the launch script's MODEL_PREFIX check accordingly:

Suggested change:
- model-prefix: DeepSeek-R1-0528
+ model-prefix: dsr1

Comment on lines +40 to +44
if [[ $MODEL_PREFIX == "DeepSeek-R1-0528" ]]; then
  export MODEL_PATH="/mnt/numa1/shared/models/dsr1-fp8"
  export SERVED_MODEL_NAME="DeepSeek-R1-0528"
else
  echo "Unsupported model prefix: $MODEL_PREFIX. Supported prefixes are: DeepSeek-R1-0528"

🔴 BLOCKING: Must check for "dsr1" (not "DeepSeek-R1-0528") to be consistent with the corrected model-prefix in the YAML config and match the convention used by all other launch scripts (launch_h200-dgxc-slurm.sh, launch_b300-nv.sh, etc.).

Why it matters: After fixing the YAML model-prefix to dsr1, this check will fail and every job will exit with error code 1.

Fix:

Suggested change:
- if [[ $MODEL_PREFIX == "DeepSeek-R1-0528" ]]; then
+ if [[ $MODEL_PREFIX == "dsr1" ]]; then
    export MODEL_PATH="/mnt/numa1/shared/models/dsr1-fp8"
    export SERVED_MODEL_NAME="DeepSeek-R1-0528"
  else
-   echo "Unsupported model prefix: $MODEL_PREFIX. Supported prefixes are: DeepSeek-R1-0528"
+   echo "Unsupported model prefix: $MODEL_PREFIX. Supported prefixes are: dsr1"

nlevin-ui and others added 2 commits February 6, 2026 15:48
Change SQUASH_FILE path from /mnt/nfs/slurm-shared/containers/ to
/mnt/nfs/sa-shared/containers/ to match cluster configuration.

Co-authored-by: Cursor <cursoragent@cursor.com>

@functionstackx (Contributor) left a comment

LGTM, feel free to merge if the single sweep error is transient

@kedarpotdar-nv changed the title from "[WIP - DRAFT ] [NV] H100 FP8 Disagg DSR1 1k1k, 8k1k (STP + MTP)" to "[NV] H100 FP8 Disagg DSR1 1k1k, 8k1k (STP + MTP)" on Feb 7, 2026
…trt changelog to end

Co-authored-by: Cursor <cursoragent@cursor.com>
@kedarpotdar-nv merged commit 32a4845 into main on Feb 7, 2026
10 of 40 checks passed
@kedarpotdar-nv deleted the nv/h100-disagg-trtllm-fp8 branch on February 7, 2026 22:28
@functionstackx restored the nv/h100-disagg-trtllm-fp8 branch on February 8, 2026 00:45
@functionstackx deleted the nv/h100-disagg-trtllm-fp8 branch on February 18, 2026 04:16
@claude (bot) mentioned this pull request on Apr 23, 2026