
[NVIDIA] Add TRT 70B (FP8 and FP4) #2

Closed
kedarpotdar-nv wants to merge 42 commits into main from fp4-init

Conversation

@kedarpotdar-nv
Collaborator

Following up from #1.

Made these changes:

  1. Added a 70B TRT-LLM config to 70b-tmpl.yml.
  2. Added a new variable for precision (fp8 or fp4).
  3. Updated the collect and plot scripts to reflect both TRT and vLLM.
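As a rough illustration of change 2, a precision variable in the template might look like the sketch below. Every key name and value here is an assumption for illustration; the PR does not show the actual schema of 70b-tmpl.yml.

```yaml
# Hypothetical sketch of a 70B TRT-LLM entry in 70b-tmpl.yml.
# All keys below are assumptions, not the repository's real schema.
model: llama-70b
backend: trtllm
precision: fp8   # the new variable; set to fp8 or fp4
```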

See workflow run #134, which includes a sanity test.

Thanks!

@kedarpotdar-nv kedarpotdar-nv deleted the fp4-init branch September 17, 2025 18:07
jthomson04 pushed a commit to jthomson04/InferenceMAX that referenced this pull request Jan 21, 2026
@cquil11 cquil11 added the NVIDIA label Apr 8, 2026
@cquil11 cquil11 changed the title Add TRT 70B (FP8 and FP4) [NVIDIA] Add TRT 70B (FP8 and FP4) Apr 8, 2026
