[NVIDIA] sglang: add fp8 8k1k and fp4 1k1k by ishandhanani · Pull Request #274 · SemiAnalysisAI/InferenceX

ishandhanani · 2025-12-04T00:51:24Z

We now use sglang 0.5.5.post2 across the board.

This reverts commit efcb4e4.

cquil11 · 2025-12-04T16:29:39Z

@ishandhanani I think I merged prematurely. Can you please open another PR on ishandhanani:ishan/morecfgs and address the issues mentioned below:

Please double-check the branches you are checking out:

+ git clone --branch ishan/sa-1.1-sgl-dsr1 https://github.com/ai-dynamo/dynamo.git
Cloning Dynamo repository...
Cloning into 'dynamo'...
fatal: Remote branch ishan/sa-1.1-sgl-dsr1 not found in upstream origin

these TODOs https://github.com/InferenceMAX/InferenceMAX/pull/274/files#diff-1bd56add5a769a3c46eaffc11bbbccc4d8546cf7c1a0877b735dc46581ac8405R895:~:text=dynamo%2Dsglang%3A-,%23%20TODO%3A%20swap,-image%3A%20nvcr
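The failed `git clone` in the log above could be caught earlier by checking that the branch exists on the remote before cloning. The following is a minimal sketch, not the project's actual runner script: the helper names `remote_branch_exists` and `clone_branch` are hypothetical, and it assumes only the standard `git ls-remote --exit-code` behavior (non-zero exit status when no matching ref is found).

```shell
# Hypothetical helper: succeeds only if BRANCH exists on the remote at URL.
remote_branch_exists() {
    url="$1"
    branch="$2"
    # --exit-code makes git ls-remote return a non-zero status when no ref matches.
    git ls-remote --exit-code "$url" "refs/heads/$branch" >/dev/null 2>&1
}

# Hypothetical helper: clone BRANCH of URL into DEST, failing early with a
# clear message if the branch is missing on the remote.
clone_branch() {
    url="$1"
    branch="$2"
    dest="$3"
    if ! remote_branch_exists "$url" "$branch"; then
        echo "error: branch '$branch' not found on $url" >&2
        return 1
    fi
    git clone --branch "$branch" "$url" "$dest"
}
```

For example, `clone_branch https://github.com/ai-dynamo/dynamo.git ishan/sa-1.1-sgl-dsr1 dynamo` would stop with the explicit error message instead of reaching the `fatal: Remote branch ... not found` failure partway through a job.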

ishandhanani · 2025-12-04T21:04:43Z

PR isn't ready yet. That dynamo branch will be created shortly.

* adding initial changes to master configs; adding initial updates to validation logic and config parser
* adding new gb200 script
* adding integration to gb200 runner script and workflow files
* revert and correct name of 1k1k scheduler workflow
* adding runners.yaml to workflow invocation
* toJson on conc since it is now a list
* correctly sending conc list to multnode
* hotfix
* correct env var to MAX batch size
* set -x
* debugging with dynmao fork
* debugging with dynmao fork pt 2
* experiment
* adding separate script for launching
* changing filenames
* ntasks per node
* making the spec-decoding output required
* updating ntasks per node gp
* test
* test
* conc list quoted
* get rid of debug code
* testing support for dsr1
* testing support for dsr1 test
* testing support for dsr1 test
* testing support for dsr1 test
* testing support for dsr1 test
* testing
* some changes to generate sweeps
* testing and debugging
* adding new file code for sglang
* adding new file code for sglang
* changing file path
* updating multinode fn hash
* updating multinode fn hash
* dynamo trtllm to dynamo trt
* changing process result
* add is multinode
* bug fix
* bug fix
* bug fix
* bug fix
* polishing
* polishing pt 2
* polishing pt 3
* polishing pt 4
* fixing summarize.py
* polishing
* testing
* testing
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding tests
* adding tests
* adding tests
* adding tests
* adding tests
* adding tests
* adding tests
* add updates for newest gb200 merge
* add updates for newest gb200 merge pt 2
* move ntasks per node to framework level instead of runner level
* nexp hard coded to 1:
* add AMD configs to full sweep
* shut the line counter workflow up haha
* shut the line counter workflow up haha
* shut the line counter workflow up haha pt 2
* updating testing logic
* add model prefix to label validator
* add more descriptive name to tests
* update test for process results
* add script mode
* fix bug
* sglang: add fp8 8k1k and fp4 1k1k (#274)
* go
* typo
* typo...
* more
* Revert "sglang: add fp8 8k1k and fp4 1k1k (#274)" (#283) This reverts commit efcb4e4.
* get rid of ntasks per node required env var for sglang
* bug fix
* bug fix missing amd
* bug fix missing amd pt 2
* add served model name to summary
* add served model name to summary pt 2
* add served model name to summary pt 3
* fix max model len bug
* add readme
* add image to json result

---------

Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>

ishandhanani added 4 commits December 3, 2025 16:39

go

44868fc

typo

0e3c359

typo...

297bd7f

more

2cc19a3

ishandhanani requested a review from a team as a code owner December 4, 2025 00:51

cquil11 approved these changes Dec 4, 2025

cquil11 merged commit efcb4e4 into SemiAnalysisAI:multinode-integration Dec 4, 2025

cquil11 added a commit that referenced this pull request Dec 4, 2025

Revert "sglang: add fp8 8k1k and fp4 1k1k (#274)"

255fe34

This reverts commit efcb4e4.

cquil11 mentioned this pull request Dec 4, 2025

[NVIDIA] Revert "sglang: add fp8 8k1k and fp4 1k1k" #283

Merged

cquil11 added a commit that referenced this pull request Dec 4, 2025

Revert "sglang: add fp8 8k1k and fp4 1k1k (#274)" (#283)

21ec133

This reverts commit efcb4e4.

functionstackx added this to InferenceMAX Board Dec 7, 2025

github-project-automation Bot moved this to Done in InferenceMAX Board Dec 7, 2025

functionstackx added the NVIDIA label Dec 7, 2025

functionstackx temporarily deployed to fork-pr-validation December 7, 2025 21:36 — with GitHub Actions Inactive

cquil11 changed the title ~~sglang: add fp8 8k1k and fp4 1k1k~~ [NVIDIA] sglang: add fp8 8k1k and fp4 1k1k Apr 8, 2026

cquil11 merged 4 commits into SemiAnalysisAI:multinode-integration from ishandhanani:ishan/morecfgs
