[NVIDIA] sglang: add fp8 8k1k and fp4 1k1k #274

Merged
cquil11 merged 4 commits into SemiAnalysisAI:multinode-integration from ishandhanani:ishan/morecfgs
Dec 4, 2025

Conversation

@ishandhanani
Collaborator

We now use sglang 0.5.5.post2 across the board.

@ishandhanani ishandhanani requested a review from a team as a code owner December 4, 2025 00:51
@cquil11 cquil11 merged commit efcb4e4 into SemiAnalysisAI:multinode-integration Dec 4, 2025
cquil11 added a commit that referenced this pull request Dec 4, 2025
cquil11 added a commit that referenced this pull request Dec 4, 2025
@cquil11
Collaborator

cquil11 commented Dec 4, 2025

@ishandhanani I think I merged prematurely. Can you please open another PR from ishandhanani:ishan/morecfgs and address the issues mentioned below:

  • Please double-check the branches you are checking out:
+ git clone --branch ishan/sa-1.1-sgl-dsr1 https://github.com/ai-dynamo/dynamo.git
Cloning Dynamo repository...
Cloning into 'dynamo'...
fatal: Remote branch ishan/sa-1.1-sgl-dsr1 not found in upstream origin
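
One way to catch a missing branch like the one above before the clone step runs is to ask the remote whether the ref exists. The following is a minimal sketch, not part of this PR: the helper names `branch_exists` and `clone_branch` are hypothetical, and it assumes only that `git` is available on the runner.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from this PR) for failing fast on a missing branch.

# Returns 0 if BRANCH exists on REMOTE, non-zero otherwise.
branch_exists() {
  local remote="$1" branch="$2"
  # --exit-code makes git exit with status 2 when no matching ref is found.
  git ls-remote --exit-code "$remote" "refs/heads/$branch" >/dev/null 2>&1
}

# Clones BRANCH of REMOTE into DEST, but only after confirming the branch exists.
clone_branch() {
  local remote="$1" branch="$2" dest="$3"
  if ! branch_exists "$remote" "$branch"; then
    echo "error: branch '$branch' not found on $remote" >&2
    return 1
  fi
  git clone --branch "$branch" "$remote" "$dest"
}
```

For example, `clone_branch https://github.com/ai-dynamo/dynamo.git ishan/sa-1.1-sgl-dsr1 dynamo` would print a short error and return 1 while the branch has not been pushed yet, instead of failing inside `git clone`.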

@ishandhanani
Collaborator Author

PR isn't ready yet. That dynamo branch will be created shortly.

cquil11 added a commit that referenced this pull request Dec 5, 2025
* adding initial changes to master configs; adding initial updates to validation logic and config parser

* adding new gb200 script

* adding integration to gb200 runner script and workflow files

* revert and correct name of 1k1k scheduler workflow

* adding runners.yaml to workflow invocation

* toJson on conc since it is now a list

* correctly sending conc list to multinode

* hotfix

* correct env var to MAX batch size

* set -x

* debugging with dynamo fork

* debugging with dynamo fork pt 2

* experiment

* adding separate script for launching

* changing filenames

* ntasks per node

* making the spec-decoding output required

* updating ntasks per node
gp

* test

* test

* conc list quoted

* get rid of debug code

* testing support for dsr1

* testing support for dsr1 test

* testing support for dsr1 test

* testing support for dsr1 test

* testing support for dsr1 test

* testing

* some changes to generate sweeps

* testing and debugging

* adding new file code for sglang

* adding new file code for sglang

* changing file path

* updating multinode fn hash

* updating multinode fn hash

* dynamo trtllm to dynamo trt

* changing process result

* add is multinode

* bug fix

* bug fix

* bug fix

* bug fix

* polishing

* polishing pt 2

* polishing pt 3

* polishing pt 4

* fixing summarize.py

* polishing

* testing

* testing

* adding testing workflows

* adding testing workflows

* adding testing workflows

* adding testing workflows

* adding testing workflows

* adding testing workflows

* adding testing workflows

* adding testing workflows

* adding testing workflows

* adding testing workflows

* adding testing workflows

* adding testing workflows

* adding testing workflows

* adding testing workflows

* adding testing workflows

* adding tests

* adding tests

* adding tests

* adding tests

* adding tests

* adding tests

* adding tests

* add updates for newest gb200 merge

* add updates for newest gb200 merge pt 2

* move ntasks per node to framework level instead of runner level

* nexp hard coded to 1:

* add AMD configs to full sweep

* shut the line counter workflow up haha

* shut the line counter workflow up haha

* shut the line counter workflow up haha pt 2

* updating testing logic

* add model prefix to label validator

* add more descriptive name to tests

* update test for process results

* add script mode

* fix bug

* sglang: add fp8 8k1k and fp4 1k1k (#274)

* go

* typo

* typo...

* more

* Revert "sglang: add fp8 8k1k and fp4 1k1k (#274)" (#283)

This reverts commit efcb4e4.

* get rid of ntasks per node required env var for sglang

* bug fix

* bug fix missing amd

* bug fix missing amd pt 2

* add served model name to summary

* add served model name to summary pt 2

* add served model name to summary pt 3

* fix max model len bug

* add readme

* add image to json result

---------

Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
@cquil11 cquil11 changed the title from "sglang: add fp8 8k1k and fp4 1k1k" to "[NVIDIA] sglang: add fp8 8k1k and fp4 1k1k" on Apr 8, 2026

3 participants