[AMD/Hyperloom] Tune dsr1-fp8-mi355x-sglang: --num-continuous-decode-steps 4 → 8 (#1243)
lishuoshuo-amd wants to merge 8 commits into main
Conversation
Thanks for the contribution! For vLLM and SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work!

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Much of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get an approval from the respective company's CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25204331730
The branch was force-pushed from 00859b5 to c709a29.
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25204505898
Description
Tune --num-continuous-decode-steps from 4 to 8 for DeepSeek-R1-0528 FP8 on MI355X (SGLang). Increasing the number of continuous decode steps reduces prefill/decode scheduling overhead, lowering per-token latency (TPOT) and improving overall throughput.
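A minimal toy model (illustrative constants, not measured values from this PR) of why running more decode steps between scheduler passes lowers mean per-token latency: the fixed scheduling cost is amortized over more decode steps.

```python
def tpot_ms(decode_steps_per_sched, step_ms=10.0, sched_ms=2.0, total_steps=1000):
    """Toy TPOT model. The scheduler runs once every
    `decode_steps_per_sched` decode steps, so its fixed cost is
    amortized across them. step_ms and sched_ms are assumed numbers
    for illustration only."""
    sched_passes = total_steps / decode_steps_per_sched
    total_ms = total_steps * step_ms + sched_passes * sched_ms
    return total_ms / total_steps  # mean time per output token

print(tpot_ms(4))  # 10.5
print(tpot_ms(8))  # 10.25
```

The gain shrinks as the per-step decode time dominates, which is consistent with the modest (single-digit percent) improvements reported in the sweep below.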
Changes
benchmarks/single_node/dsr1_fp8_mi355x.sh: --num-continuous-decode-steps 4 → 8
perf-changelog.yaml: added changelog entry
Performance Results
Hyperloom CI Optimization Report (conc=64, 1k/1k)
Full Parameter Sweep (12 points, 0 failures)
Verified across the complete (tp, conc, isl, osl) search space from amd-master.yaml. Average gain: +4.7%, with a positive improvement across all parameter combinations and no regressions.
Baseline Validation Against InferenceX Official
Baseline aligns within <1% of official InferenceX data, confirming test environment reliability.
Note: All throughput numbers in this PR refer to output (decode) token throughput, never total. The "Optimization Report" and "Baseline Validation" tables show per-GPU values; the "Full Parameter Sweep" table shows aggregate (TP-summed) values from raw SGLang output_throughput. Per-GPU = aggregate / TP. Gain percentages are unit-invariant.
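The per-GPU conversion and the unit-invariance claim above can be sketched as follows (the throughput numbers here are made-up examples, not figures from this PR's tables):

```python
def per_gpu(aggregate_tps, tp):
    # Per-GPU output throughput from the TP-summed aggregate
    # (raw SGLang output_throughput): per-GPU = aggregate / TP.
    return aggregate_tps / tp

def gain_pct(new, base):
    # Relative gain in percent.
    return 100.0 * (new - base) / base

# Hypothetical example numbers:
base_agg, new_agg, tp = 16000.0, 16752.0, 8

agg_gain = gain_pct(new_agg, base_agg)
pergpu_gain = gain_pct(per_gpu(new_agg, tp), per_gpu(base_agg, tp))

# Dividing both numerator and denominator by TP cancels out,
# so the gain percentage is the same in either unit.
print(agg_gain == pergpu_gain)  # True
```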
Related Issue
Automated optimization by Hyperloom CI.
Type of Change
Checklist
perf-changelog.yaml entry added