p0: Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang by chunfangamd · Pull Request #1255 · SemiAnalysisAI/InferenceX

chunfangamd · 2026-05-01T15:14:15Z

Upgrade Image to rocm/sgl-dev:rocm720-mi35x-c924543-20260430-DSv4
Enable TileLang Attn/Indexer + CUDA Graph

- bump to c924543 daily image
- enable TileLang attn/indexer + cuda graph

github-actions · 2026-05-01T15:14:24Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first, before we merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

chunfangamd · 2026-05-01T15:16:47Z

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys dsv4-fp8-mi355x-sglang

github-actions · 2026-05-01T15:16:58Z

@chunfangamd Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25219898864
Command: test-config --config-files .github/configs/amd-master.yaml --config-keys dsv4-fp8-mi355x-sglang
Pinned ref: 1730de5
Approval: not required (trusted collaborator).

claude · 2026-05-01T15:19:10Z

+    - "Keep SGLANG_TOPK_TRANSFORM_512_TORCH=1 for now: sgl-project/sglang#24143 (topk512 native ROCm kernel) merged 4-30 21:31 UTC, after the c924543 image was built (4-30 08:26 UTC); will flip to 0 once a newer daily image lands"
+    - "Keep SGLANG_DSV4_FP4_EXPERTS=false and SGLANG_FORCE_TRITON_MOE_FP8=1: required for sgl-project/DeepSeek-V4-Pro-FP8 (FP4 path asserts intermediate_size_per_partition==2048 in fp8.py; swiglu_limit clamp lives in fused_moe_triton)"
+    - "Expected speedup over the previous PR #23608 day-0 torch-fallback recipe: ~5.4-5.8x at conc 1-8 (matches the '+ indexer tilelang attn' tier in the AMD DSv4-Flash-FP8 reference table)"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder


🟡 The new perf-changelog.yaml entry for dsv4-fp8-mi355x-sglang has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder — the literal token Placeholder was never substituted with this PR's number (#1255). That URL 404s and breaks the file's universal convention of using a real numeric PR id. Replace Placeholder with 1255 before merge.

Extended reasoning...

Bug

The new perf-changelog.yaml entry added by this PR ends with:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder

The trailing Placeholder is a literal string — not a numeric PR id. Every other one of the 240+ entries in this file uses a real numeric PR number (the immediately preceding entry, for example, uses /pull/1242). The current PR is #1255, so this should read /pull/1255.

Why this matters

The URL https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder resolves to a 404, so any human (or doc tool) clicking through from the changelog gets a broken link.

The pr-link field is the documented mechanism that ties a config-keys change to the PR that introduced it. Any internal tooling that scrapes pr-link to attribute config-key changes to PRs (release notes, blame-style audit, regression triage) will either fail or attribute this entry to a non-existent PR.

It breaks the file's universal convention — this is the only entry in 2000+ lines of perf-changelog.yaml that does not point to a real PR.

Why nothing caught it

The Pydantic validator at utils/matrix_logic/validation.py declares pr_link as a plain str with no regex/numeric constraint, so a literal Placeholder passes schema validation. The CI is therefore green even though the metadata is wrong.
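A stricter check could be sketched as follows. This is a minimal, standard-library-only illustration, not the repository's actual Pydantic model: the function name `validate_pr_link` and the regex are assumptions, and the URL prefix is taken from the entries quoted in this review.

```python
import re

# Assumed pattern: the repo URL prefix used in the changelog, followed by a numeric PR id.
PR_LINK_RE = re.compile(r"https://github\.com/SemiAnalysisAI/InferenceX/pull/[0-9]+")


def validate_pr_link(value: str) -> str:
    """Return the value unchanged if it is a numeric PR URL, otherwise raise ValueError."""
    if PR_LINK_RE.fullmatch(value) is None:
        raise ValueError(f"pr-link must end in a numeric PR id, got: {value!r}")
    return value
```

Logic like this could be attached to the pr_link field through a Pydantic field validator, so that a literal Placeholder fails schema validation instead of passing as a plain str.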

Fix

One-token change. Replace Placeholder with 1255 on the last line of perf-changelog.yaml:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1255

Step-by-step proof

Open perf-changelog.yaml and look at line 2083 (the last line of the diff). It reads: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder.

Take that URL and resolve it: GitHub does not have a PR named Placeholder in this repo, so the page 404s.

Look at the immediately preceding pr-link entry (the one for PR #1242, "Add GB200 DSV4 Dynamo vLLM MTP2 recipes") and every other pr-link in the file: all use a numeric id. This entry is the lone exception.

The PR description / metadata identifies this PR as #1255 ("p0: Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang"), so the intended substitution is unambiguous.
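The manual steps above can also be mechanized. The following is a minimal sketch, assuming each pr-link sits on its own line as in the excerpt quoted in this review; `find_bad_pr_links` is a hypothetical helper, and it scans plain text rather than parsing the YAML.

```python
import re

# Matches a "pr-link: <url>" line; group 1 is the URL, group 2 its final path segment.
PR_LINK_LINE = re.compile(r"^\s*pr-link:\s*(\S+/pull/([^/\s]+))\s*$")


def find_bad_pr_links(text: str) -> list[tuple[int, str]]:
    """Return (line_number, url) for every pr-link whose final segment is not numeric."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = PR_LINK_LINE.match(line)
        if match is None:
            continue  # not a pr-link line
        segment = match.group(2)
        if not (segment.isascii() and segment.isdigit()):
            bad.append((lineno, match.group(1)))
    return bad


# Two entries modeled on the ones discussed in this review.
sample = """\
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1242
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder
"""
print(find_bad_pr_links(sample))
```

On the sample, only the second line is reported. Run against the full perf-changelog.yaml, a check of this shape would be expected to flag the Placeholder entry and nothing else, if the convention described above holds.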

Severity: nit — this is metadata-only, doesn't affect benchmark correctness or execution, but should be fixed before merge to preserve the changelog convention and keep pr-link-scraping tooling working.

functionstackx · 2026-05-01T18:54:14Z

 dsv4-fp8-mi355x-sglang:
-  image: rocm/sgl-dev:deepseek-v4-mi35x
+  image: rocm/sgl-dev:rocm720-mi35x-c924543-20260430-DSv4
  model: sgl-project/DeepSeek-V4-Pro-FP8


@chunfangamd does amd sgl support the fp4 ckpt yet?

[AMD] dsv4-fp8-mi355x-sglang

1730de5

- bump to c924543 daily image
- enable TileLang attn/indexer + cuda graph

chunfangamd requested a review from a team May 1, 2026 15:14

chunfangamd requested review from 1am9trash, billishyahao, seungrokj and yctseng0211 as code owners May 1, 2026 15:14

github-project-automation Bot added this to InferenceMAX Board May 1, 2026

Update Perf Changelog

d6dd2f7

claude Bot reviewed May 1, 2026

functionstackx changed the title ~~Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang~~ p0: Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang May 1, 2026

Merge branch 'main' into chun/dsv4_pro_fp8

43b0636

functionstackx reviewed May 1, 2026

Merge branch 'main' into chun/dsv4_pro_fp8

7174a5d

SemiAnalysisAI deleted a comment from github-actions Bot May 1, 2026

chunfangamd wants to merge 4 commits into main from chun/dsv4_pro_fp8
