p0: Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang#1255

Open
chunfangamd wants to merge 4 commits into main from chun/dsv4_pro_fp8

Conversation

@chunfangamd
Collaborator

  • Upgrade Image to rocm/sgl-dev:rocm720-mi35x-c924543-20260430-DSv4
  • Enable TileLang Attn/Indexer + CUDA Graph

- bump to c924543 daily image
- enable TileLang attn/indexer + cuda graph
@github-actions
Contributor

github-actions Bot commented May 1, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@chunfangamd
Collaborator Author

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys dsv4-fp8-mi355x-sglang

@github-actions
Contributor

github-actions Bot commented May 1, 2026

@chunfangamd Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25219898864
Command: test-config --config-files .github/configs/amd-master.yaml --config-keys dsv4-fp8-mi355x-sglang
Pinned ref: 1730de5
Approval: not required (trusted collaborator).

Comment thread perf-changelog.yaml Outdated
- "Keep SGLANG_TOPK_TRANSFORM_512_TORCH=1 for now: sgl-project/sglang#24143 (topk512 native ROCm kernel) merged 4-30 21:31 UTC, after the c924543 image was built (4-30 08:26 UTC); will flip to 0 once a newer daily image lands"
- "Keep SGLANG_DSV4_FP4_EXPERTS=false and SGLANG_FORCE_TRITON_MOE_FP8=1: required for sgl-project/DeepSeek-V4-Pro-FP8 (FP4 path asserts intermediate_size_per_partition==2048 in fp8.py; swiglu_limit clamp lives in fused_moe_triton)"
- "Expected speedup over the previous PR #23608 day-0 torch-fallback recipe: ~5.4-5.8x at conc 1-8 (matches the '+ indexer tilelang attn' tier in the AMD DSv4-Flash-FP8 reference table)"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder
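The flag choices in these changelog notes can be sketched as a small decision in Python. This is an illustration only, not part of the recipe: the helper name `topk_torch_flag` is hypothetical, the year 2026 is inferred from the image tag (20260430), and reading `1` as "keep the torch path" is an assumption drawn from the note's wording.

```python
from datetime import datetime, timezone

# Timestamps quoted in the changelog note (year assumed from the image tag).
IMAGE_BUILT = datetime(2026, 4, 30, 8, 26, tzinfo=timezone.utc)
KERNEL_MERGED = datetime(2026, 4, 30, 21, 31, tzinfo=timezone.utc)


def topk_torch_flag(image_built: datetime, kernel_merged: datetime) -> str:
    """Hypothetical helper: return "1" while the image predates the kernel merge."""
    # An image built before the merge cannot contain the native ROCm kernel.
    return "1" if image_built < kernel_merged else "0"


# Environment settings named in the changelog notes above.
env = {
    "SGLANG_TOPK_TRANSFORM_512_TORCH": topk_torch_flag(IMAGE_BUILT, KERNEL_MERGED),
    "SGLANG_DSV4_FP4_EXPERTS": "false",
    "SGLANG_FORCE_TRITON_MOE_FP8": "1",
}

for name, value in env.items():
    print(f"{name}={value}")
```

With the c924543 image the first flag stays at `1`; a daily image built after the merge time would flip it to `0`, matching the plan stated in the note.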
Contributor

🟡 The new perf-changelog.yaml entry for dsv4-fp8-mi355x-sglang has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder — the literal token Placeholder was never substituted with this PR's number (#1255). That URL 404s and breaks the file's universal convention of using a real numeric PR id. Replace Placeholder with 1255 before merge.

Extended reasoning...

Bug

The new perf-changelog.yaml entry added by this PR ends with:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder

The trailing Placeholder is a literal string — not a numeric PR id. Every other one of the 240+ entries in this file uses a real numeric PR number (the immediately preceding entry, for example, uses /pull/1242). The current PR is #1255, so this should read /pull/1255.

Why this matters

  • The URL https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder resolves to a 404, so any human (or doc tool) clicking through from the changelog gets a broken link.
  • The pr-link field is the documented mechanism that ties a config-keys change to the PR that introduced it. Any internal tooling that scrapes pr-link to attribute config-key changes to PRs (release notes, blame-style audit, regression triage) will either fail or attribute this entry to a non-existent PR.
  • It breaks the file's universal convention — this is the only entry in 2000+ lines of perf-changelog.yaml that does not point to a real PR.

Why nothing caught it

The Pydantic validator at utils/matrix_logic/validation.py declares pr_link as a plain str with no regex/numeric constraint, so a literal Placeholder passes schema validation. The CI is therefore green even though the metadata is wrong.
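A stricter check could reject non-numeric ids at validation time. The following is a minimal sketch, not the repository's actual validator: the function name `check_pr_link` and the regex are assumptions, and it uses only the standard library so that it could be called from a Pydantic field validator or any other hook.

```python
import re

# Assumed convention: pr-link is a pull-request URL in this repo ending in a numeric id.
PR_LINK_RE = re.compile(r"https://github\.com/SemiAnalysisAI/InferenceX/pull/(\d+)")


def check_pr_link(url: str) -> int:
    """Return the numeric PR id, or raise ValueError if the link is malformed."""
    match = PR_LINK_RE.fullmatch(url.strip())
    if match is None:
        raise ValueError(f"pr-link must end in a numeric PR id, got: {url!r}")
    return int(match.group(1))
```

Under this sketch, the `/pull/1255` link returns `1255`, while the `/pull/Placeholder` link raises `ValueError` instead of passing silently.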

Fix

A one-token change. Replace Placeholder with 1255 on the last line of perf-changelog.yaml:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1255

Step-by-step proof

  1. Open perf-changelog.yaml and look at line 2083 (the last line of the diff). It reads: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder.
  2. Take that URL and resolve it: GitHub does not have a PR named Placeholder in this repo, so the page 404s.
  3. Look at the immediately preceding pr-link entry (the entry for PR #1242, "Add GB200 DSV4 Dynamo vLLM MTP2 recipes") and every other pr-link in the file: all use a numeric id. This entry is the lone exception.
  4. The PR description / metadata identifies this PR as #1255 ("p0: Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang"), so the intended substitution is unambiguous.

Severity: nit — this is metadata-only, doesn't affect benchmark correctness or execution, but should be fixed before merge to preserve the changelog convention and keep pr-link-scraping tooling working.
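The manual check in the proof above can also be run across the whole file. This is a hedged sketch using only the standard library; the function name `find_bad_pr_links` is hypothetical, and it scans lines with a regex rather than parsing YAML, which assumes each `pr-link:` sits on its own line as in the excerpt.

```python
import re

# Matches a "pr-link:" line and captures the final path segment after /pull/.
PR_LINK_LINE = re.compile(r"^\s*pr-link:\s*\S+/pull/(\S+)\s*$")


def find_bad_pr_links(text: str) -> list[tuple[int, str]]:
    """Return (line_number, last_segment) for pr-link lines whose id is not numeric."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = PR_LINK_LINE.match(line)
        if m and not m.group(1).isdigit():
            bad.append((lineno, m.group(1)))
    return bad


sample = """\
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1242
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder
"""
print(find_bad_pr_links(sample))  # [(2, 'Placeholder')]
```

To audit the real file, pass it the contents of perf-changelog.yaml (for example via `pathlib.Path("perf-changelog.yaml").read_text()`); an empty list means every pr-link ends in a numeric id.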

@functionstackx changed the title from "Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang" to "p0: Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang" on May 1, 2026
  dsv4-fp8-mi355x-sglang:
-   image: rocm/sgl-dev:deepseek-v4-mi35x
+   image: rocm/sgl-dev:rocm720-mi35x-c924543-20260430-DSv4
    model: sgl-project/DeepSeek-V4-Pro-FP8
Contributor

@chunfangamd does amd sgl support the fp4 ckpt yet?

@SemiAnalysisAI deleted a comment from github-actions Bot on May 1, 2026