p0: Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang#1255
chunfangamd wants to merge 4 commits into main
Conversation
- bump to c924543 daily image
- enable TileLang attn/indexer + cuda graph
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you. PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring that they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys dsv4-fp8-mi355x-sglang |
|
@chunfangamd Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25219898864 |
| - "Keep SGLANG_TOPK_TRANSFORM_512_TORCH=1 for now: sgl-project/sglang#24143 (topk512 native ROCm kernel) merged 4-30 21:31 UTC, after the c924543 image was built (4-30 08:26 UTC); will flip to 0 once a newer daily image lands" | ||
| - "Keep SGLANG_DSV4_FP4_EXPERTS=false and SGLANG_FORCE_TRITON_MOE_FP8=1: required for sgl-project/DeepSeek-V4-Pro-FP8 (FP4 path asserts intermediate_size_per_partition==2048 in fp8.py; swiglu_limit clamp lives in fused_moe_triton)" | ||
| - "Expected speedup over the previous PR #23608 day-0 torch-fallback recipe: ~5.4-5.8x at conc 1-8 (matches the '+ indexer tilelang attn' tier in the AMD DSv4-Flash-FP8 reference table)" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder |
🟡 The new perf-changelog.yaml entry for dsv4-fp8-mi355x-sglang has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder — the literal token Placeholder was never substituted with this PR's number (#1255). That URL 404s and breaks the file's universal convention of using a real numeric PR id. Replace Placeholder with 1255 before merge.
Extended reasoning...
Bug
The new perf-changelog.yaml entry added by this PR ends with:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder
The trailing Placeholder is a literal string, not a numeric PR id. Every other one of the 240+ entries in this file uses a real numeric PR number (the immediately preceding entry, for example, uses /pull/1242). The current PR is #1255, so this should read /pull/1255.
Why this matters
- The URL https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder resolves to a 404, so any human (or doc tool) clicking through from the changelog gets a broken link.
- The pr-link field is the documented mechanism that ties a config-keys change to the PR that introduced it. Any internal tooling that scrapes pr-link to attribute config-key changes to PRs (release notes, blame-style audit, regression triage) will either fail or attribute this entry to a non-existent PR.
- It breaks the file's universal convention: this is the only entry in 2000+ lines of perf-changelog.yaml that does not point to a real PR.
Why nothing caught it
The Pydantic validator at utils/matrix_logic/validation.py declares pr_link as a plain str with no regex/numeric constraint, so a literal Placeholder passes schema validation. The CI is therefore green even though the metadata is wrong.
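A stricter check would have rejected the placeholder. The sketch below is a minimal, standard-library illustration of such a constraint, not the repository's actual validator: the names `PR_LINK_RE` and `is_valid_pr_link` are hypothetical, and the pattern assumes every pr-link should be an InferenceX pull-request URL ending in a numeric id.

```python
import re

# Hypothetical pattern (an assumption, not taken from validation.py):
# accept only InferenceX pull-request URLs that end in a numeric PR id.
PR_LINK_RE = re.compile(
    r"https://github\.com/SemiAnalysisAI/InferenceX/pull/\d+"
)


def is_valid_pr_link(value: str) -> bool:
    """Return True only if value is a PR URL with a numeric id."""
    # fullmatch requires the whole string to match, so trailing text
    # such as "Placeholder" in place of the digits is rejected.
    return PR_LINK_RE.fullmatch(value) is not None
```

A check of this kind could be attached to the pr_link field of the existing model, so that a non-numeric id fails schema validation instead of passing as a plain str.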
Fix
One-line change: replace Placeholder with 1255 on the last line of perf-changelog.yaml:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1255
Step-by-step proof
- Open perf-changelog.yaml and look at line 2083 (the last line of the diff). It reads: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder.
- Take that URL and resolve it: GitHub does not have a PR named Placeholder in this repo, so the page 404s.
- Look at the immediately preceding pr-link entry (the entry for PR #1242, "Add GB200 DSV4 Dynamo vLLM MTP2 recipes") and every other pr-link in the file: all use a numeric id. This entry is the lone exception.
- The PR description / metadata identifies this PR as #1255 ("p0: Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang"), so the intended substitution is unambiguous.
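The manual steps above can also be approximated with a short scan over the changelog text. This is a hedged sketch: the function name `find_bad_pr_links`, the sample layout, and the key `example-key-a` are illustrative assumptions, and the scan is line-based rather than a real YAML parse.

```python
import re

# Matches a line of the form "pr-link: <value>" with optional indentation.
PR_LINK_LINE = re.compile(r"^\s*pr-link:\s*(\S+)\s*$")
# A well-formed value is assumed to end in /pull/<digits>.
NUMERIC_PR = re.compile(r"/pull/\d+$")


def find_bad_pr_links(text):
    """Return (line_number, value) for each pr-link without a numeric id."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = PR_LINK_LINE.match(line)
        if match and not NUMERIC_PR.search(match.group(1)):
            bad.append((lineno, match.group(1)))
    return bad


# Illustrative sample only; the real file layout may differ.
sample = """\
- config-keys:
    - example-key-a
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1242
- config-keys:
    - dsv4-fp8-mi355x-sglang
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder
"""

for lineno, value in find_bad_pr_links(sample):
    print(f"line {lineno}: non-numeric pr-link {value}")
```

Run against the full perf-changelog.yaml contents, a scan like this would be expected to flag only the new entry, given the review's observation that every other pr-link uses a numeric id.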
Severity: nit — this is metadata-only, doesn't affect benchmark correctness or execution, but should be fixed before merge to preserve the changelog convention and keep pr-link-scraping tooling working.
| dsv4-fp8-mi355x-sglang: | ||
| image: rocm/sgl-dev:deepseek-v4-mi35x | ||
| image: rocm/sgl-dev:rocm720-mi35x-c924543-20260430-DSv4 | ||
| model: sgl-project/DeepSeek-V4-Pro-FP8 |
@chunfangamd does amd sgl support the fp4 ckpt yet?
rocm/sgl-dev:rocm720-mi35x-c924543-20260430-DSv4