Bump CK for a stride fix in CKTile Block-Scale GEMM #2862

Draft
samremes wants to merge 1 commit into main from samremes/ck-bump-eightwaves-fix

@samremes
Contributor

Motivation

This pull request updates the composable_kernel submodule to a newer commit, which pulls in a bug fix for CKTile Block-Scale GEMM (the 8-wave pipeline for gfx950).

Technical Details

See explanation in the commit: ROCm/composable_kernel@cbfb3e2

Test Plan

Running existing tests.

Test Result

Block-Scale GEMM tests are still passing locally.

Submission Checklist

@github-actions
Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-355 | Run Triton tests on MI355 in addition to MI325 |
| ci:sglang | SGLang integration tests |
| ci:atom | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| ci:vllm | vLLM benchmark |
| ci:all | All of the above |

Add labels via the sidebar or with `gh pr edit 2862 --add-label <label>`.

nholmber added a commit to nholmber/aiter that referenced this pull request Apr 22, 2026
PR ROCm#2862's CK bump (cbfb3e242) lacks the ABQuantGrouped/GemmTraits
APIs needed by PRs ROCm#2541 and ROCm#2487. Update to 020b6f435 which has
both the stride fix and the required CK-TILE blockscale APIs.
nholmber added a commit to nholmber/aiter that referenced this pull request Apr 23, 2026
…eline)

The commit IS public in ROCm/composable_kernel. Previous failure was
due to shallow clone not having the commit, not it being missing.
This is the correct CK version that PR ROCm#2862 intended.
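
The shallow-clone failure described in the commit message above can be worked around by fetching the pinned commit explicitly. The following is a minimal sketch, not taken from this PR: the helper names `shallow_fetch_commands` and `run_all` are hypothetical, the submodule path in the usage note is an assumption about the repository layout, and the remote is assumed to be named `origin`. It only builds the `git` command lines; running them requires network access, and fetching by hash generally needs the full 40-character commit hash rather than an abbreviated one.

```python
import subprocess
from typing import List


def shallow_fetch_commands(submodule_path: str, full_sha: str) -> List[List[str]]:
    """Build git commands that fetch one specific commit into a shallow clone.

    Hypothetical helper for illustration; it does not run anything itself.
    """
    sha = full_sha.lower()
    # Fetching by hash from a remote generally requires the full object name.
    if len(sha) != 40 or any(c not in "0123456789abcdef" for c in sha):
        raise ValueError("expected a full 40-character commit hash")
    git = ["git", "-C", submodule_path]
    return [
        # Fetch only the requested commit (assumes the remote is called 'origin').
        git + ["fetch", "--depth", "1", "origin", sha],
        # Verify the commit object now exists locally.
        git + ["cat-file", "-e", sha + "^{commit}"],
        # Check it out so the submodule points at the intended revision.
        git + ["checkout", "--detach", sha],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, raising CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A call such as `run_all(shallow_fetch_commands("3rdparty/composable_kernel", full_sha))` would then fetch and check out the commit, where the path is an assumed submodule location and `full_sha` is the complete hash of the intended CK commit.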
nholmber added a commit to nholmber/aiter that referenced this pull request Apr 25, 2026
Tuned 1482 shapes (TP1/TP2/TP4) for Qwen/Qwen3-Next-80B-A3B-Instruct-FP8
on MI355X using CK + CK-TILE backends with splitK support.

Depends on:
- PR ROCm#2862 (CK bump for stride fix in CK-TILE blockscale)
- PR ROCm#2541 (splitK support for CK/CK-TILE blockscale GEMMs)
- PR ROCm#2487 (AQLayout tunable for CK-TILE blockscale 8-warp kernels)