
Conversation

@Jeff-Huang
Contributor

Motivation

Introduces support for a vectorized KV cache memory layout (e.g., [num_blocks, num_kv_heads, head_size/8, block_size, 8]) to improve memory access efficiency, and adds support for different block table formats such as those used by vLLM and SGLang.

Technical Details

Key changes:

  • KV Cache Layout Optimization and Adjustment:

    • The KV cache memory layout has been adjusted to support vectorized read patterns (vectorized KV layout).
    • Support for multiple layout formats has been implemented, such as [num_blocks, num_kv_heads, head_size/8, block_size, 8] and other structures; a sketch of this layout and the block table formats follows this list.
  • vLLM Block Table Integration:

    • Added support for vLLM block table integration ([num_batch, max_blocks_per_seq]).
    • Added support for SGLang block table integration ([num_blocks]).
    • Support for page size 1024.
  • Kernel Interface Updates:

    • New parameters for the block table and KV cache layout.
  • Structure and Traits Updates:

    • Adapted to changes in the fmha_fwd_batch_prefill_traits structure.
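
To make the shapes above concrete, here is a minimal sketch (PyTorch, not code from this PR) of the vectorized KV cache layout and the two block table formats; the tensor shapes mirror the description above, while the concrete sizes and variable names are illustrative assumptions.

```python
import torch

# Illustrative sizes (assumptions); only the shapes mirror the PR description.
num_blocks, num_kv_heads, head_size, block_size = 64, 8, 128, 1024  # page size 1024
dtype = torch.float16
x = 16 // torch.tensor([], dtype=dtype).element_size()  # 16-byte (dwordx4) vector width -> 8 for fp16

# Vectorized 5D KV cache layout: [num_blocks, num_kv_heads, head_size/8, block_size, 8]
k_cache = torch.empty(num_blocks, num_kv_heads, head_size // x, block_size, x, dtype=dtype)
v_cache = torch.empty_like(k_cache)

# vLLM-style block table: [num_batch, max_blocks_per_seq], one row of block indices per sequence.
num_batch, max_blocks_per_seq = 4, 16
vllm_block_table = torch.randint(0, num_blocks, (num_batch, max_blocks_per_seq), dtype=torch.int32)

# SGLang-style page table: a flat 1D tensor of block indices ([num_blocks] in the PR description).
sglang_page_table = torch.randint(0, num_blocks, (num_batch * max_blocks_per_seq,), dtype=torch.int32)
```

With this layout, reading the 8 contiguous fp16 elements along the last dimension maps to a single 16-byte vector load, which is the access pattern the layout is optimized for.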

Test Plan

Test Result

Submission Checklist

ltqin and others added 12 commits December 30, 2025 09:09
…/8, block_size, 8], [num_blocks, num_kv_heads, block_size/8, head_size, 8]
…ayout

Updated `mha_batch_prefill` API and tests to support vLLM-style block tables alongside SGLang-style page tables, while enforcing the new hardware-optimized 5D vectorized KV cache layout.

**Key Changes:**
*   **API**: Added `block_table` and `seqlen_k` arguments to the Python/C++ interfaces.
*   **Layout Enforcement**: Added strict checks for the 5D vectorized KV layout (swizzled x=8) in host bindings and Python wrappers; a sketch of such a check follows this list.
*   **CodeGen**: Automatically select `VLLM_BLOCK_TABLE_2D` or `SGLANG_PAGE_TABLE_1D` trait based on input arguments.
*   **Tests**: Added `test_batch_prefill_vllm` to verify block table correctness and updated existing tests to use the vectorized layout.
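
As an illustration of the layout enforcement, the sketch below shows the kind of strict check a Python wrapper might perform on the 5D vectorized cache; the helper name and exact assertions are assumptions, not the PR's actual code.

```python
import torch

def check_vectorized_kv_layout(kv_cache: torch.Tensor, head_size: int) -> None:
    """Hypothetical check mirroring the 'swizzled x=8' layout enforcement described above."""
    x = 16 // kv_cache.element_size()  # 16-byte (dwordx4) vector width: 8 for fp16/bf16, 4 for fp32
    if kv_cache.dim() != 5:
        raise ValueError("expected a 5D vectorized KV cache "
                         "[num_blocks, num_kv_heads, head_size/x, block_size, x]")
    if kv_cache.size(-1) != x:
        raise ValueError(f"last dimension must equal the vector width x={x}")
    if kv_cache.size(2) * x != head_size:
        raise ValueError("dimension 2 must equal head_size / x")
```

Per the CodeGen change, the wrapper then selects `VLLM_BLOCK_TABLE_2D` when it is handed a 2D `block_table` and `SGLANG_PAGE_TABLE_1D` when it is handed a 1D page table.
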
@Jeff-Huang Jeff-Huang requested a review from a team December 30, 2025 05:36
```python
if head_size_v_og % 8 != 0:
    v = torch.nn.functional.pad(v, [0, 8 - head_size_v_og % 8])
head_size_q_og = q.size(-1)
k_vector_size = 16 // k.element_size()
```

Suggest adding a comment explaining that the magic number 16 corresponds to dwordx4 (a 16-byte vectorized load).
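
For context, a small self-contained sketch of what such a comment would capture: 16 bytes is the size of one dwordx4 (4 × 32-bit) load, so dividing by the element size gives the number of elements moved per vectorized load (the names below are illustrative, not from the PR).

```python
import torch

DWORDX4_BYTES = 16  # one dwordx4 load = 4 x 32-bit dwords = 16 bytes

for dtype in (torch.float16, torch.bfloat16, torch.float32):
    elem_bytes = torch.tensor([], dtype=dtype).element_size()
    k_vector_size = DWORDX4_BYTES // elem_bytes
    print(f"{dtype}: k_vector_size = {k_vector_size}")  # 8 for fp16/bf16, 4 for fp32
```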
