dsa: remove block_table_convert_triton in dsa #658
Conversation
Pull request overview
This PR updates DeepSeek V3.2 sparse-MLA (“DSA”) indexing to stop relying on block_table_convert_triton and instead compute global KV indices from raw block tables using an explicit page/block size.
Changes:
- Update the DSA prefill Triton index-conversion kernel to use `PAGE_SIZE` (the KV-cache block size) when mapping token indices to physical KV slots (a sketch of this mapping follows the list).
- Adjust DeepSeek V2 indexer KV-cache handling to reshape by `kv_cache_block_size` and enable `preshuffle`/`Preshuffle` options in the relevant AITer ops.
- Disable quantization for `weights_proj` in the indexer path.
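The `PAGE_SIZE`-based conversion is standard paged-KV arithmetic. A minimal sketch, assuming a per-request block table of physical block ids; the names `PAGE_SIZE` and `token_to_kv_slot` are illustrative only, not the actual Triton kernel interface:

```python
# Illustrative sketch only: maps a request-local token index to a global KV
# slot using the raw block table and an explicit page size. The real
# conversion is a Triton kernel in atom/model_ops/attention_mla.py.
PAGE_SIZE = 64  # assumed KV-cache block size


def token_to_kv_slot(block_table: list[int], token_idx: int) -> int:
    """Return the physical KV-cache slot for a request-local token index."""
    physical_block = block_table[token_idx // PAGE_SIZE]
    offset_in_block = token_idx % PAGE_SIZE
    return physical_block * PAGE_SIZE + offset_in_block


# Example: with block_table=[7, 3], token 70 falls in the second logical
# block (physical block 3) at offset 6, i.e. slot 3 * 64 + 6 = 198.
assert token_to_kv_slot([7, 3], 70) == 198
```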
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `atom/models/deepseek_v2.py` | Reshapes indexer KV cache by configured block size; enables preshuffle flags; disables quantization for `weights_proj`. |
| `atom/model_ops/attentions/aiter_mla.py` | Comments out block-table conversion usage and stops passing `block_tables_converted` for cudagraph capture metadata. |
| `atom/model_ops/attention_mla.py` | Updates DSA prefill req-local → global KV index conversion to use `PAGE_SIZE`. |
```python
# if self.block_ratio > 1:
#     block_table_convert_triton(
#         var["block_tables"].gpu[:bs],
#         var["block_tables_converted"].gpu[:bs],
#         var["context_lens"].gpu[:bs],
#         self.block_ratio,
#     )
# attn_metadata.block_tables = var["block_tables_converted"].gpu[:bs]
```
The block_table_convert_triton(...) logic is now commented out, but the module still imports block_table_convert_triton, and the commented block will drift out of sync over time. If the intent is to remove this conversion path, please delete the dead code and drop the unused import; also consider removing/guarding any remaining block_tables_converted plumbing so it can’t accidentally override block_tables in AttentionMetaData.__init__.
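A hedged sketch of the guarding option, assuming a dataclass-style metadata object; the field names and the `conversion_enabled` flag are assumptions rather than the actual `AttentionMetaData` API:

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class AttentionMetaDataSketch:
    # Raw per-request table of physical KV block ids.
    block_tables: torch.Tensor
    # Converted table; only meaningful when the conversion kernel actually ran.
    block_tables_converted: Optional[torch.Tensor] = None
    conversion_enabled: bool = False

    def __post_init__(self) -> None:
        # Guard: a stale or uninitialized converted table must never silently
        # replace the raw block table while the conversion path is disabled.
        if self.conversion_enabled and self.block_tables_converted is not None:
            self.block_tables = self.block_tables_converted
```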
```python
# block_tables_converted=(
#     var["block_tables_converted"].gpu[:bs]
#     if "block_tables_converted" in var
#     else None
# ),
```
build_for_cudagraph_capture() no longer passes block_tables_converted, but other paths in this file still pass it when present (e.g., build_ubatch_metadata()), and AttentionMetaData.__init__ will override block_tables when block_tables_converted is not None. Since the conversion kernels are now disabled/commented out, block_tables_converted can be stale/uninitialized and silently break sparse attention under TBO/CUDAGraph. Please make the handling consistent: either always populate block_tables_converted before passing it, or stop passing/allocating it everywhere for MLA.
Suggested change:

```diff
- # block_tables_converted=(
- #     var["block_tables_converted"].gpu[:bs]
- #     if "block_tables_converted" in var
- #     else None
- # ),
+ # MLA block-table conversion is currently disabled. Always pass
+ # None here so AttentionMetaData keeps using block_tables instead
+ # of accidentally overriding it with stale/uninitialized converted
+ # data.
+ block_tables_converted=None,
```
```diff
  hidden_size,
  self.n_head,
- quant_config=quant_config,
+ quant_config=None,
```
Changing weights_proj from quant_config=quant_config to quant_config=None changes quantization behavior for the indexer path while the surrounding projections (wq_b, wk) still honor quant_config. If this layer must remain unquantized for correctness/accuracy (similar to the MoE gate), add an explicit comment explaining why; otherwise this likely regresses performance or makes quantization behavior inconsistent for this module.
Suggested change:

```diff
- quant_config=None,
+ quant_config=quant_config,
```
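If keeping `weights_proj` unquantized is intentional, something along these lines would make the intent explicit; a sketch only, using a plain `nn.Linear` stand-in because the actual linear wrapper and the real rationale are not shown in this diff:

```python
import torch.nn as nn


class IndexerSketch(nn.Module):
    """Toy stand-in for the indexer module; only weights_proj is shown."""

    def __init__(self, hidden_size: int, n_head: int, quant_config=None):
        super().__init__()
        # weights_proj is intentionally left unquantized (comparable to the
        # MoE gate): the indexer's scoring projection is small but
        # accuracy-sensitive. (Assumed rationale; the PR should state the
        # real one.) quant_config is deliberately ignored here.
        self.weights_proj = nn.Linear(hidden_size, n_head, bias=False)
```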
```diff
  if attn_metadata.block_tables is None:
      self.prepare_block_tables(batch)
      attn_metadata.block_tables = var["block_tables"].copy_to_gpu(bs)
-     if self.block_ratio > 1:
-         block_table_convert_triton(
-             var["block_tables"].gpu[:bs],
-             var["block_tables_converted"].gpu[:bs],
-             var["context_lens"].gpu[:bs],
-             self.block_ratio,
-         )
-         attn_metadata.block_tables = var["block_tables_converted"].gpu[:bs]
+     # if self.block_ratio > 1:
+     #     block_table_convert_triton(
```
In prepare_prefill, attn_metadata comes from CommonAttentionBuilder.prepare_prefill(), which will populate block_tables_converted when has_cached and block_ratio > 1, and AttentionMetaData.__init__ replaces self.block_tables with the converted table. With the updated DSA index conversion (now using PAGE_SIZE and raw physical block ids), this if attn_metadata.block_tables is None: gate means cached-prefill requests will continue using a converted per-token table, which can produce incorrect KV indices. Consider always resetting attn_metadata.block_tables to the raw var["block_tables"] for the sparse/DSA path (or disabling the common conversion for MLA builders) so the DSA kernel consistently sees physical block ids.
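A minimal sketch of the suggested behavior, written as a free-standing helper because the real change lives inside the MLA builder's prepare_prefill; the argument names mirror the diff above and are not the actual atom API:

```python
def reset_block_tables_for_dsa(attn_metadata, var, bs, prepare_block_tables, batch):
    """Always hand the DSA kernel the raw physical block table.

    Unconditionally rebuilds block_tables from var["block_tables"], even when
    the common builder already installed a converted per-token table, so the
    PAGE_SIZE-based index conversion consistently sees physical block ids.
    """
    prepare_block_tables(batch)
    attn_metadata.block_tables = var["block_tables"].copy_to_gpu(bs)
    return attn_metadata
```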
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist