prefill gdr kernel enablement by ganyi1996ppo · Pull Request #656 · ROCm/ATOM

ganyi1996ppo · 2026-04-28T09:55:06Z

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Signed-off-by: ganyi <ygan@amd.com>

Copilot

Pull request overview

This PR updates the vLLM Gated Delta Net attention backend to use an optimized GDR (gated delta rule) kernel in the prefill path. The PR description does not state a motivation; the likely goal is better prefill performance and/or compatibility with newer aiter kernels.

Changes:

Switch the prefill recurrent-attention implementation from the existing self.chunk_gated_delta_rule(...) wrapper to aiter’s chunk_gated_delta_rule_opt_vk(...).
Add an inline import for the new optimized kernel in the prefill path.

Comments suppressed due to low confidence (1)

atom/plugin/vllm/attention_backend/attention_gdn.py:372

After switching the prefill path to call chunk_gated_delta_rule_opt_vk directly, self.chunk_gated_delta_rule (and the ChunkGatedDeltaRule wrapper + fla_chunk_gated_delta_rule import) appear to be unused in this module. Either remove the now-dead wrapper/attribute, or use it as the fallback when the optimized kernel isn’t available to avoid carrying unused code.

            from aiter.ops.triton.gated_delta_net.gated_delta_rule import chunk_gated_delta_rule_opt_vk
            initial_state = ssm_state[non_spec_state_indices_tensor].contiguous()
            initial_state[~has_initial_state, ...] = 0
            (
                core_attn_out_non_spec,
                last_recurrent_state,
            ) = chunk_gated_delta_rule_opt_vk(
                q=query_non_spec,
                k=key_non_spec,
                v=value_non_spec,
                g=g_non_spec,
                beta=beta_non_spec,


Copilot · 2026-04-28T09:59:07Z

+            from aiter.ops.triton.gated_delta_net.gated_delta_rule import chunk_gated_delta_rule_opt_vk
            initial_state = ssm_state[non_spec_state_indices_tensor].contiguous()
            initial_state[~has_initial_state, ...] = 0
            (
                core_attn_out_non_spec,
                last_recurrent_state,
-            ) = self.chunk_gated_delta_rule(
+            ) = chunk_gated_delta_rule_opt_vk(


The new inline import of chunk_gated_delta_rule_opt_vk will raise ImportError at runtime on prefill if the installed aiter version doesn’t provide this symbol (note this file already treats aiter as optional via the guarded flydsl_gdr_decode import). Consider doing a module-level try/except import with a clear fallback to the existing fla_chunk_gated_delta_rule implementation (or a feature flag) so prefill doesn’t hard-crash when the optimized kernel is unavailable.
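A minimal sketch of the guarded import suggested above, assuming the kernel is resolved once at module load rather than inside the prefill path. `resolve_kernel` and `reference_prefill_kernel` are hypothetical names, not part of ATOM or aiter. The placeholder stands in for the existing fla_chunk_gated_delta_rule path, whose import location is not shown in this PR. Only the aiter module path and symbol are taken from the diff.

```python
import importlib
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def resolve_kernel(module_path: str, symbol: str, fallback: Callable) -> Callable:
    """Return `symbol` from `module_path`, or `fallback` if it cannot be imported."""
    try:
        module = importlib.import_module(module_path)
        return getattr(module, symbol)
    except (ImportError, AttributeError) as exc:
        # ImportError: the package or submodule is not installed.
        # AttributeError: the module exists but predates the symbol.
        logger.warning(
            "%s.%s unavailable (%s); using fallback", module_path, symbol, exc
        )
        return fallback


def reference_prefill_kernel(*args, **kwargs):
    # Hypothetical stand-in for the existing fla_chunk_gated_delta_rule path.
    raise NotImplementedError("wire this to the existing prefill implementation")


# Resolved once at module import time, not on every prefill call.
prefill_kernel = resolve_kernel(
    "aiter.ops.triton.gated_delta_net.gated_delta_rule",
    "chunk_gated_delta_rule_opt_vk",
    fallback=reference_prefill_kernel,
)
```

With this shape, the prefill call site would invoke `prefill_kernel(...)` instead of importing inline. That only works if both kernels accept compatible arguments and return values, which this PR does not confirm.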

Signed-off-by: ganyi <ygan@amd.com>

prefill gdr kernel enablement

84f819f

Signed-off-by: ganyi <ygan@amd.com>

Copilot AI review requested due to automatic review settings April 28, 2026 09:55

Copilot started reviewing on behalf of ganyi1996ppo April 28, 2026 09:56

Copilot AI reviewed Apr 28, 2026


format

cebbb6e

Signed-off-by: ganyi <ygan@amd.com>

ganyi1996ppo wants to merge 2 commits into main from ganyi/qwen3next_prefill


2 participants
