support sparse attn mtp #649

Open

jiayyu wants to merge 13 commits into main from sparse_mtp

Conversation

@jiayyu (Contributor) commented Apr 25, 2026

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings April 25, 2026 13:47
Copilot AI left a comment

Pull request overview

This PR extends the ROCm AITER MLA sparse-attention path to support MTP (multi-token prediction) decode/verify by introducing per-token sparse metadata handling and a new Triton kernel that gathers sparse KV page indices for the per-token layout.

Changes:

  • Adjust sparse indexer/decode token accounting to handle max_seqlen_q > 1 in MTP verify.
  • Add a second set of persistent MLA metadata buffers for sparse MTP (per-token layout) and wire them into decode + cudagraph capture paths.
  • Add a Triton kernel to gather sparse KV indices for MTP per-token layout; update MLA decode path to select correct metadata/indices.
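The gather step above can be illustrated with a pure-NumPy reference. This is a hypothetical sketch of what such a kernel computes (the PR's actual implementation is a Triton kernel; the function name, argument shapes, and the -1 padding convention here are assumptions, not the PR's API): for each decode token, map the sparse indexer's selected token positions through that token's page table into physical KV slot ids.

```python
import numpy as np

def gather_sparse_kv_pages(topk_token_idx, block_table, page_size):
    """Reference (non-Triton) semantics for a sparse-KV gather.

    topk_token_idx: [num_decode_tokens, topk] absolute token positions
        selected by the sparse indexer for each decode token (-1 = padding).
    block_table:    [num_decode_tokens, max_pages] KV page ids for the
        sequence each decode token belongs to.
    Returns [num_decode_tokens, topk] physical slot ids
        (page_id * page_size + in-page offset), keeping -1 for padding.
    """
    num_tokens, topk = topk_token_idx.shape
    out = np.full((num_tokens, topk), -1, dtype=np.int64)
    for t in range(num_tokens):
        for k in range(topk):
            pos = topk_token_idx[t, k]
            if pos < 0:
                continue  # padded slot: fewer than topk tokens selected
            page = block_table[t, pos // page_size]
            out[t, k] = page * page_size + pos % page_size
    return out
```

In a real Triton kernel each program instance would handle one decode token and do this lookup with masked loads/stores; the per-token (rather than per-request) first dimension is what distinguishes the MTP layout, since each request contributes `max_seqlen_q` decode tokens during verify.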

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| atom/models/deepseek_v2.py | Fix decode token counting for the sparse indexer under MTP verify (`batch_size * max_seqlen_q`). |
| atom/model_ops/attentions/aiter_mla.py | Introduce sparse-MTP persistent buffers and a per-token metadata path; update decode/cudagraph capture plumbing for sparse MTP. |
| atom/model_ops/attention_mla.py | Route sparse MTP decode through per-token metadata and the new Triton gather kernel for KV indices. |
| atom/model_engine/model_runner.py | Minor KV-cache sizing/cudagraph init tweaks related to total layers and sparse indptr reset. |
| .github/benchmark/models_accuracy.json | Formatting-only change (newline at EOF). |
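The deepseek_v2.py fix boils down to one piece of accounting: under MTP verify, each request contributes `max_seqlen_q` query tokens (draft tokens plus the bonus token) rather than one, so per-token sparse buffers must be sized accordingly. A minimal illustration (the function name is hypothetical, not the PR's code):

```python
def num_decode_tokens(batch_size: int, max_seqlen_q: int) -> int:
    """Under MTP verify every request carries max_seqlen_q query tokens,
    so the sparse indexer must allocate per-token state for all of them,
    not one per request."""
    return batch_size * max_seqlen_q

# Plain decode: one query token per request.
print(num_decode_tokens(8, 1))  # 8
# MTP verify with, e.g., 3 draft tokens + 1 bonus token per request.
print(num_decode_tokens(8, 4))  # 32
```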


Comment thread atom/model_ops/attentions/aiter_mla.py
Copilot AI review requested due to automatic review settings April 28, 2026 03:03
Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.



Comment thread atom/model_ops/attentions/aiter_mla.py
Comment thread atom/model_ops/attentions/aiter_mla.py Outdated
Copilot AI review requested due to automatic review settings April 28, 2026 05:32
Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.



Comment thread atom/model_ops/attentions/aiter_mla.py
Comment thread atom/model_ops/attentions/aiter_mla.py Outdated
Copilot AI review requested due to automatic review settings April 28, 2026 12:25
@valarLip valarLip requested a review from junhaha666 April 30, 2026 03:39