Pull request overview
This PR extends the ROCm AITER MLA sparse-attention path to support MTP (multi-token) decode/verify by introducing per-token sparse metadata handling and a new Triton kernel to gather sparse KV page indices for the per-token layout.
Changes:
- Adjust sparse indexer/decode token accounting to handle `max_seqlen_q > 1` in MTP verify.
- Add a second set of persistent MLA metadata buffers for sparse MTP (per-token layout) and wire them into decode + cudagraph capture paths.
- Add a Triton kernel to gather sparse KV indices for MTP per-token layout; update MLA decode path to select correct metadata/indices.
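As a rough illustration of what gathering sparse KV page indices into a per-token layout involves, here is a pure-Python reference sketch. It is not the Triton kernel from this PR. The function name, argument names, the `-1` padding convention, and the page size of 1 are all assumptions made for the example.

```python
from typing import List, Tuple


def gather_sparse_kv_indices(
    topk_positions: List[List[int]],
    block_table: List[List[int]],
    token_to_req: List[int],
) -> Tuple[List[int], List[int]]:
    """Hypothetical reference gather, assuming page size 1.

    topk_positions[t] holds the logical KV positions selected for query
    token t, padded with -1 (assumed convention). block_table[r] maps the
    logical positions of request r to physical pages. token_to_req[t] is
    the request that owns query token t; under MTP verify one request
    owns several consecutive query tokens.
    """
    kv_indptr = [0]   # one segment boundary per query token
    kv_indices = []   # physical page ids, concatenated per token
    for t, positions in enumerate(topk_positions):
        req = token_to_req[t]
        for pos in positions:
            if pos < 0:  # skip padding entries
                continue
            kv_indices.append(block_table[req][pos])
        kv_indptr.append(len(kv_indices))
    return kv_indptr, kv_indices


# Request 0 contributes two query tokens, request 1 contributes one.
block_table = [[10, 11, 12, 13], [20, 21, 22, 23]]
token_to_req = [0, 0, 1]
topk_positions = [[0, 2, -1], [1, 3, 2], [3, -1, -1]]
kv_indptr, kv_indices = gather_sparse_kv_indices(
    topk_positions, block_table, token_to_req
)
```

The point of the per-token layout is that `kv_indptr` has one segment per query token rather than one per request, so each token in a verify step can attend to its own sparse selection.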
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `atom/models/deepseek_v2.py` | Fix decode token counting for sparse indexer under MTP verify (`batch_size * max_seqlen_q`). |
| `atom/model_ops/attentions/aiter_mla.py` | Introduce sparse-MTP persistent buffers + per-token metadata path; update decode/cudagraph capture plumbing for sparse MTP. |
| `atom/model_ops/attention_mla.py` | Route sparse MTP decode through per-token metadata and new Triton gather kernel for KV indices. |
| `atom/model_engine/model_runner.py` | Minor KV-cache sizing/cudagraph init tweaks related to total layers + sparse indptr reset. |
| `.github/benchmark/models_accuracy.json` | Formatting-only change (newline/EOF). |
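
The token-counting fix listed for `atom/models/deepseek_v2.py` can be pictured with a small sketch. The helper name and its validation are hypothetical; only the `batch_size * max_seqlen_q` product comes from the summary above.

```python
def num_decode_tokens(batch_size: int, max_seqlen_q: int = 1) -> int:
    """Hypothetical helper: query tokens processed in one decode step.

    With plain decode each request contributes one query token, so the
    count equals batch_size. Under MTP verify each request contributes
    max_seqlen_q query tokens, so the count is the product.
    """
    if batch_size < 0 or max_seqlen_q < 1:
        raise ValueError("batch_size must be >= 0 and max_seqlen_q >= 1")
    return batch_size * max_seqlen_q


plain = num_decode_tokens(batch_size=4)                    # one token per request
verify = num_decode_tokens(batch_size=4, max_seqlen_q=3)   # three tokens per request
```

Sizing per-token buffers by `batch_size` alone would undercount whenever `max_seqlen_q > 1`, which appears to be the case the fix addresses.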
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist