[Unity][BYOC] Add fused patterns for stacked attention #14608

cyx-6 · 2023-04-12T18:21:02Z

In some models, the Q, K and V inputs of an attention op initially come from a single stacked tensor, which is then split and reshaped before the attention op is called:

`stacked_qkv -> split -> reshape -> attention`

We can skip the split and reshape ops entirely by adjusting the layout parameters in codegen.

This PR adds fused patterns for such stacked attention in BYOC, so that we can generate code directly from `stacked_qkv`.
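A minimal NumPy sketch (not TVM or CUTLASS code) of why the split and reshape can be skipped. It assumes `stacked_qkv` has shape `(batch, seq, 3 * hidden)` with `hidden = num_heads * head_dim`, split along axis 2. Q, K and V are then strided views of the stacked buffer at element offsets `0`, `hidden` and `2 * hidden`, so a kernel that accepts a base offset plus strides can read them directly. Both function names are hypothetical.

```python
import numpy as np


def split_reshape_qkv(stacked, num_heads):
    """Reference path: split the stacked tensor on axis 2, then reshape each part."""
    batch, seq, three_hidden = stacked.shape
    hidden = three_hidden // 3
    head_dim = hidden // num_heads
    q, k, v = np.split(stacked, 3, axis=2)
    return [t.reshape(batch, seq, num_heads, head_dim) for t in (q, k, v)]


def strided_views_qkv(stacked, num_heads):
    """Fused path (hypothetical): describe Q, K, V as (offset, strides) views
    into the stacked buffer without materializing split/reshape results."""
    stacked = np.ascontiguousarray(stacked)
    batch, seq, three_hidden = stacked.shape
    hidden = three_hidden // 3
    head_dim = hidden // num_heads
    flat = stacked.reshape(-1)
    # Element strides for the logical shape (batch, seq, num_heads, head_dim).
    elem_strides = (seq * three_hidden, three_hidden, head_dim, 1)
    byte_strides = tuple(s * stacked.itemsize for s in elem_strides)
    views = []
    for i in range(3):
        offset = i * hidden  # Q starts at 0, K at hidden, V at 2 * hidden
        views.append(
            np.lib.stride_tricks.as_strided(
                flat[offset:],
                shape=(batch, seq, num_heads, head_dim),
                strides=byte_strides,
                writeable=False,
            )
        )
    return views


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 3 * 4 * 8)).astype(np.float32)
for ref, fused in zip(split_reshape_qkv(x, 4), strided_views_qkv(x, 4)):
    assert np.array_equal(ref, fused)
```

The only extra information the fused path needs is each tensor's starting offset and the "row" stride of `3 * hidden`, which is what adjusting the layout parameters in codegen amounts to in this sketch.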


tvm-bot · 2023-04-12T18:21:06Z

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

cc @billishyahao, @quic-sanirudh (See #10317 for details)

(Generated by tvm-bot)

vinx13 · 2023-04-12T21:23:06Z

python/tvm/relax/backend/contrib/cutlass.py

            *make_attention_pattern(with_bias=True),
        ),
+        (
+            "cutlass.stacked_attention",


Does the order of patterns here matter? If a subgraph contains both reshape and attention, will `cutlass.attention`, which matches only a single attention operation, be selected first?

The order matters here. I tried placing the stacked attention pattern first; however, the original attention pattern was then matched first.
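The review question is about match priority. The following is a toy sketch, not TVM's partitioner. It models a hypothetical greedy matcher over a linear chain of op names: the pattern with higher priority claims its ops first, and claimed ops cannot be reused. Here priority is simply list position; how list or registration order maps to priority in TVM is set by its pattern registry and is not modeled.

```python
from typing import List, Sequence, Tuple

# Toy model (hypothetical): a pattern is a name plus a tuple of consecutive op names.
Pattern = Tuple[str, Tuple[str, ...]]


def partition(ops: Sequence[str], patterns: Sequence[Pattern]) -> List[Pattern]:
    """Try patterns in list order; the first one that matches claims its ops."""
    claimed = [False] * len(ops)
    groups: List[Pattern] = []
    for name, pat in patterns:
        n = len(pat)
        i = 0
        while i + n <= len(ops):
            window = tuple(ops[i:i + n])
            if window == pat and not any(claimed[i:i + n]):
                for j in range(i, i + n):
                    claimed[j] = True
                groups.append((name, window))
                i += n
            else:
                i += 1
    return groups


ops = ["split", "reshape", "attention"]
single = ("cutlass.attention", ("attention",))
stacked = ("cutlass.stacked_attention", ("split", "reshape", "attention"))

# Single-op pattern first: it claims "attention", so the stacked pattern cannot match.
print(partition(ops, [single, stacked]))
# Stacked pattern first: it claims the whole split -> reshape -> attention chain.
print(partition(ops, [stacked, single]))
```

Whichever pattern wins priority takes the attention op, so the larger stacked pattern must be the higher-priority one for the fusion to happen.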


This PR expands support for fused stacked attention patterns to patterns starting with `strided_slice`. Initially, apache#14608 only supported fused stacked attention patterns starting with `split`. With the help of apache#14583, however, similar patterns starting with `strided_slice` may appear as well.
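A small NumPy sketch of why both forms can map to the same fused pattern, with NumPy slicing standing in for relax `strided_slice`. It assumes Q, K and V are equal thirds along axis 2. Then three stride-1 slices with ranges `[0, h)`, `[h, 2h)` and `[2h, 3h)` produce the same tensors as a three-way `split`, and each slice's begin value is its offset into the stacked buffer. The helper names are hypothetical.

```python
import numpy as np


def qkv_ranges(width):
    """Hypothetical helper: (begin, end) ranges on axis 2 for Q, K and V."""
    hidden = width // 3
    return [(i * hidden, (i + 1) * hidden) for i in range(3)]


def qkv_by_split(stacked):
    # split-based form: three equal parts along axis 2
    return np.split(stacked, 3, axis=2)


def qkv_by_strided_slice(stacked):
    # strided_slice-based form: one stride-1 slice per tensor along axis 2
    return [stacked[:, :, b:e] for b, e in qkv_ranges(stacked.shape[2])]


x = np.arange(2 * 3 * 12).reshape(2, 3, 12)
for a, b in zip(qkv_by_split(x), qkv_by_strided_slice(x)):
    assert np.array_equal(a, b)
```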

* [Unity][BYOC] Fuse attention pattern with `strided_slice`
* remove useless code

This PR is a follow-up to apache#14608 and apache#14649. It adds checks for the fused stacked attention patterns, so that fusion is only enabled when `stacked_qkv` has `ndim=3` and the `split`/`strided_slice` is applied on `axis=2`.
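A hypothetical sketch of the kind of check described here, written over plain shapes and integers rather than relax IR. It accepts only a 3-D `stacked_qkv` split or sliced on axis 2. For the `strided_slice` form it also checks the order of the slices ("check the order of strided_slice"). The function name, the divisible-by-3 test, and the exact expected `(begin, end)` ranges are assumptions, not the PR's code.

```python
from typing import Optional, Sequence, Tuple


def check_stacked_qkv(
    shape: Sequence[int],
    axis: int,
    slices: Optional[Sequence[Tuple[int, int]]] = None,
) -> bool:
    """Return True if this stacked_qkv layout is eligible for fusion (sketch)."""
    if len(shape) != 3:  # only ndim=3 stacked tensors
        return False
    if axis != 2:  # split / strided_slice must be on axis 2
        return False
    if shape[2] % 3 != 0:  # assumption: Q, K, V are equal thirds
        return False
    if slices is not None:
        # strided_slice form: ranges must be Q, K, V in order.
        hidden = shape[2] // 3
        expected = [(i * hidden, (i + 1) * hidden) for i in range(3)]
        if [tuple(s) for s in slices] != expected:
            return False
    return True


print(check_stacked_qkv((2, 5, 96), axis=2))  # True
print(check_stacked_qkv((10, 96), axis=1))  # False: ndim is 2
```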

* [Unity][BYOC] Add check for stacked attention patterns
* check the order of strided_slice

cyx-6 added 2 commits April 12, 2023 12:05

* fix lint (b48e7bf)
* fix lint (4807dd6)

vinx13 approved these changes Apr 12, 2023

vinx13 reviewed Apr 12, 2023

cyx-6 merged commit 77b35e8 into apache:unity Apr 13, 2023

cyx-6 mentioned this pull request Apr 17, 2023: [Unity][BYOC] Fuse attention pattern with strided_slice #14649 (Merged)

cyx-6 mentioned this pull request Apr 18, 2023: [Unity][BYOC] Add check for stacked attention patterns #14664 (Merged)

3 participants