Conversation

Contributor

@cyx-6 cyx-6 commented Apr 12, 2023

In some models, the input Q, K, and V for attention ops come from a stacked tensor: they are split and reshaped before calling the attention op, like

stacked_qkv -> split -> reshape -> attention.

Actually, we can skip the split and reshape ops by manipulating the layout parameters in codegen.

This PR adds such fused patterns for stacked attention in BYOC, so that we can codegen directly from stacked_qkv.
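To illustrate the idea, here is a minimal sketch (the function name and signature are illustrative, not TVM APIs): once the sections of the stacked tensor are known, an attention kernel can index the stacked buffer directly via offsets instead of materializing three split-and-reshaped tensors.

```python
# Hypothetical sketch: split + reshape on a stacked QKV tensor can be
# replaced by offset arithmetic into the stacked buffer.

def qkv_slice_offsets(num_heads, head_dim):
    """Return the start offset of Q, K, and V along the last axis of a
    stacked_qkv tensor of shape (batch, seq_len, 3 * num_heads * head_dim)."""
    section = num_heads * head_dim
    return {"q": 0, "k": section, "v": 2 * section}

# An attention kernel parameterized by these offsets can read Q/K/V
# straight from stacked_qkv, which is what adjusting the layout
# parameters in codegen achieves.
print(qkv_slice_offsets(num_heads=8, head_dim=64))
# -> {'q': 0, 'k': 512, 'v': 1024}
```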

@tvm-bot
Collaborator

tvm-bot commented Apr 12, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

*make_attention_pattern(with_bias=True),
),
(
"cutlass.stacked_attention",
Member

@vinx13 vinx13 Apr 12, 2023


does the order of patterns here matter? If we have a subgraph containing both reshape and attention, will cutlass.attention that matches only a single attention operation be selected first?

Contributor Author


The order matters here. I tried changing the order to put the stacked attention pattern first, but the original attention pattern still matched first.
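As a toy illustration of why registration order can matter in a first-match partitioner (this is not TVM's actual matcher, and note the author reports the plain attention pattern won even when reordered in TVM): the generic pattern's op sequence is contained in the stacked one, so whichever pattern is tried first can claim the subgraph.

```python
# Toy first-match partitioner. Pattern names mirror the PR; the
# matching logic is a simplification for illustration only.

PATTERNS = [
    ("cutlass.attention", ["attention"]),
    ("cutlass.stacked_attention", ["split", "reshape", "attention"]),
]

def first_match(graph_ops, patterns):
    """Return the name of the first pattern whose op sequence appears
    contiguously in graph_ops, or None."""
    for name, seq in patterns:
        for i in range(len(graph_ops) - len(seq) + 1):
            if graph_ops[i:i + len(seq)] == seq:
                return name
    return None

graph = ["split", "reshape", "attention"]
print(first_match(graph, PATTERNS))        # cutlass.attention
print(first_match(graph, PATTERNS[::-1]))  # cutlass.stacked_attention
```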

@cyx-6 cyx-6 merged commit 77b35e8 into apache:unity Apr 13, 2023
tqchen pushed a commit that referenced this pull request Apr 13, 2023
* [Unity][BYOC] Add fused patterns for stacked attention

In some models, the input Q, K, and V for attention ops come from a
stacked tensor: they are split and reshaped before calling the
attention op, like

stacked_qkv -> split -> reshape -> attention.

Actually, we can skip the split and reshape ops
by manipulating the layout parameters in codegen.

This PR adds such fused patterns for stacked attention in BYOC,
so that we can codegen directly from stacked_qkv.

* fix lint

* fix lint
cyx-6 added a commit to cyx-6/tvm that referenced this pull request Apr 17, 2023
This PR expands the support for fused stacked attention patterns starting with `strided_slice`. Initially, only the fused stacked attention pattern starting with `split` was supported in apache#14608. But with the help of apache#14583, we may have similar patterns starting with `strided_slice` as well.
masahi pushed a commit that referenced this pull request Apr 18, 2023
* [Unity][BYOC] Fuse attention pattern with `strided_slice`

This PR expands the support for fused stacked attention patterns starting with `strided_slice`. Initially, only the fused stacked attention pattern starting with `split` was supported in #14608. But with the help of #14583, we may have similar patterns starting with `strided_slice` as well.

* remove useless code
cyx-6 added a commit to cyx-6/tvm that referenced this pull request Apr 18, 2023
This PR is a follow-up to apache#14608 and apache#14649. It adds checks for the fused stacked attention patterns, so that we only enable the fusion of `stacked_qkv` with `ndim=3` and the `split`/`strided_slice` `axis=2`.
cyx-6 added a commit that referenced this pull request Apr 19, 2023
* [Unity][BYOC] Add check for stacked attention patterns

This PR is a follow-up to #14608 and #14649. It adds checks for the fused stacked attention patterns, so that we only enable the fusion of `stacked_qkv` with `ndim=3` and the `split`/`strided_slice` `axis=2`.

* check the order of strided_slice
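A minimal sketch of the kind of gating check this follow-up describes (the function name and signature are hypothetical, not TVM's actual API): fusion is only enabled when the stacked tensor is rank 3 and the split or strided_slice happens along axis 2.

```python
# Hedged sketch of the fusion precondition: stacked_qkv must have
# ndim == 3 and the split/strided_slice axis must be 2.

def can_fuse_stacked_attention(stacked_qkv_shape, split_axis):
    """Return True if the stacked attention fusion precondition holds."""
    return len(stacked_qkv_shape) == 3 and split_axis == 2

# (batch, seq_len, 3 * num_heads * head_dim), split along the last axis:
print(can_fuse_stacked_attention((4, 128, 1536), split_axis=2))   # True
# Rank-4 input or a different split axis is rejected:
print(can_fuse_stacked_attention((4, 8, 128, 192), split_axis=2)) # False
print(can_fuse_stacked_attention((4, 128, 1536), split_axis=1))   # False
```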