
Conversation

@cyx-6
Contributor

@cyx-6 cyx-6 commented Apr 17, 2023

This PR expands support for fused stacked attention patterns starting with `strided_slice`. Initially, we only supported fused stacked attention patterns starting with `split` in #14608. But with the help of #14583, we may have similar patterns starting with `strided_slice` as well.
@tvm-bot
Collaborator

tvm-bot commented Apr 17, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

elif start_op == "strided_slice":
query_raw = is_op("relax.strided_slice")(stacked_qkv)
key_raw = is_op("relax.strided_slice")(stacked_qkv)
value_raw = is_op("relax.strided_slice")(stacked_qkv)
Member

We will also need to check the begin/end in `strided_slice` if value has a different sequence length.
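The point of this review comment can be illustrated with a small numpy sketch (the shapes and the packed `[q | k | v]` layout here are hypothetical, not taken from the PR): once the value slice can have a different width, the three slices are no longer equal thirds, so matching three `relax.strided_slice` ops is not enough; the pattern must also verify the slice offsets.

```python
import numpy as np

# Hypothetical shapes: stacked QKV packed along the last axis, where the
# value head dimension h_v may differ from the query/key head dimension h.
b, s, n, h, h_v = 2, 4, 3, 8, 4
qkv = np.random.rand(b, s, n * h + n * h + n * h_v)

# The begin/end offsets of each strided_slice must follow the packed layout;
# assuming equal thirds would be wrong whenever h_v != h.
q = qkv[:, :, 0 : n * h]
k = qkv[:, :, n * h : 2 * n * h]
v = qkv[:, :, 2 * n * h : 2 * n * h + n * h_v]

assert q.shape == (b, s, n * h)
assert k.shape == (b, s, n * h)
assert v.shape == (b, s, n * h_v)
```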

@masahi
Member

masahi commented Apr 17, 2023

But with the help of #14583, we may have similar patterns starting with strided_slice as well.

I found that `strided_slice` after combined matmul is very expensive, so I'm replacing the #branches `strided_slice` ops with one `split`.

k = R.reshape(qkv_tuple[1], [b, s, n, h])
v = R.reshape(qkv_tuple[2], [b, s, n, h_v])
elif op == "strided_slice":
qkv_tuple = R.split(qkv, [n * h, n * h * 2], axis=2)
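The rewrite above replaces the per-branch `strided_slice` ops with a single `R.split` at indices `[n * h, n * h * 2]`. The equivalence, for the case where the value head dimension equals the query/key one, can be checked with a plain numpy sketch (shapes are hypothetical):

```python
import numpy as np

# Hypothetical shapes: qkv packed as [q | k | v] along axis 2, with h_v == h.
b, s, n, h = 2, 4, 3, 8
qkv = np.random.rand(b, s, 3 * n * h)

# One split at [n*h, 2*n*h] yields the same three pieces as three
# strided slices, but as a single op after the combined matmul.
q, k, v = np.split(qkv, [n * h, n * h * 2], axis=2)

assert np.array_equal(q, qkv[:, :, : n * h])
assert np.array_equal(k, qkv[:, :, n * h : 2 * n * h])
assert np.array_equal(v, qkv[:, :, 2 * n * h :])
```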
Member

qkv_tuple not used (I was confused)

Contributor Author

Yes, this is an unused leftover from copy and paste.

@masahi masahi merged commit ccca0f5 into apache:unity Apr 18, 2023
cyx-6 added a commit to cyx-6/tvm that referenced this pull request Apr 18, 2023
This PR is a follow-up to apache#14608 and apache#14649. It adds checks for the fused stacked attention patterns, so we only enable fusion of `stacked_qkv` with `ndim=3` and `split`/`strided_slice` with `axis=2`.
cyx-6 added a commit that referenced this pull request Apr 19, 2023
* [Unity][BYOC] Add check for stacked attention patterns

This PR is a follow-up to #14608 and #14649. It adds checks for the fused stacked attention patterns, so we only enable fusion of `stacked_qkv` with `ndim=3` and `split`/`strided_slice` with `axis=2`.

* check the order of strided_slice
