[Unity][BYOC] Fuse attention pattern with strided_slice
#14649
Conversation
This PR expands the support for fused stacked attention patterns starting with `strided_slice`. Initially, we only supported the fused stacked attention pattern starting with `split` in apache#14608. But with the help of apache#14583, we may have similar patterns starting with `strided_slice` as well.
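For illustration, the two kinds of workloads look roughly like the module below. This is a minimal sketch, not the PR's test workload: it fixes static shapes (b=1, s=16, n=8, h=8, so n*h=64 and 3*n*h=192), assumes the Unity-era keyword form `R.strided_slice(x, axes, begin, end)`, and only shows how the query is produced; key and value follow the same shape.

```python
from tvm.script import ir as I
from tvm.script import relax as R


@I.ir_module
class StackedQKV:
    @R.function
    def split_start(qkv: R.Tensor((1, 16, 192), "float32")):
        # the form already matched by apache#14608: one relax.split along axis 2
        qkv_tuple = R.split(qkv, [64, 128], axis=2)
        q = R.reshape(qkv_tuple[0], [1, 16, 8, 8])
        return q

    @R.function
    def strided_slice_start(qkv: R.Tensor((1, 16, 192), "float32")):
        # the form this PR adds: separate relax.strided_slice calls on the same tensor
        q_raw = R.strided_slice(qkv, axes=[2], begin=[0], end=[64])
        q = R.reshape(q_raw, [1, 16, 8, 8])
        return q
```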
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
```python
elif start_op == "strided_slice":
    query_raw = is_op("relax.strided_slice")(stacked_qkv)
    key_raw = is_op("relax.strided_slice")(stacked_qkv)
    value_raw = is_op("relax.strided_slice")(stacked_qkv)
```
We will also need to check the begin/end in `strided_slice` if value has a different sequence length.
I found that
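For reference, the kind of begin/end check asked for above could look roughly like the sketch below. The annotation keys (`strided_slice_query`, ...) and the attrs-based access to `axes`/`begin`/`end` are assumptions for illustration, not the code that was merged.

```python
def _strided_slices_are_contiguous(annotated_expr) -> bool:
    """Require the three slices to be adjacent, in-order cuts along axis 2."""
    prev_end = 0
    # hypothetical annotation keys; the real pattern may use different names
    for name in ["strided_slice_query", "strided_slice_key", "strided_slice_value"]:
        call = annotated_expr[name]
        axes = [int(v) for v in call.attrs.axes]
        begin = [int(v) for v in call.attrs.begin]
        end = [int(v) for v in call.attrs.end]
        if axes != [2] or begin != [prev_end]:
            return False
        prev_end = end[0]
    return True
```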
```python
    k = R.reshape(qkv_tuple[1], [b, s, n, h])
    v = R.reshape(qkv_tuple[2], [b, s, n, h_v])
elif op == "strided_slice":
    qkv_tuple = R.split(qkv, [n * h, n * h * 2], axis=2)
```
`qkv_tuple` is not used here (I was confused).
Yes, this is a useless leftover from copy and paste.
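For illustration, the `strided_slice` branch could build q/k/v directly from slices with no leftover split, as in the rough sketch below. It is a drop-in for the branch quoted above, reusing the same symbols (`qkv`, `b`, `s`, `n`, `h`, `h_v`) and assuming the keyword form of `R.strided_slice`; it is not the merged test code.

```python
elif op == "strided_slice":
    # slice the stacked tensor directly instead of keeping the unused split
    q = R.reshape(
        R.strided_slice(qkv, axes=[2], begin=[0], end=[n * h]), [b, s, n, h]
    )
    k = R.reshape(
        R.strided_slice(qkv, axes=[2], begin=[n * h], end=[n * h * 2]), [b, s, n, h]
    )
    v = R.reshape(
        R.strided_slice(qkv, axes=[2], begin=[n * h * 2], end=[n * h * 2 + n * h_v]),
        [b, s, n, h_v],
    )
```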
This PR is a follow-up to apache#14608 and apache#14649. In this PR, we add checks for the fused stacked attention patterns, so that we only enable the fusion when `stacked_qkv` has `ndim=3` and the `split`/`strided_slice` axis is 2.
* [Unity][BYOC] Add check for stacked attention patterns
* check the order of strided_slice
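A rough sketch of what such a check could look like follows. The `PatternCheckContext.annotated_expr` map and the annotation keys (`stacked_qkv`, `split`, `strided_slice_*`) are assumptions used for illustration, not necessarily the exact merged code.

```python
def _check_stacked_attention(context) -> bool:
    """Only fuse when the stacked QKV tensor is 3-D and is cut along axis 2."""
    stacked_qkv = context.annotated_expr["stacked_qkv"]
    if stacked_qkv.struct_info.ndim != 3:
        return False
    if "split" in context.annotated_expr:
        # split-based variant: the single relax.split must use axis 2
        return int(context.annotated_expr["split"].attrs.axis) == 2
    # strided_slice variant: every matched slice must cut only along axis 2
    return all(
        [int(a) for a in expr.attrs.axes] == [2]
        for name, expr in context.annotated_expr.items()
        if name.startswith("strided_slice")
    )
```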