[TOPI] Add dp4a intrinsic to CUDA #1707
Branch updated from a4fa7f6 to 9eb96b9, then from 9eb96b9 to a17e5c7.
```python
import tvm


def _intrin_dp4a_reduce(x_scope, y_scope, z_scope):
    ...  # function body not shown in this review excerpt
```
Let us put it under `cuda/tensor_intrin.py` and rename it to `dp4a`.
@tqchen I have renamed the file and renamed the intrinsic to `dp4a`; please review.
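For reference, CUDA exposes this instruction as the `__dp4a` intrinsic on devices of compute capability 6.1 and higher: each 32-bit operand holds four packed 8-bit integers, the four lane-wise products are summed, and the sum is added to a 32-bit accumulator. The following is a minimal pure-Python model of that arithmetic for the signed case, not the TVM intrinsic itself; the helper names `pack_s8x4`, `unpack_s8x4` and `dp4a_ref` are hypothetical, and int32 overflow is deliberately not modeled.

```python
from typing import List


def pack_s8x4(lanes: List[int]) -> int:
    """Pack four int8 values into one 32-bit word (lane 0 in the lowest byte)."""
    if len(lanes) != 4 or not all(-128 <= v <= 127 for v in lanes):
        raise ValueError("expected exactly four values in [-128, 127]")
    word = 0
    for i, v in enumerate(lanes):
        # v & 0xFF is the two's-complement byte pattern of v
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack_s8x4(word: int) -> List[int]:
    """Split a 32-bit word into four signed 8-bit lanes (lane 0 = lowest byte)."""
    lanes = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        # Reinterpret the raw byte as a signed int8
        lanes.append(byte - 256 if byte >= 128 else byte)
    return lanes


def dp4a_ref(a: int, b: int, c: int) -> int:
    """Model of signed dp4a: c + sum of the four lane-wise int8 products.

    Overflow of the 32-bit accumulator is not modeled in this sketch.
    """
    return c + sum(x * y for x, y in zip(unpack_s8x4(a), unpack_s8x4(b)))


# Usage: (1*5 + 2*6 + 3*7 + 4*8) + 10 = 80
a = pack_s8x4([1, 2, 3, 4])
b = pack_s8x4([5, 6, 7, 8])
print(dp4a_ref(a, b, 10))  # prints 80
```

Because both operands use the same lane order, the byte order chosen for packing does not affect the result; only the pairing of lanes matters.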
```
    Parameters
    ----------
    x_scope: The storage scope of buffer for lhs
```
Add the data type; see https://docs.tvm.ai/contribute/document.html#document-python
Maybe it makes sense to default all three scopes to `local`.
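One way to address both review comments (typed parameters in the numpydoc style, and `local` as the default scope) is sketched below at the signature and docstring level only. The body is a placeholder, and the descriptions of `y_scope`, `z_scope` and the `Returns` section are assumptions rather than the merged code.

```python
def dp4a(x_scope="local", y_scope="local", z_scope="local"):
    """Int8 dot product of 4-element vectors, accumulated into int32 via __dp4a.

    Parameters
    ----------
    x_scope : str, optional
        The storage scope of buffer for lhs
    y_scope : str, optional
        The storage scope of buffer for rhs
    z_scope : str, optional
        The storage scope of buffer for the result

    Returns
    -------
    intrin : TensorIntrin
        The dp4a tensor intrinsic, usable in a tensorizing schedule.
    """
    # Placeholder: a real implementation would declare the TVM tensor intrinsic here.
    raise NotImplementedError("signature/docstring sketch only")
```

With all defaults set to `local`, a schedule that keeps its operands in registers can call `dp4a()` without arguments and override only the scopes that differ.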
Thanks @vinx13, this is now merged!
Added the dp4a intrinsic to TOPI and refactored the gemm_int8 recipe. I will send int8 conv2d using dp4a in the next PR.
cc @tqchen @merrymercy
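The refactored gemm_int8 recipe is not shown here. As a hedged illustration of why dp4a suits int8 GEMM: the reduction axis K can be split into chunks of four, and each chunk becomes one dp4a-style accumulate. The pure-Python reference below uses hypothetical names (`dot4`, `gemm_int8_ref`), assumes K is a multiple of 4, and works on unpacked lists rather than packed 32-bit words.

```python
from typing import List

Matrix = List[List[int]]


def dot4(a4: List[int], b4: List[int], acc: int) -> int:
    """One dp4a-style step on unpacked lanes: acc + sum of four int8 products."""
    return acc + sum(x * y for x, y in zip(a4, b4))


def gemm_int8_ref(A: Matrix, B: Matrix) -> Matrix:
    """Reference C = A @ B for int8 inputs, reducing K in chunks of 4.

    Each chunk is one dot4 call, roughly the shape a dp4a-based
    schedule gives the inner reduction.
    """
    M, K = len(A), len(A[0])
    N = len(B[0])
    if K % 4 != 0 or len(B) != K:
        raise ValueError("need len(A[0]) == len(B) and K a multiple of 4")
    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0
            for k in range(0, K, 4):
                a4 = A[i][k:k + 4]                    # 4 consecutive elements of row i
                b4 = [B[k + t][j] for t in range(4)]  # 4 consecutive elements of column j
                acc = dot4(a4, b4, acc)
            C[i][j] = acc
    return C


A = [[1, 2, 3, 4], [-1, 0, 1, 2]]
B = [[1, 0], [0, 1], [1, 0], [0, 1]]
print(gemm_int8_ref(A, B))  # prints [[4, 6], [0, 2]]
```

On a GPU, the four elements of `b4` would typically be laid out contiguously (for example by transposing or repacking B) so that each chunk can be loaded as a single 32-bit word.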