[Hexagon] Adapt some intrinsics for high vector lanes #14345
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot
@ibsidorenko FYI
Signed-off-by: Alexander Peskov <peskovnn@gmail.com>
@masahi @kparzysz-quic @jverma-quic Previously you reviewed patches like this. Could you please take a look at this one?
masahi
left a comment
Please polish doc writing in general.
> Will accept vector lanes equal orig_vec_lanes * 2**n for n in [0,1,2...]

> Ada is equivalent of splitting input args to chunk with lanes equal orig_vec_lanes,
> execution provided low_intrinsic for each of them and concatenate back.
Ada?
Also please polish this sentence, it is broken.
> Intrinsic implementation to adapt

> intrinsic_lanes: int
> Args lanes supported by provided intrinsic
Arg lanes?
@tvm-bot rerun
The goal is to allow applying the intrinsics "q_multiply_shift" and "q_multiply_shift_per_axis" to the vector type i32x128. Originally they support only "i32x32", which is natively supported by the platform (1024-bit vectors).
Motivation
There are situations where we have to use a vector size slightly larger than the platform supports. As an example, consider a sequence of element-wise operators: add -> q_multiply_shift -> cast. To achieve good performance we have to fuse them into one single loop (sch.compute_at(...)). The first two operators would like to be vectorized using the data type "int32x32". The last cast operator wants to use i32x128 as src and i8x128 as dst. As a result, we have to adapt all of these operators to accept the vector size "??x128" to successfully vectorize the entire loop. This change allows achieving a significant performance speedup for tuning tasks like
conv -> add -> qnn.requantize -> cast_i8.
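The adaptation idea described above (split the wide input into chunks of the natively supported lane count, apply the low intrinsic to each chunk, concatenate the results) can be sketched in plain Python. This is a hypothetical illustration, not the actual TVM implementation; `adapt_for_high_lanes`, `low_intrinsic`, and `intrinsic_lanes` are names chosen here for clarity, and lists stand in for vectors.

```python
def adapt_for_high_lanes(low_intrinsic, intrinsic_lanes):
    """Wrap an 'intrinsic' that handles exactly `intrinsic_lanes` elements
    so it also accepts inputs of intrinsic_lanes * 2**n elements.
    Hypothetical sketch; lists stand in for hardware vectors."""
    def adapted(vec):
        ratio = len(vec) // intrinsic_lanes
        # Accept only lanes equal to intrinsic_lanes * 2**n
        assert len(vec) % intrinsic_lanes == 0 and ratio & (ratio - 1) == 0, \
            "lanes must be intrinsic_lanes * 2**n"
        out = []
        # Split into chunks, apply the low intrinsic per chunk, concatenate back
        for i in range(0, len(vec), intrinsic_lanes):
            out.extend(low_intrinsic(vec[i:i + intrinsic_lanes]))
        return out
    return adapted

# Toy example: an "intrinsic" that doubles 4-lane vectors,
# adapted to handle a 16-lane input.
double4 = lambda v: [2 * x for x in v]
double16 = adapt_for_high_lanes(double4, 4)
print(double16(list(range(16))))
```

In the actual patch the same splitting happens at the TIR level, so an i32x32-only intrinsic can serve an i32x128 vectorized loop.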