
Conversation

@apeskov
Contributor

@apeskov apeskov commented Mar 20, 2023

The goal is to allow applying the intrinsics "q_multiply_shift" and "q_multiply_shift_per_axis" to the vector type i32x128. Originally they support only "i32x32", which is natively supported by the platform (1024-bit vector).

Motivation
There are situations where we have to use a vector size slightly larger than the platform supports. As an example, consider a sequence of element-wise operators: add -> q_multiply_shift -> cast. To achieve good performance we have to squash them into one single loop (sch.compute_at(...)). The first two operators would be vectorized with data type "int32x32", while the last cast operator wants to use i32x128 as src and i8x128 as dst. As a result, we have to adapt all of these operators to accept vector size "??x128" in order to successfully vectorize the entire loop.

This change enables a significant performance speedup for tuning tasks like conv -> add -> qnn.requantize -> cast_i8.
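The adaptation described above (split wide vector arguments into natively supported chunks, apply the narrow intrinsic to each, and concatenate the results) can be sketched in plain NumPy. This is a hypothetical illustration of the pattern, not the actual TVM code; the names `adapt_lanes` and `q_multiply_shift_32` are invented for this sketch:

```python
import numpy as np

def adapt_lanes(low_intrinsic, intrinsic_lanes):
    """Wrap an op that handles `intrinsic_lanes` lanes so it also accepts
    inputs whose lane count is intrinsic_lanes * 2**n (n = 0, 1, 2, ...)."""
    def wide_op(x):
        assert len(x) % intrinsic_lanes == 0, "lane count must be a multiple"
        # Split into natively supported chunks, process each, concatenate back.
        chunks = [x[i:i + intrinsic_lanes]
                  for i in range(0, len(x), intrinsic_lanes)]
        return np.concatenate([low_intrinsic(c) for c in chunks])
    return wide_op

# Toy stand-in for a fixed-point multiply-shift that only handles 32 lanes.
def q_multiply_shift_32(v):
    assert len(v) == 32
    return (v.astype(np.int64) * 3 >> 1).astype(np.int32)

wide_op = adapt_lanes(q_multiply_shift_32, 32)
x = np.arange(128, dtype=np.int32)
y = wide_op(x)  # an i32x128 input is handled via four i32x32 calls
```

In the actual PR the same idea is applied at the TIR level when lowering the intrinsic, so the wide vectorized loop body can be kept intact.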

@tvm-bot
Collaborator

tvm-bot commented Mar 20, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@apeskov
Contributor Author

apeskov commented Mar 20, 2023

@ibsidorenko FYI

@apeskov apeskov changed the title Adapt some hexagon intrinsics for high vector lanes [Hexagon] Adapt some intrinsics for high vector lanes Mar 20, 2023
apeskov added 2 commits March 20, 2023 19:19
Signed-off-by: Alexander Peskov <peskovnn@gmail.com>
Signed-off-by: Alexander Peskov <peskovnn@gmail.com>
@apeskov
Contributor Author

apeskov commented Mar 21, 2023

@masahi @kparzysz-quic @jverma-quic Previously you reviewed patches like this one. Could you please take a look at it?

Member

@masahi masahi left a comment


Please polish doc writing in general.

Will accept vector lanes equal orig_vec_lanes * 2**n for n in [0,1,2...]

Ada is equivalent of splitting input args to chunk with lanes equal orig_vec_lanes,
execution provided low_intrinsic for each of them and concatenate back.
Member


Ada?

Member


Also please polish this sentence, it is broken.

Intrinsic implementation to adapt

intrinsic_lanes: int
Args lanes supported by provided intrinsic
Member


Arg lanes?

Signed-off-by: Alexander Peskov <peskovnn@gmail.com>
@masahi
Member

masahi commented Mar 24, 2023

@tvm-bot rerun

@ibsidorenko
Contributor

@tvm-bot rerun

@ibsidorenko
Contributor

@tvm-bot rerun

@masahi masahi merged commit 6c34361 into apache:main Mar 27, 2023
@apeskov apeskov deleted the ap/hex-vector-lanes-enhance branch March 27, 2023 11:39
