
Conversation

@ibsidorenko (Contributor) commented Oct 14, 2022

The main goal of this commit is to improve performance on the Hexagon target while preserving performance/accuracy on x86, GPU, and other targets.

The "qnn.requantize" operation is lowered into a sequence of multiply, add, and shift operations during the QNN canonicalization pass when the scale quantization parameter is a vector of scalars. This commit adds a new Relay per-channel/per-axis FixedPointMultiply operation, which is used in "qnn.requantize" lowering.

The per-channel/per-axis FixedPointMultiply is implemented through the tir.q_multiply_shift_per_axis intrinsic. For the Hexagon target it overrides the default implementation and generates HVX vmpye/vmpyo instructions (see _q_multiply_shift_per_axis_hexagon). All other targets use the default implementation (64-bit arithmetic).
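As a rough illustration (not the actual TVM implementation), the default 64-bit path can be sketched as follows, with Python integers standing in for 64-bit arithmetic and round-half-up rounding assumed:

```python
# Illustrative sketch only: assumed round-half-up semantics for
# out = round(x * y * 2^-s), with y a fixed point multiplier and
# s > 0 a right shift. Python ints stand in for 64-bit arithmetic.
def q_multiply_shift(x, y, s):
    # Adding 2^(s-1) before the arithmetic right shift turns
    # truncation into round-half-up.
    return (x * y + (1 << (s - 1))) >> s

def q_multiply_shift_per_axis(xs, ys, shifts):
    # Per-channel variant: each channel gets its own multiplier and
    # shift, which is what qnn.requantize needs for vector scales.
    return [q_multiply_shift(x, y, s) for x, y, s in zip(xs, ys, shifts)]
```

For example, `q_multiply_shift(10, 3, 2)` computes round(30 / 4) = 8, and the per-axis variant simply applies an independent multiplier/shift pair per channel.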

Performance/accuracy measurement:

  • CPU (x86) target: accuracy and performance are unchanged. Other targets should be unaffected as well (anything else would be a bug).

  • Hexagon target: qnn.requantize is 7x-9x faster (Snapdragon 888, 4.4 ms -> 0.5 ms)

@tvm-bot (Collaborator) commented Oct 14, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@ibsidorenko ibsidorenko changed the title [Relay][Hexagon] Added per-channel FixedPointMultiply operation. [Relay][Hexagon] Add per-channel FixedPointMultiply operation Oct 14, 2022
@masahi (Member) commented Oct 14, 2022

@tvm-bot rerun

right shift. This is because we are rounding twice instead of only once, i.e.:

* original q_multiply_shift: round(x*y*2^-s)
* hexagon q_multiply_shift: round(round(x*y)*2^-s)
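A tiny numeric check (illustrative values, not from the PR) makes the difference between the two schemes concrete, assuming round-half-up rounding:

```python
import math

def rnd(v):
    # round-half-up, an assumed stand-in for the hardware rounding mode
    return math.floor(v + 0.5)

# Suppose the fractional product x*y is 2.5 and the right shift s is 1.
prod, s = 2.5, 1
single = rnd(prod * 2 ** -s)       # round(1.25) = 1
double = rnd(rnd(prod) * 2 ** -s)  # round(round(2.5) * 0.5) = round(1.5) = 2
```

Rounding once gives 1, rounding twice gives 2, so the double-rounded result can be off by one unit in the last place.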
Member

cc @kparzysz-quic @jverma-quic on this HVX implementation.

Contributor Author

I will add a test to demonstrate the issue with accuracy.

Contributor Author

I have fixed the issue with the accuracy drop.
For the case when both right and left shifts are needed at the same time, I use the "old" approach and lower this operation to the sequence left_shift/multiply/add/right_shift (64-bit arithmetic). Right now I have no idea how to implement this case through vector HVX instructions without an accuracy drop.
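The fallback path described here can be sketched roughly as follows (a hypothetical Python model, not the actual TIR lowering; Python integers stand in for 64-bit values and round-half-up is assumed):

```python
def fixed_point_multiply_fallback(x, y, left_shift, right_shift):
    # Hypothetical model of the 64-bit sequence mentioned above:
    # left_shift / multiply / add / right_shift.
    val = x << left_shift                 # pre-scale the input
    val = val * y                         # widening multiply
    val = val + (1 << (right_shift - 1))  # rounding term (half of 2^s)
    return val >> right_shift             # arithmetic right shift
```

Because the rounding term is added only once, just before the final right shift, this sequence avoids the double-rounding accuracy drop discussed above, at the cost of scalar 64-bit arithmetic instead of vector HVX instructions.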

@areusch added and then removed the needs-triage label (PRs or issues that need to be investigated by maintainers to find the right assignees) Oct 19, 2022
@ibsidorenko (Contributor Author)

@tvm-bot rerun

PrimExpr right_shift = call->args[3];
PrimExpr q = call->args[4];
PrimExpr is_lshift_required = call->args[5];
// Note, 7th argument is "is_rshift_required" flag, but we do need that here.
Member

You mean "don't need"?

Contributor Author

Oh... yes, exactly. My bad, this is a typo in the comment.

@masahi (Member) left a comment

@kparzysz-quic This PR improves performance on int8 resnet50 over PR #12911 while preserving accuracy.

Manual schedules (no tuning): 146 msec (before) -> 92 msec.
Tuned schedules (vrmpy auto tensorization): 105 msec -> 58 msec.

Very cool!

cc @tmoreau89 @csullivan

@masahi masahi merged commit 645a5ea into apache:main Oct 27, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 10, 2022
…#13080)

* [Relay][Hexagon] Add per-channel FixedPointMultiply operation

Main goal of this commit is to improve performance for Hexagon target and
preserve performance/accuracy for x86, GPU and etc. targets.

"qnn.requantize" operation is lowered into the sequence of multiply, add, shift
during QNN canonicalization pass if scale quantization parameter is the vector
of scalars. This commit adds new Relay per-channel/per-axis FixedPointMultiply
operation and is used in "qnn.requantize" operation lowering.

per-channel/per-axis FixedPointMultiply is implemented through
tir.q_multiply_shift_per_axis intrinsic. For Hexagon target it overrides default
implementation and generates HVX vmpye/vmpyo instruction (see
_q_multiply_shift_per_axis_hexagon). For all other targets it uses default
implementation (64 bits arithmetic).

Performance/accuracy measurement:

CPU(x86) target: accuracy and performance are the same. For other targets should
be the same (otherwise it is bug).

Hexagon target: speedup of qnn.requantize 7x-9x times (Snapdragon 888, 3.08 ms -> 0.39 ms)

* Address code review comments
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
…#13080)

* [Relay][Hexagon] Add per-channel FixedPointMultiply operation

* Address code review comments
@ibsidorenko ibsidorenko deleted the fpm-per-channel branch March 29, 2023 06:26