[Relay] Change when int8 operations are converted to int16 on Arm #12671
Conversation
mehrdadh
left a comment
@guberti great work and LGTM!
I'll wait for others to take a look as well
cc @ekalda @u99127 @leandron @Mousius could you have a look for architectural oversight? Likely keeping everything as int8 makes the most sense for all vN-M chipsets, but I'm curious whether you agree this is the right way to identify this subset. Some schedules might apply only to DSP (for instance, see #12448), but I think those schedules can be selected separately from disabling this conversion.
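For illustration, a minimal sketch of how a Cortex-M check could be expressed against the target's mcpu string; the helper name and detection heuristic are assumptions for this sketch, not necessarily what the PR implements:

from tvm.target import Target

def is_cortexm_arm():
    # Sketch only: treat any target whose mcpu names a Cortex-M core
    # (e.g. "cortex-m4", "cortex-m7") as M-profile.
    target = Target.current(allow_none=True)
    return target is not None and "cortex-m" in str(target.mcpu)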
if use_int8_on_arm or is_fast_int8_on_arm() or is_cortexm_arm():
    return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
I think this is actually the same as an issue a while back (#8717 (comment)); instead of special-casing with is_cortexm_arm you should be able to use something like:
Suggested change:
- if use_int8_on_arm or is_fast_int8_on_arm() or is_cortexm_arm():
-     return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
- return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
+ has_asimd = (is_aarch64_arm() or "+neon" in target.mattr)
+ if has_asimd and not is_fast_int8_on_arm():
+     return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
+ return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
This means it will only do the casting when the specific architecture features are available (has_asimd will become target.features.has_asimd after #12454, so this is fine temporarily).
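As a rough, self-contained sketch of the two forms being compared (assuming is_aarch64_arm and is_fast_int8_on_arm are importable from tvm.relay.qnn.op.legalizations, and using a hypothetical AArch64 target for illustration):

from tvm.target import Target
from tvm.relay.qnn.op.legalizations import is_aarch64_arm, is_fast_int8_on_arm

# Hypothetical AArch64 target with NEON/ASIMD but without dotprod.
with Target("llvm -mtriple=aarch64-linux-gnu -mattr=+neon"):
    target = Target.current(allow_none=False)

    # Interim form from the suggestion above.
    has_asimd = is_aarch64_arm() or "+neon" in target.mattr

    # Expected form once #12454 lands (target feature flags):
    # has_asimd = target.features.has_asimd

    # ASIMD without dotprod: the int16 legalization would apply here.
    print(has_asimd and not is_fast_int8_on_arm())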
That makes a lot of sense to me, and is definitely cleaner than special-casing Cortex-M. To clarify, though: are you suggesting removing the checks for is_depthwise and whether attrs["data_layout"] == "NHWC"?
Good spot. I think we should keep the check as (not is_depthwise) and has_asimd and attrs["data_layout"] == "NHWC" for invoking the helper with the casting.
Sounds good to me. Actually, I'm not sure I understand. Are you proposing we do the checks like this?
if (
    (not is_depthwise) and has_asimd and attrs["data_layout"] == "NHWC"
    and (not is_fast_int8_on_arm())
):
    return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)

I suspect I'm misunderstanding, as it seems attrs["data_layout"] == "NHWC" would make int8 a better choice (plus previously, having attrs["data_layout"] == "NHWC" made us more likely to use int8 operators). Would you mind clarifying @Mousius?
I think you're right; my boolean logic was off. As I remember it, the logic should cast for ASIMD and opt out if there's another option that might work better:
use_int8_on_arm = (not is_depthwise) and attrs["data_layout"] == "NHWC"
has_dotprod = is_fast_int8_on_arm()
other_options = use_int8_on_arm or has_dotprod
if has_asimd and not other_options:
    return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
Does that sound right to you? 😸
Yep, that sounds good, and should make us pass the existing unit tests. I think we should be good to merge once the CI is green.
@Mousius I've changed when
@guberti can you update the PR title/description? 😸
Force-pushed d814863 to 86452b2

Expand comment docstring
Adjust int16 conversion requirements
Adjust conversion requirements per code review

Force-pushed 6785b27 to 7a4c52d
Mousius
left a comment
Small patch, huge impact 😸
… Arm (apache#12671)" This reverts commit cd99ca6.
Currently, Relay QNN uses its `helper_no_fast_int8_hw_legalization` to convert most `int8` convolution and dense operations into `int16` ones on Arm. This currently occurs on Arm chips except for `v8.2a` chips with `dotprod` support.

However, this behavior means that `int8` operations are replaced with `int16` ones on Cortex-M chips. On these chips `int16` is substantially slower: while it saves a few sign extension operations, it doubles the number of memory loads we need to perform.

This PR changes when `helper_no_fast_int8_hw_legalization` is used on Arm, and instead makes not doing this replacement the standard. We will only do this replacement if we are on a chip with ASIMD support but without `v8.2a` and `dotprod`. This ensures that Cortex-M microcontrollers do not have `int8` operations turned into `int16` ones.

I have also verified that this does, in fact, improve performance for some common models. For example, MobileNet_v1_0.25 on the Cortex-M4 saw a 10% performance improvement compared to before this change. Accuracy does not seem to be affected.
cc @alanmacd @gromero @mehrdadh
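For reference, a minimal sketch of the decision the conversation above converges on; the helper names follow the discussion, while the function wrapper and argument list are illustrative rather than the exact merged diff:

from tvm.target import Target
from tvm.relay.qnn.op.legalizations import is_aarch64_arm, is_fast_int8_on_arm

def should_cast_int8_to_int16(attrs, is_depthwise):
    # Sketch of the condition discussed in the review thread. is_depthwise is
    # computed from the conv2d shapes in the real legalization code and is
    # passed in as a plain boolean here.
    target = Target.current(allow_none=False)
    use_int8_on_arm = (not is_depthwise) and attrs["data_layout"] == "NHWC"
    has_dotprod = is_fast_int8_on_arm()
    has_asimd = is_aarch64_arm() or "+neon" in target.mattr
    # Cast to int16 only when ASIMD is available and no better int8 option
    # (NHWC int8 schedules or dotprod) exists. Cortex-M targets have neither
    # ASIMD nor dotprod, so they keep int8.
    return has_asimd and not (use_int8_on_arm or has_dotprod)

When this returns True, the op would be legalized via helper_no_fast_int8_hw_legalization (the int16 path); otherwise helper_change_dtypes_to_be_same keeps the computation in int8.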