Member

@guberti guberti commented Sep 1, 2022

Currently, Relay QNN uses its helper_no_fast_int8_hw_legalization to convert most int8 convolution and dense operations into int16 ones on Arm. This happens on all Arm chips except v8.2a chips with dotprod support.

However, this behavior means that int8 operations are also replaced with int16 ones on Cortex-M chips. On these chips int16 is substantially slower: while it saves a few sign extension operations, it doubles the number of memory loads we need to perform.
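To make the memory cost concrete, here is a minimal sketch using only the Python standard library. The tensor size is an arbitrary assumption, not a figure from this PR; it only illustrates that widening int8 data to int16 doubles the bytes that have to be loaded.

```python
from array import array

# Hypothetical activation tensor of shape 1x32x32x16; the size is illustrative only.
num_elements = 1 * 32 * 32 * 16

x_int8 = array("b", [0] * num_elements)   # signed 8-bit elements
x_int16 = array("h", list(x_int8))        # same values widened to signed 16-bit

bytes_int8 = len(x_int8) * x_int8.itemsize
bytes_int16 = len(x_int16) * x_int16.itemsize

print(f"int8: {bytes_int8} bytes, int16: {bytes_int16} bytes")
```

The element count is unchanged, so every load of the widened tensor moves twice as much data for the same amount of arithmetic.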

This PR changes when helper_no_fast_int8_hw_legalization is used on Arm, and makes not doing this replacement the default. We will only do this replacement if we are on a chip with ASIMD support but without v8.2a and dotprod. This ensures that Cortex-M microcontrollers do not have int8 operations turned into int16 ones.
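The rule described in the paragraph above can be sketched in plain Python. The function name and the chip table are hypothetical and exist only for illustration; the real check lives in the Relay QNN legalization code and also considers layout and depthwise convolutions, as discussed in the review thread.

```python
def should_convert_to_int16(has_asimd: bool, has_dotprod: bool) -> bool:
    """Widen int8 to int16 only on ASIMD chips that lack dotprod (rule as described above)."""
    return has_asimd and not has_dotprod


# Hypothetical feature flags: (has_asimd, has_dotprod).
chips = {
    "Cortex-M4 (no ASIMD)": (False, False),
    "ASIMD without dotprod": (True, False),
    "v8.2a with dotprod": (True, True),
}

decisions = {name: should_convert_to_int16(*flags) for name, flags in chips.items()}

for name, widen in decisions.items():
    print(f"{name}: {'int16' if widen else 'int8'}")
```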

I have also verified that this does, in fact, improve performance for some common models. For example, MobileNet_v1_0.25 on the Cortex-M4 saw a 10% performance improvement, compared to before this change. Accuracy does not seem to be affected.

cc @alanmacd @gromero @mehrdadh

@guberti guberti changed the title Allow int8 operations for Cortex-M cores [microTVM] Allow int8 operations for Cortex-M cores Sep 1, 2022
Member

@mehrdadh mehrdadh left a comment

@guberti great work and LGTM!
I'll wait for others to take a look as well

cc @AndrewZhaoLuo @areusch

Contributor

areusch commented Sep 1, 2022

cc @ekalda @u99127 @leandron @Mousius could you guys have a look for architectural oversight? Keeping everything as int8 likely makes the most sense for all vN-m chipsets, but I'm curious whether you agree this is the right way to identify this subset. Some schedules might apply only to DSP (for instance, see #12448), but I think those schedules can be selected separately from disabling this conversion.

@github-actions github-actions bot requested a review from gromero September 2, 2022 09:51
Comment on lines 446 to 448
    if use_int8_on_arm or is_fast_int8_on_arm() or is_cortexm_arm():
        return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
    return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
Member

I think this is actually the same as an issue a while back (#8717 (comment)). Instead of special-casing with is_cortexm, you should be able to use something like:

Suggested change
    - if use_int8_on_arm or is_fast_int8_on_arm() or is_cortexm_arm():
    -     return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
    - return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
    + has_asimd = (is_aarch64_arm() or "+neon" in target.mattr)
    + if has_asimd and not is_fast_int8_on_arm():
    +     return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
    + return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)

This means it will only do the casting when the specific architecture features are available (has_asimd will become target.features.has_asimd after #12454, so this is fine as a temporary measure).
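As a rough illustration of the has_asimd check in the suggestion, the sketch below uses a hypothetical FakeTarget class in place of a real TVM target, and a simple substring test on the triple in place of is_aarch64_arm(). Both are assumptions made so the example runs on its own; the triples and attributes in the usage lines are also illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeTarget:
    """Hypothetical stand-in for a TVM target; not the real tvm.target.Target."""
    mtriple: str = ""
    mattr: List[str] = field(default_factory=list)


def has_asimd(target: FakeTarget) -> bool:
    # The suggestion treats every AArch64 target as having ASIMD, and
    # 32-bit Arm targets as having it only when "+neon" is in mattr.
    is_aarch64 = "aarch64" in target.mtriple  # assumed approximation of is_aarch64_arm()
    return is_aarch64 or "+neon" in target.mattr


aarch64 = FakeTarget(mtriple="aarch64-linux-gnu")
armv7_neon = FakeTarget(mtriple="armv7a-linux-gnueabihf", mattr=["+neon"])
cortex_m = FakeTarget(mtriple="thumbv7em-none-eabi")

print(has_asimd(aarch64), has_asimd(armv7_neon), has_asimd(cortex_m))
```

With this shape of check, a Cortex-M target falls through to the int8 path without needing a dedicated is_cortexm_arm() special case.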

Member Author

That makes a lot of sense to me, and is definitely cleaner than special casing Cortex-M. To clarify though - are you suggesting removing the checks for is_depthwise and whether attrs["data_layout"] == "NHWC"?

Member

Good spot, I think we should keep the check as (not is_depthwise) and has_asimd and attrs["data_layout"] == "NHWC" for invoking the helper with the casting?

Member Author

@guberti guberti Sep 2, 2022

Sounds good to me. Actually, I'm not sure I understand. Are you proposing we do the checks like this?

    if (
        (not is_depthwise) and has_asimd and attrs["data_layout"] == "NHWC"
        and (not is_fast_int8_on_arm())
    ):
        return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
    return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)

I suspect I'm misunderstanding, as it seems attrs["data_layout"] == "NHWC" would make int8 a better choice (plus previously, having attrs["data_layout"] == "NHWC" made us more likely to use int8 operators). Would you mind clarifying @Mousius?

Member

I think you're right, my boolean logic was off. As I remember it, the logic should cast for ASIMD and opt out if there's another option which might work better:

    use_int8_on_arm = (not is_depthwise) and attrs["data_layout"] == "NHWC"
    has_dotprod = is_fast_int8_on_arm()
    other_options = use_int8_on_arm or has_dotprod
    if has_asimd() and not other_options:
        return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
    return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)

Does that sound right to you? 😸
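For reference, the condition in the snippet above can be checked exhaustively with a small truth table. This is only a sketch: the function below takes plain booleans and a layout string instead of the real attrs and target queries, and its name is hypothetical.

```python
from itertools import product


def converts_to_int16(is_depthwise, data_layout, has_dotprod, has_asimd):
    """Same boolean logic as the snippet above, with plain inputs."""
    use_int8_on_arm = (not is_depthwise) and data_layout == "NHWC"
    other_options = use_int8_on_arm or has_dotprod
    return has_asimd and not other_options


# Enumerate every combination of the four inputs.
table = {
    combo: converts_to_int16(*combo)
    for combo in product([False, True], ["NHWC", "NCHW"], [False, True], [False, True])
}

for (is_depthwise, layout, has_dotprod, has_asimd), widen in table.items():
    print(
        f"depthwise={is_depthwise!s:5} layout={layout} dotprod={has_dotprod!s:5} "
        f"asimd={has_asimd!s:5} -> {'int16' if widen else 'int8'}"
    )
```

Every row without ASIMD stays int8, which is the property that keeps Cortex-M targets on the int8 path.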

Member Author

Yep, that sounds good, and should make us pass the existing unit tests. I think we should be good to merge once the CI is green.

Member Author

guberti commented Sep 2, 2022

@Mousius I've changed when helper_no_fast_int8_hw_legalization is used to match what you suggested - please take another look!

Member

Mousius commented Sep 2, 2022

@guberti can you update the PR title/description? 😸

@Mousius Mousius self-assigned this Sep 2, 2022
@guberti guberti changed the title [microTVM] Allow int8 operations for Cortex-M cores [Relay] Change when int8 operations are converted to int16 on Arm Sep 2, 2022
Member Author

guberti commented Sep 2, 2022

> @guberti can you update the PR title/description? 😸

@Mousius fixed!

@guberti guberti force-pushed the skip-micro-legalization branch 3 times, most recently from d814863 to 86452b2 Compare September 3, 2022 08:24
Expand comment docstring

Adjust int16 conversion requirements

Adjust conversion requirements per code review
@guberti guberti force-pushed the skip-micro-legalization branch from 6785b27 to 7a4c52d Compare September 7, 2022 13:14
Member

@Mousius Mousius left a comment

Small patch, huge impact 😸

@Mousius Mousius merged commit cd99ca6 into apache:main Sep 8, 2022
guberti added a commit to guberti/tvm that referenced this pull request Sep 27, 2022
guberti added a commit to guberti/tvm that referenced this pull request Sep 27, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
…ache#12671)
