[Relay] Change when int8 operations are converted to int16 on Arm #12671
Conversation
mehrdadh
left a comment
@guberti great work and LGTM!
I'll wait for others to take a look as well
cc @ekalda @u99127 @leandron @Mousius could you have a look for architectural oversight? Likely keeping everything as int8 makes the most sense for all vN-M chipsets, but I'm curious whether you agree this is the right way to identify this subset. Some schedules might apply only to DSP (for instance, see #12448), but I think those schedules can be selected separately from disabling this conversion.
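For illustration, a minimal sketch of how a Cortex-M check could be expressed against the target's mcpu string; the helper name and detection heuristic are assumptions for this sketch, not necessarily what the PR implements:

from tvm.target import Target

def is_cortexm_arm():
    # Sketch only: treat any target whose mcpu names a Cortex-M core
    # (e.g. "cortex-m4", "cortex-m7") as M-profile.
    target = Target.current(allow_none=True)
    return target is not None and "cortex-m" in str(target.mcpu)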
if use_int8_on_arm or is_fast_int8_on_arm() or is_cortexm_arm():
    return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
I think this is actually the same as an issue a while back (#8717 (comment)); instead of special-casing with is_cortexm_arm you should be able to use something like:
Suggested change:
- if use_int8_on_arm or is_fast_int8_on_arm() or is_cortexm_arm():
-     return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
- return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
+ has_asimd = (is_aarch64_arm() or "+neon" in target.mattr)
+ if has_asimd and not is_fast_int8_on_arm():
+     return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
+ return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
This means it will only do the casting when the specific architecture features are available (has_asimd will become target.features.has_asimd after #12454, so this is fine temporarily).
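As a rough, self-contained sketch of the two forms being compared (assuming is_aarch64_arm and is_fast_int8_on_arm are importable from tvm.relay.qnn.op.legalizations, and using a hypothetical AArch64 target for illustration):

from tvm.target import Target
from tvm.relay.qnn.op.legalizations import is_aarch64_arm, is_fast_int8_on_arm

# Hypothetical AArch64 target with NEON/ASIMD but without dotprod.
with Target("llvm -mtriple=aarch64-linux-gnu -mattr=+neon"):
    target = Target.current(allow_none=False)

    # Interim form from the suggestion above.
    has_asimd = is_aarch64_arm() or "+neon" in target.mattr

    # Expected form once #12454 lands (target feature flags):
    # has_asimd = target.features.has_asimd

    # ASIMD without dotprod: the int16 legalization would apply here.
    print(has_asimd and not is_fast_int8_on_arm())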
That makes a lot of sense to me, and is definitely cleaner than special-casing Cortex-M. To clarify, though: are you suggesting removing the checks for is_depthwise and whether attrs["data_layout"] == "NHWC"?
Good spot. I think we should keep the check as (not is_depthwise) and has_asimd and attrs["data_layout"] == "NHWC" for invoking the helper with the casting.
Sounds good to me. Actually, I'm not sure I understand. Are you proposing we do the checks like this?
if (
    (not is_depthwise) and has_asimd and attrs["data_layout"] == "NHWC"
    and (not is_fast_int8_on_arm())
):
    return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)

I suspect I'm misunderstanding, as it seems attrs["data_layout"] == "NHWC" would make int8 a better choice (plus previously, having attrs["data_layout"] == "NHWC" made us more likely to use int8 operators). Would you mind clarifying @Mousius?
I think you're right; my boolean logic was off. As I remember it, the logic should cast for ASIMD and opt out if there's another option that might work better:
use_int8_on_arm = (not is_depthwise) and attrs["data_layout"] == "NHWC"
has_dotprod = is_fast_int8_on_arm()
other_options = use_int8_on_arm or has_dotprod
if has_asimd and not other_options:
    return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.nn.conv2d)
return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
Does that sound right to you? 😸
Yep, that sounds good, and should make us pass the existing unit tests. I think we should be good to merge once the CI is green.
@Mousius I've changed when
@guberti can you update the PR title/description? 😸
Force-pushed d814863 to 86452b2

Expand comment docstring
Adjust int16 conversion requirements
Adjust conversion requirements per code review

Force-pushed 6785b27 to 7a4c52d
Mousius
left a comment
Small patch, huge impact 😸
… Arm (apache#12671)" This reverts commit cd99ca6.
Currently, Relay QNN uses its `helper_no_fast_int8_hw_legalization` to convert most `int8` convolution and dense operations into `int16` ones on Arm. This currently occurs on Arm chips except for `v8.2a` chips with `dotprod` support.

However, this behavior means that `int8` operations are replaced with `int16` ones on Cortex-M chips. On these chips `int16` is substantially slower: while it saves a few sign extension operations, it doubles the number of memory loads we need to perform.

This PR changes when `helper_no_fast_int8_hw_legalization` is used on Arm, and instead makes not doing this replacement the standard. We will only do this replacement if we are on a chip with ASIMD support but without `v8.2a` and `dotprod`. This ensures that Cortex-M microcontrollers do not have `int8` operations turned into `int16` ones.

I have also verified that this does, in fact, improve performance for some common models. For example, MobileNet_v1_0.25 on the Cortex-M4 saw a 10% performance improvement compared to before this change. Accuracy does not seem to be affected.
cc @alanmacd @gromero @mehrdadh
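For reference, a minimal sketch of the decision the conversation above converges on; the helper names follow the discussion, while the function wrapper and argument list are illustrative rather than the exact merged diff:

from tvm.target import Target
from tvm.relay.qnn.op.legalizations import is_aarch64_arm, is_fast_int8_on_arm

def should_cast_int8_to_int16(attrs, is_depthwise):
    # Sketch of the condition discussed in the review thread. is_depthwise is
    # computed from the conv2d shapes in the real legalization code and is
    # passed in as a plain boolean here.
    target = Target.current(allow_none=False)
    use_int8_on_arm = (not is_depthwise) and attrs["data_layout"] == "NHWC"
    has_dotprod = is_fast_int8_on_arm()
    has_asimd = is_aarch64_arm() or "+neon" in target.mattr
    # Cast to int16 only when ASIMD is available and no better int8 option
    # (NHWC int8 schedules or dotprod) exists. Cortex-M targets have neither
    # ASIMD nor dotprod, so they keep int8.
    return has_asimd and not (use_int8_on_arm or has_dotprod)

When this returns True, the op would be legalized via helper_no_fast_int8_hw_legalization (the int16 path); otherwise helper_change_dtypes_to_be_same keeps the computation in int8.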