[QNN] Add operator #3736
Conversation
This needs a rebase and squash, and some description to go with the pull request.
u99127 left a comment
Is this the right place to finesse the (lhs_scale == rhs_scale and lhs_z_p == rhs_z_p) case, or is this caught elsewhere?
I suspect one could end up with one fewer requantize step, as this is just
{code}
output = relay.add (lhs, rhs);
return requantize (output , ....)
{code}
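For illustration, here is a minimal sketch of that shape of lowering in relay's Python API. The variable names, qnn parameter values, and the requantize call shape are assumptions for the sketch, not code from this PR:
{code}
# Hedged sketch: when both inputs share scale and zero point, add first and
# requantize only once at the end. All values below are made up.
from tvm import relay

lhs = relay.var("lhs", shape=(1, 4), dtype="uint8")
rhs = relay.var("rhs", shape=(1, 4), dtype="uint8")
lhs_scale, lhs_zero_point = 0.25, 100        # assumed shared input qnn params
output_scale, output_zero_point = 0.5, 127   # assumed output qnn params

summed = relay.add(relay.cast(lhs, "int32"), relay.cast(rhs, "int32"))
# The sum carries the shared zero point twice; drop one copy so the tensor is
# back on the input scale/zero point before the single requantize.
summed = relay.subtract(summed, relay.const(lhs_zero_point, dtype="int32"))
out = relay.qnn.op.requantize(summed,
                              input_scale=lhs_scale,
                              input_zero_point=lhs_zero_point,
                              output_scale=output_scale,
                              output_zero_point=output_zero_point,
                              out_dtype="uint8")
{code}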
python/tvm/relay/qnn/op/qnn.py (outdated)

        output_zero_point,
        rounding="TONEAREST",
-       out_dtype="int8"):
+       out_dtype='int8'):
Is this really an unrelated change?
Yes, this was an unintended change. I will be consistent in using single or double quotes.
Aah, very nice observation. This should be caught here inside the ...
@u99127 While working on your comment, I realized that I am not certain the lowering is correct. Give me a couple of days to dig deeper into the TFLite codebase to see what they do. I am not sure if I am handling zero points correctly.
No worries. I had another review comment that I missed publishing, about testing more cases than just zero zero_points, which is what the tests currently seem to do.
zhenhuaw-me left a comment
Several concerns, though not necessarily blocking the merge.
python/tvm/relay/qnn/op/qnn.py (outdated)

    if lhs_scale == rhs_scale and lhs_zero_point == rhs_zero_point:
        out = relay.add(lhs, rhs)
        out = relay.subtract(out, relay.const(lhs_zero_point, dtype=in_dtype))
similar to add
Subtracting one zero point and letting requantize handle the other one is a bit tricky? I know it is meant to avoid a requantize in some cases though...
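To make this concrete, a small numeric check (all values made up) of why adding, subtracting a single zero point, and then requantizing stays consistent when both inputs share scale and zero point:
{code}
# Real value of a quantized element: r = s * (q - z). Made-up values below.
s, z = 0.5, 10                        # shared input scale and zero point
q_lhs, q_rhs = 14, 18                 # quantized inputs
r_sum = s * (q_lhs - z) + s * (q_rhs - z)   # true real-valued sum = 6.0
q_sum = q_lhs + q_rhs - z                   # add, then drop ONE zero point
assert s * (q_sum - z) == r_sum             # q_sum is still on (s, z)
# A single requantize from (s, z) to the output qnn params finishes the job.
{code}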
python/tvm/relay/qnn/op/qnn.py (outdated)

        out_dtype=in_dtype)

    out = relay.add(requantized_lhs, requantized_rhs)
    out = relay.subtract(out, relay.const(output_zero_point, dtype=in_dtype))
Output casting concern.
@jackwish Addressed your comments by going to int32 for the addition and then casting back when necessary. Also added test cases.
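For context, a rough sketch of that int32 flow. The parameter values, variable names, and call shapes are assumptions for illustration, not this PR's actual code:
{code}
# Hedged sketch: requantize each input to the output qnn params, add in int32,
# drop the doubled output zero point, clip and cast back. Values are made up.
from tvm import relay

lhs = relay.var("lhs", shape=(1, 4), dtype="uint8")
rhs = relay.var("rhs", shape=(1, 4), dtype="uint8")
lhs_scale, lhs_zero_point = 0.25, 100     # assumed lhs qnn params
rhs_scale, rhs_zero_point = 0.5, 120      # assumed rhs qnn params
output_scale, output_zero_point = 0.5, 127

def to_output_params(x, scale, zero_point):
    # Requantize one input from its own qnn params to the output qnn params.
    return relay.qnn.op.requantize(x,
                                   input_scale=scale,
                                   input_zero_point=zero_point,
                                   output_scale=output_scale,
                                   output_zero_point=output_zero_point,
                                   out_dtype="uint8")

lhs32 = relay.cast(to_output_params(lhs, lhs_scale, lhs_zero_point), "int32")
rhs32 = relay.cast(to_output_params(rhs, rhs_scale, rhs_zero_point), "int32")
out = relay.add(lhs32, rhs32)
# Each operand carries the output zero point, so subtract one copy, then clip
# to the uint8 range and cast back to the output dtype.
out = relay.subtract(out, relay.const(output_zero_point, dtype="int32"))
out = relay.clip(out, a_min=0, a_max=255)
out = relay.cast(out, dtype="uint8")
{code}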
zhenhuaw-me left a comment
LGTM.
As I am not a community reviewer, you may need someone else to approve; my comments are only comments. :)
python/tvm/relay/qnn/op/qnn.py (outdated)

    # output qnn params. The add op is done in int32 precision.

    if lhs_scale == rhs_scale and lhs_zero_point == rhs_zero_point:
        lhs = relay.cast(lhs, dtype='int32')
add/sub in int16 should be enough, but not a big deal :)
Tried that :) Currently it fails because the requantize input can only be (uint8, int8, int32). I think for now it should be ok. If we see more demand for int16, we can add support across all the QNN ops.
Yes, that is far from a blocking issue :) Thank you.
    y_datas = [np.array((204, 178, 165, 140)).reshape((1, 4)),
               np.array((204, 178, 191, 25)).reshape((1, 4)),
               np.array((204, 178, 25, 191)).reshape((1, 4))]
    golden_outputs = [np.array((217, 204, 203, 191)).reshape((1, 4)),
A bit curious: do these data come from TFLite computed results, or were they computed manually?
TFLite adds quantize and dequantize. I am trapping the numbers after they have been quantized and before they get dequantized. So these are from TFLite, but from its internals rather than the actual GTest that you see.
Yes, thank you for the explanation.
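For readers who want to cross-check such golden vectors against a float reference, a rough recipe follows; the scale and zero point here are made-up placeholders, so it will not reproduce the exact numbers above:
{code}
# Hedged helper: compare a quantized add against the float reference.
# scale/zero_point are placeholders, not the test's actual qnn params.
import numpy as np

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float64) - zero_point)

def quantize(r, scale, zero_point, qmin=0, qmax=255):
    q = np.round(r / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

scale, zero_point = 0.5, 127                         # assumed shared qnn params
x = np.array((10, 20, 30, 40), dtype=np.uint8)       # hypothetical lhs data
y = np.array((204, 178, 165, 140), dtype=np.uint8)   # one of the rhs vectors
expected = quantize(dequantize(x, scale, zero_point) +
                    dequantize(y, scale, zero_point), scale, zero_point)
{code}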
It seems that the normalisation of the quantisation parameters would be the same for a number of different operators. If that is the case, does it make sense to factor this out and maybe put this in a pass? That would avoid having to implement this for each operator.
@Leo-arm this is a very good point. For background, there are two parallel efforts in the TVM community right now ...
We can share the HW schedules between these two options. What you suggested is almost what happens today in the Automatic quantization project. It is also somewhat easier there because Automatic quantization only works with symmetric quantization. For doing this with pre-quantized models, it is somewhat tricky because it happens on an op-by-op basis (...
Assume below that ip0, ip1 and op are all 8-bit tensors with identical zero points, and similarly for the other cases. {code} now gets lowered into: {code} Am I right in assuming that the tflite parser directly lowers to this level? Is there any reason why the alternate option of having 8-bit tensor operations in relay has been ignored?
Do not submit yet. Will move the codebase to C++ to avoid calling InferType.
Moved to C++. Removed the WIP tag.
zhenhuaw-me left a comment
Sorry for the delayed re-check; I have been working in some other directions. It seems a rebase is needed as #3819 has been merged :)
    };

    /*! \brief Attributes used in QNN concatenate operators */
    struct QnnConcatenateAttrs : public tvm::AttrsNode<QnnConcatenateAttrs> {
It seems to me that this part of the code has already been merged in #3819.
@jackwish Can you please review? I have rebased to master.
zhiics left a comment
zhenhuaw-me left a comment
LGTM generally, minor comments which won't block merging :)
src/relay/qnn/op/add.cc (outdated)

    // FIXME (anijain2305) - The lowering can be further optimized. Instead of inserting requantize in
    // the start, we can insert requantize at the end if and only if all the input tensors have same
    // qnn params. This can be done in future.
I guess that the same scale alone is enough to move the requantize after the ADD; the zero points can be safely subtracted :)
Let me change the comment.
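A small numeric check of that observation (values made up): with a shared scale but different zero points, both zero points can be folded into the sum before a single requantize.
{code}
s = 0.5                               # shared scale, made-up value
z_lhs, z_rhs = 10, 20                 # different zero points
q_lhs, q_rhs = 14, 26                 # quantized inputs
r_sum = s * (q_lhs - z_lhs) + s * (q_rhs - z_rhs)   # true real-valued sum = 5.0
q_sum = q_lhs + q_rhs - z_lhs - z_rhs               # fold both zero points in
assert s * q_sum == r_sum   # q_sum sits on scale s with zero point 0, so a
                            # single requantize to the output params suffices
{code}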
src/relay/qnn/op/add.cc (outdated)

    }

    // Upcast to maintain precision.
    requantized_lhs = Cast(requantized_lhs, Int(32));
It seems that the result of subtracting two int8 values can be held in int16? But not a big deal :)
Yes; currently Requantize does not support int16, so we can skip it for now. If we see a need for int16 later on, we can start supporting it across all ops.
zhiics left a comment
LGTM
Thanks @anijain2305 @jackwish @u99127, this is now merged.
Adding the QNN Add operator.
The inputs to the QNN Add operator can have different scales and zero points. This PR adds a QNN Add operator that first requantizes the inputs to the output scale and zero point, and then calls relay.add. This approach is also used by TF.
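As a rough usage sketch (the exact parameter names and types may differ from the merged API, so treat this purely as an illustration of the idea):
{code}
# Hypothetical usage of the new op; parameter names/types are assumptions.
from tvm import relay

lhs = relay.var("lhs", shape=(1, 4), dtype="uint8")
rhs = relay.var("rhs", shape=(1, 4), dtype="uint8")
out = relay.qnn.op.add(lhs, rhs,
                       lhs_scale=0.25, lhs_zero_point=100,
                       rhs_scale=0.5, rhs_zero_point=120,
                       output_scale=0.5, output_zero_point=127)
{code}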