[FQ2I] Support converting dense -> add to qnn.dense -> add -> requantize
#13578
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
Icemist
left a comment
LGTM, a little code remark.
python/tvm/relay/frontend/onnx.py (Outdated)
      # Y = alpha * A * B + beta * C
      alpha = float(attr.get("alpha", 1.0))
-     beta = float(attr.get("beta", 1.0))
+     beta = float(attr.get("beta"))
I would keep the original line with `.get('beta', 1.0)`, since you cannot call `float()` on `None`, which `attr.get` can return.
Then below, on line 1409, you could just change `if beta is None` to `if 'beta' not in attr.keys()`, or something similar.
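For concreteness, a minimal sketch of what this suggestion amounts to. The function name `scale_bias` and the `attr`/`c` variables are illustrative assumptions, not code quoted from the Gemm converter in `python/tvm/relay/frontend/onnx.py`:

```python
# A minimal sketch of the suggestion above, not the actual converter code.
# Assumed context: `attr` is the ONNX Gemm node's attribute dict and `c` is
# the Relay expression for the optional C (bias) input.
from tvm import relay


def scale_bias(attr, c):
    # Keep the 1.0 default so float() is never called on None.
    beta = float(attr.get("beta", 1.0))
    # Skip the multiply entirely when beta was not given (or is trivially 1.0),
    # so no multiply-by-1 node is ever created.
    if "beta" not in attr or beta == 1.0:
        return c
    return relay.multiply(relay.const(beta, "float32"), c)
```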
Though the change on line 1409 might not be needed, since if `beta == 1` the multiplication can be removed by constant folding.
Constant folding doesn't work when beta is multiplying an output of qnn ops, since we cannot fold over them. The model in #13545 has `multiply(1f, dequantize(bias))` after `dense`, which was also causing some issues.
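For illustration, a sketch of the kind of expression being described, with made-up shapes and qparams (not taken from the #13545 model):

```python
# Illustrative only: a multiply by 1.0 on top of qnn.dequantize(bias).
# Even though bias, scale and zero point are all constants, FoldConstant
# does not fold through the qnn op here, so the multiply-by-1 survives.
import numpy as np
from tvm import relay

bias = relay.const(np.zeros((64,), dtype="int32"))
deq = relay.qnn.op.dequantize(bias, relay.const(0.001), relay.const(0))
expr = relay.multiply(relay.const(1.0), deq)
```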
Moved `float(beta)` to the else block of `if beta is None`.
Actually, the whole purpose of this change was to avoid multiplying by 1.0, since `multiply(1f, dequantize(bias))` would be converted to `qnn.mul(quantize(1), bias)` by FQ2I. So I restored the original code. cc @Icemist
An alternative would be to add algebraic simplification to the SimplifyExpr pass.
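For context, a sketch of how that alternative would look from the user side. `SimplifyExpr` and `FakeQuantizationToInteger` are existing passes; whether the `x * 1.0` case gets removed depends on the rewrite being added to `SimplifyExpr`, which is the suggestion above, not claimed current behavior. The helper name `run_fq2i` is made up for illustration:

```python
# Sketch: run SimplifyExpr before FQ2I so a (hypothetical) x * 1.0 rewrite
# could clean up the graph before quantized rewriting.
import tvm
from tvm import relay


def run_fq2i(mod):
    seq = tvm.transform.Sequential(
        [
            relay.transform.InferType(),
            relay.transform.SimplifyExpr(),
            relay.transform.FakeQuantizationToInteger(),
        ]
    )
    with tvm.transform.PassContext(opt_level=3):
        return seq(mod)
```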
Force-pushed from d411b86 to da99fa5
LGTM! Thanks, though I will wait on @Icemist.
Force-pushed from bd6e2d2 to 9c8d26e
… `requantize` (apache#13578)
* wip
* hack to convert size-1 scale and zp tensors to scalar
* fix to binary op fast path
* check output zp
* add assert
* add comment
* lint
* clean up beta handling
* use regular binary op only for 32 bit add (bias addition)
* do float(beta) when we know that beta is not None
* restore original beta handling code to avoid mul by 1
* add comment on overflow
Closes #13545
The pattern of `dense -> add`, where the `add` is really bias addition, can appear often as the result of converting the ONNX `Gemm` op:

tvm/python/tvm/relay/frontend/onnx.py, line 1409 in edfeba5

Currently, FQ2I tries to convert this `add` to `qnn.add`. But if this `add` is being used for bias addition, the `out_t.scale` and `out_t.zero_point` variables in `fake_quantization_to_integer.py`, which are used to initialize the output scale and zero point of the QNN binary operators, can be tensors rather than scalars. QNN binary operators do not support such output qparams, which led to the error reported in #13545.

For this reason, apparently we haven't supported converting `dense -> add` to `qnn.dense -> add -> requantize` in FQ2I when the `add` is a bias add. The pattern `dense -> nn.bias_add` can be converted to `qnn.dense -> nn.bias_add -> requantize`, but we never use `nn.bias_add` after `dense`.

So I added a code path in the FQ2I QNN binary op converter to identify such patterns and use regular binary ops rather than QNN ones.
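As a rough illustration of the pattern and the intended conversion, here is a minimal sketch; all shapes, scales and zero points are made up, not taken from the #13545 model:

```python
# Sketch of the dense -> add (bias addition) pattern inside a fake-quantized
# region, and of running FQ2I on it. With this PR, the add should be kept as
# a regular add after qnn.dense, followed by a requantize, instead of being
# turned into qnn.add.
import numpy as np
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 128), dtype="int8")
w = relay.const(np.random.randint(-127, 127, size=(64, 128), dtype="int8"))
b = relay.const(np.random.randint(-1000, 1000, size=(64,), dtype="int32"))

# Dequantize the quantized inputs, as a fake-quantized (QDQ-style) graph would.
dq_x = relay.qnn.op.dequantize(x, relay.const(0.05), relay.const(0))
dq_w = relay.qnn.op.dequantize(w, relay.const(0.02), relay.const(0))
dq_b = relay.qnn.op.dequantize(b, relay.const(0.001), relay.const(0))

dense = relay.nn.dense(dq_x, dq_w)
biased = relay.add(dense, dq_b)  # bias addition expressed as add, not nn.bias_add
out = relay.qnn.op.quantize(biased, relay.const(0.1), relay.const(0), out_dtype="int8")

mod = tvm.IRModule.from_expr(out)
mod = relay.transform.InferType()(mod)
mod = relay.transform.FakeQuantizationToInteger()(mod)
print(mod)
```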
cc @AndrewZhaoLuo @Icemist @elvin-n