
Conversation

@huochaitiantang
Contributor

When importing pre-quantized ONNX models, nodes (such as Conv, Gemm, Add, Mul, and Sub) that sit between QuantizeLinear and DequantizeLinear should be quantized.

This PR is summarized as follows (a sketch of the target pattern follows the list):
1. Collect quantization params for the nodes to be quantized, i.e. those located between QuantizeLinear and DequantizeLinear.
2. Generate the corresponding QNN ops (such as qnn.conv2d, qnn.dense, qnn.add, qnn.mul, qnn.subtract) for the nodes that can be quantized.
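
A minimal sketch (not code from this PR; shapes and values are placeholders) of the pattern the importer targets, built with onnx.helper:

```python
from onnx import TensorProto, helper

# QuantizeLinear -> Conv -> DequantizeLinear, with int8 tensors in between.
q = helper.make_node("QuantizeLinear", ["x", "x_scale", "x_zp"], ["x_q"])
conv = helper.make_node("Conv", ["x_q", "w_q"], ["y_q"], kernel_shape=[3, 3])
dq = helper.make_node("DequantizeLinear", ["y_q", "y_scale", "y_zp"], ["y"])

graph = helper.make_graph(
    [q, conv, dq],
    "q_conv_dq",
    inputs=[helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 3, 32, 32])],
    outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 8, 30, 30])],
    initializer=[
        helper.make_tensor("x_scale", TensorProto.FLOAT, [], [0.02]),
        helper.make_tensor("x_zp", TensorProto.INT8, [], [0]),
        helper.make_tensor("w_q", TensorProto.INT8, [8, 3, 3, 3], [0] * (8 * 3 * 3 * 3)),
        helper.make_tensor("y_scale", TensorProto.FLOAT, [], [0.05]),
        helper.make_tensor("y_zp", TensorProto.INT8, [], [0]),
    ],
)
model = helper.make_model(graph)
# The importer collects (scale, zero_point) from the surrounding Q/DQ nodes
# and emits qnn.conv2d for the int8 Conv between them.
```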

jwfromm requested review from jwfromm and mbrookhart on April 28, 2021 at 15:57
@mbrookhart
Contributor

This is awesome, thanks for the submission!

It looks like the ONNX and ORT versions we have in CI might be too out of date to support this? Could you point me at the documentation for this behavior? I wasn't aware ONNX did this; I thought they primarily used ops like ConvInteger. I'm hoping to add support for importing those ops in the coming weeks.

Is there a more recent version of ORT that supports this? If so, could you add the test and comment it out with a TODO for re-enabling it once we update to some minimum ORT version? I'm hoping to update the ORT and ONNX versions in CI in the next month or two.

Thanks!

@huochaitiantang
Contributor Author

@mbrookhart Thanks for your comment!
I have checked the supported input types of operators in ORT:
https://github.com/microsoft/onnxruntime/blob/master/docs/OperatorKernels.md

Conv: tensor(float)
Gemm: tensor(double), tensor(float)
Add, Mul, Sub: tensor(double), tensor(float), tensor(int32), tensor(int64)

It shows that these operators cannot accept int8 tensors in ORT, so when they appear between QuantizeLinear and DequantizeLinear they cannot actually be run by ORT, even though they should be quantized.

My question is: even if ORT cannot run these models successfully, should TVM still support importing them and generate the correct QNN ops? The answer determines whether this PR is necessary.

In addition, ORT quantization operators like ConvInteger and QLinearConv do run successfully with uint8 tensors as input, so it may be natural to add import support for those operators later.
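
To make the limitation above concrete, a minimal check (hedged sketch; `model` stands for the hypothetical Q->Conv->DQ ModelProto sketched earlier in the thread):

```python
import onnxruntime as ort

try:
    ort.InferenceSession(model.SerializeToString())
except Exception as e:
    # Expected to fail: the kernel table linked above lists no int8 Conv kernel.
    print("ORT rejected the int8 Conv graph:", e)
```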

@mbrookhart
Contributor

Hi @huochaitiantang ,

That all makes sense; I'm really just wondering if we've ever seen an example of this in the wild, i.e. q->conv->dq instead of q->convinteger->dq. Do you have an example model defined this way somewhere? I mostly just want to make sure we don't implement a feature that's out of spec.

Thanks,
Matthew

@jwfromm
Contributor

jwfromm commented Apr 29, 2021

Yeah, I just wanted to echo @mbrookhart: it would be great to include a test that operates on a real quantized model. Ideally you could start with a pre-quantized TF or PyTorch model, export it to ONNX, then import it using the changes in this PR.
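
For reference, a hedged sketch of the flow being suggested (the toy module, file name, and input name are placeholders; as the next comment notes, the PyTorch export step currently fails):

```python
import onnx
import torch
import tvm.relay as relay

# Quantize a toy PyTorch model, export it to ONNX, then import it with TVM.
model_fp32 = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU())
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, {torch.nn.Linear}, dtype=torch.qint8
)

torch.onnx.export(
    model_int8, torch.randn(1, 16), "prequantized.onnx",
    opset_version=13, input_names=["input"],
)

onnx_model = onnx.load("prequantized.onnx")
mod, params = relay.frontend.from_onnx(onnx_model, shape={"input": (1, 16)})
```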

@huochaitiantang
Contributor Author

Hi @mbrookhart @jwfromm, thanks for your advice.

We have tried to export a real pre-quantized ONNX model from the popular frameworks, but it turns out to be difficult:

  • PyTorch fails to export a quantized model to ONNX because of the error AttributeError: 'torch.dtype' object has no attribute 'detach'. Please see https://discuss.pytorch.org/t/onnx-export-failed-int8-model/59754/17
  • The pre-quantized ONNX model exported from TFLite is odd: it contains only fake quantization, not a truly quantized graph.
  • The pre-quantized ONNX model produced by onnxruntime looks good; it contains QuantizeLinear -> QLinearConv -> DequantizeLinear, so we have submitted a new PR to support QLinearConv (see the sketch below): [ONNX] QLinearConv Support #8007.
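
For contrast with the Q->Conv->DQ pattern discussed above, a hedged sketch of the fused node that onnxruntime's quantizer emits (input order per the ONNX QLinearConv spec; names and kernel shape are placeholders):

```python
from onnx import helper

qconv = helper.make_node(
    "QLinearConv",
    ["x_q", "x_scale", "x_zp", "w_q", "w_scale", "w_zp", "y_scale", "y_zp"],
    ["y_q"],
    kernel_shape=[3, 3],
)
# Unlike plain Conv, the scales and zero points are inputs of the node itself,
# and ORT provides uint8 kernels for it, per the discussion above.
```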

Since the pattern QuantizeLinear -> Conv -> DequantizeLinear does not seem to appear in pre-quantized ONNX models exported by PyTorch, TFLite, or onnxruntime, we can close this PR.

@mbrookhart
Contributor

Sounds good. We found the "fake quantization" in the TFLite exports a couple of weeks ago, and I'm currently working on a pass to convert that into QNN after import. I'll close this, but thanks for the QLinearConv PR!

@mbrookhart mbrookhart closed this May 10, 2021