Summary
This issue tracks ONNX importer coverage of standard and non-standard quantized ops in TVM, and can be used to coordinate distributed efforts across organizations to improve that coverage.
Status
As of Aug 24th 2021, we'd like to account for both standard ONNX quantized ops and non-standard quantized contrib ops introduced by ONNXRT, as shown in the tables below.
Shortlist of ops that are emitted by ONNXRT static quantization (higher priority), based on https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/quantization/registry.py:
| Operator | Standard | Opset | Status | Owner | PR |
|---|---|---|---|---|---|
| ConvInteger | Y | 10 | Supported | @jwfromm | #8456 |
| DequantizeLinear | Y | 13, 10 | Supported | @mbrookhart | #7802 |
| MatMulInteger | Y | 10 | WIP | @WenheLI | |
| QLinearConv | Y | 10 | Supported | @huochaitiantang | #8007 |
| QLinearMatMul | Y | 10 | Supported | @cconvey | #8952 |
| QuantizeLinear | Y | 13, 10 | Supported | @mbrookhart | #7802 |
| com.microsoft.QAttention | N | n/a | TODO | | |
| com.microsoft.QLinearAdd | N | n/a | Supported | @mbrookhart | #8305 |
| com.microsoft.QLinearAveragePool | N | n/a | Supported | @quic-sanirudh | #9017 |
| com.microsoft.QLinearConcat | N | n/a | Supported | @anwang2009 | #8907 |
| com.microsoft.QLinearGlobalAveragePool | N | n/a | Supported | @quic-sanirudh | #9017 |
| com.microsoft.QLinearLeakyRelu | N | n/a | Supported | @gayatripk1 | #9063 |
| com.microsoft.QLinearMul | N | n/a | Supported | @anwang2009 | #8773 |
| com.microsoft.QLinearSigmoid | N | n/a | Supported | @arangasa | #9028 |
Ops for supporting ORT dynamic quantization:
| Operator | Standard | Opset | Status | Owner | PR |
|---|---|---|---|---|---|
| DynamicQuantizeLinear | Y | 11 | Supported | @mbrookhart | #7802 |
| com.microsoft.DynamicQuantizeLSTM | N | n/a | TODO | | |
| com.microsoft.DynamicQuantizeMatMul | N | n/a | WIP | @quic-sanirudh | |
Other integer ops that might be relevant:
| Operator | Standard | Opset | Status | Owner | PR |
|---|---|---|---|---|---|
| com.microsoft.MatMulInteger16 | N | n/a | Supported | @abhikran-quic | #9186 |
| com.microsoft.MatMulIntegerToFloat | N | n/a | WIP | @onkar-sima-ai | |
| com.microsoft.MulInteger | N | n/a | WIP | @tasmia-rahman | |
| com.microsoft.ReduceSumInteger | N | n/a | WIP | @FranckQC | |
Other ops:
| Operator | Standard | Opset | Status | Owner | PR |
|---|---|---|---|---|---|
| com.microsoft.QGemm | N | n/a | WIP | @rasagna-quic | |
| com.microsoft.QLinearReduceMean | N | n/a | WIP | @avquicinc | |
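
For anyone picking up an op from the tables above, it may help to have the underlying arithmetic written down. The following is a minimal pure-Python sketch of per-tensor `QuantizeLinear` / `DequantizeLinear` semantics, plus a `QLinearAdd`-style reference built from them (dequantize, add in float, requantize). The function names are hypothetical and are not TVM or ONNXRT APIs; a uint8 output range is assumed, and this is only one way to write a reference for checking importer output, not the approach taken in any of the PRs listed.

```python
def quantize_linear(xs, scale, zero_point, qmin=0, qmax=255):
    """y = saturate(round(x / scale) + zero_point); uint8 range assumed by default."""
    # Python's built-in round() rounds halves to even, which is the
    # rounding mode the ONNX QuantizeLinear spec calls for.
    return [min(qmax, max(qmin, round(x / scale) + zero_point)) for x in xs]


def dequantize_linear(qs, scale, zero_point):
    """x = (q - zero_point) * scale"""
    return [(q - zero_point) * scale for q in qs]


def qlinear_add_reference(a, a_scale, a_zp, b, b_scale, b_zp, c_scale, c_zp):
    """Hypothetical reference for an elementwise quantized add:
    dequantize both inputs, add in float, then requantize the result."""
    fa = dequantize_linear(a, a_scale, a_zp)
    fb = dequantize_linear(b, b_scale, b_zp)
    return quantize_linear([x + y for x, y in zip(fa, fb)], c_scale, c_zp)


if __name__ == "__main__":
    q = quantize_linear([0.0, 1.0, -1.0, 100.0], scale=0.5, zero_point=128)
    print(q)  # the last value saturates at qmax
    print(dequantize_linear(q[:3], scale=0.5, zero_point=128))
    print(qlinear_add_reference([130], 0.5, 128, [132], 0.5, 128, 1.0, 0))
```

A reference like this is mainly useful in unit tests: feed the same quantized inputs to the imported op and to the float-domain reference, then compare the results, allowing for off-by-one differences where an implementation rounds differently.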
Coordination
Improving the importers can be a good onboarding task for engineers who would like more in-depth exposure to the TVM stack. The goal is that anyone claiming an operator can be confident their work won't be made redundant by work that is already in flight.
Two reference PRs can serve as templates: #7802 by @mbrookhart for adding a standard quantized op, and #8773 by @anwang2009 for adding a non-standard op from the Microsoft ContribOperators set in ONNXRT.
Please comment on this issue if you'd like to add to Relay ONNX importer coverage so I can update the tables.