[QNN] [RFC] QNN Dialect -- Prequantize Models #3591

@anijain2305

Description

We are proposing a new dialect named QNN that introduces quantized versions of existing Relay operators. The goal is to support models that have been pre-quantized in a framework.

Some important notes about the QNN dialect:

  • QNN operators are lowered to existing Relay operators to ensure that we can reuse Relay infrastructure.
  • Code resides in a new directory: Python files live in python/relay/qnn and C++ files in src/relay/qnn.
  • QNN, like any other dialect, introduces new Relay passes. These passes only deal with QNN ops (like lowering of QNN ops to existing Relay ops). For any generic optimization, we rely on existing Relay passes.
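To make the lowering idea concrete, here is a hedged sketch of how a QNN-style quantize op can decompose into ordinary elementwise operations (divide, round, add, clip, cast) rather than requiring a new kernel. The function names, parameter names, and numpy formulation are illustrative assumptions, not the actual QNN API; they model standard affine quantization.

```python
import numpy as np

def quantize_affine(x, scale, zero_point, qmin=0, qmax=255):
    # Affine quantization: q = clip(round(x / scale) + zero_point, qmin, qmax).
    # Each step maps to an existing elementwise op, which is the sense in
    # which a QNN quantize op can lower to existing Relay operators.
    # (Illustrative sketch; not the actual QNN lowering code.)
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize_affine(q, scale, zero_point):
    # Inverse mapping back to real values: x ~= scale * (q - zero_point).
    return scale * (q.astype(np.int32) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q = quantize_affine(x, scale=0.01, zero_point=128)  # -> [28, 128, 178, 228]
```

Because every step is an existing elementwise op, generic Relay passes (fusion, constant folding) apply to the lowered graph without QNN-specific changes.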

We can use this thread to discuss various open questions, including:

  1. Code organization, namespaces, API discussion.
  2. QNN operator lowering: infrastructure, the correct sequence of Relay operations, etc.
  3. Ways to efficiently add new operators with minimal engineering effort.
  4. Requirements (if any) of new generic Relay passes to achieve good performance.
  5. Any new bugs that arise as we start testing integer computations more thoroughly.
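On point 2, a central lowering question is requantization: converting integer values from one (scale, zero_point) pair to another, e.g. after a quantized conv whose accumulator has a different scale than its output. The sketch below uses a float reference formulation for clarity; a production lowering would typically use integer-only fixed-point arithmetic, which is exactly the kind of "correct sequence of Relay operations" question raised above. All names here are hypothetical.

```python
import numpy as np

def requantize(q_in, in_scale, in_zp, out_scale, out_zp, qmin=0, qmax=255):
    # Requantize via the real-number reference:
    #   real  = in_scale * (q_in - in_zp)
    #   q_out = clip(round(real / out_scale) + out_zp, qmin, qmax)
    # Float math is used here only for illustration; integer-only
    # fixed-point multiplication is the usual production approach.
    real = in_scale * (q_in.astype(np.int32) - in_zp)
    q_out = np.round(real / out_scale) + out_zp
    return np.clip(q_out, qmin, qmax).astype(np.uint8)

q = np.array([0, 100, 200], dtype=np.uint8)
out = requantize(q, in_scale=0.05, in_zp=0, out_scale=0.1, out_zp=10)
# -> [10, 60, 110]
```

Whether such a float-reference lowering, or an integer-only one, is acceptable for a given backend is one of the accuracy/performance trade-offs this thread can settle.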

The idea of a QNN dialect grew out of the discussion in Issue #2351. Thanks @tqchen @FrozenGene @jackwish @jnorwood @shoubhik for the discussions.

First few PRs for the QNN dialect
