[QNN] [RFC] QNN Dialect -- Prequantize Models #3591

@anijain2305

Description

We are proposing a new dialect named QNN that introduces quantized versions of existing Relay operators. The goal is to support models that have been pre-quantized in a framework.

Some important notes about the QNN dialect:

  • QNN operators are lowered to existing Relay operators to ensure that we can reuse Relay infrastructure.
  • Code resides in a new directory: Python files live in python/relay/qnn and C++ files in src/relay/qnn.
  • QNN, like any other dialect, introduces new Relay passes. These passes only deal with QNN ops (like lowering of QNN ops to existing Relay ops). For any generic optimization, we rely on existing Relay passes.
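To make the lowering idea concrete, here is a hedged sketch of how a QNN-style quantize op can decompose into ordinary elementwise operations (divide, round, add, clip, cast) rather than requiring a new kernel. The function names, parameter names, and numpy formulation are illustrative assumptions, not the actual QNN API; they model standard affine quantization.

```python
import numpy as np

def quantize_affine(x, scale, zero_point, qmin=0, qmax=255):
    # Affine quantization: q = clip(round(x / scale) + zero_point, qmin, qmax).
    # Each step maps to an existing elementwise op, which is the sense in
    # which a QNN quantize op can lower to existing Relay operators.
    # (Illustrative sketch; not the actual QNN lowering code.)
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize_affine(q, scale, zero_point):
    # Inverse mapping back to real values: x ~= scale * (q - zero_point).
    return scale * (q.astype(np.int32) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q = quantize_affine(x, scale=0.01, zero_point=128)  # -> [28, 128, 178, 228]
```

Because every step is an existing elementwise op, generic Relay passes (fusion, constant folding) apply to the lowered graph without QNN-specific changes.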

We can use this thread to discuss various open questions, including:

  1. Code organization, namespaces, API discussion.
  2. QNN operator lowering: infrastructure, the correct sequence of Relay operations, etc.
  3. Ways to efficiently add new operators with minimal engineering effort.
  4. Requirements (if any) of new generic Relay passes to achieve good performance.
  5. Any new bugs that arise as we start testing integer computations more thoroughly.
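On point 2, a central lowering question is requantization: converting integer values from one (scale, zero_point) pair to another, e.g. after a quantized conv whose accumulator has a different scale than its output. The sketch below uses a float reference formulation for clarity; a production lowering would typically use integer-only fixed-point arithmetic, which is exactly the kind of "correct sequence of Relay operations" question raised above. All names here are hypothetical.

```python
import numpy as np

def requantize(q_in, in_scale, in_zp, out_scale, out_zp, qmin=0, qmax=255):
    # Requantize via the real-number reference:
    #   real  = in_scale * (q_in - in_zp)
    #   q_out = clip(round(real / out_scale) + out_zp, qmin, qmax)
    # Float math is used here only for illustration; integer-only
    # fixed-point multiplication is the usual production approach.
    real = in_scale * (q_in.astype(np.int32) - in_zp)
    q_out = np.round(real / out_scale) + out_zp
    return np.clip(q_out, qmin, qmax).astype(np.uint8)

q = np.array([0, 100, 200], dtype=np.uint8)
out = requantize(q, in_scale=0.05, in_zp=0, out_scale=0.1, out_zp=10)
# -> [10, 60, 110]
```

Whether such a float-reference lowering, or an integer-only one, is acceptable for a given backend is one of the accuracy/performance trade-offs this thread can settle.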

The idea of a QNN dialect grew out of the discussion in Issue #2351. Thanks @tqchen @FrozenGene @jackwish @jnorwood @shoubhik for the discussions.

First few PRs for the QNN dialect
