[QNN EP] Add Case-2 LPBQ pattern support for Gemm and Matmul nodes #25865

Merged
edgchen1 merged 4 commits into microsoft:main from CodeLinaro:dev/tirupath/lpbq_case2_pattern_support
Jan 9, 2026

Conversation

@quic-tirupath
Contributor

Description

  • The Case-2 LPBQ pattern omits the QuantizeLinear node from the LPBQ packing pattern.
  • Modify the LPBQ fusion logic implemented in the QNN EP for Gemm and MatMul nodes to gracefully handle the optional QuantizeLinear node in the LPBQ packing pattern.
  • Add unit tests to verify Case-2 LPBQ pattern fusion for Gemm and MatMul nodes.

Motivation and Context

  • The QuantizeLinear node in the LowPowerBlockQuantization (LPBQ) encoding packing pattern can be optional: omitting it keeps the weights in an INT datatype, which further reduces the size of the model.
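The two packing patterns differ only in whether a QuantizeLinear node appears in the chain. A minimal sketch of how a fusion matcher can treat that node as optional (hypothetical, simplified node type and chain shape; not the actual QNN EP NodeUnit API or the exact LPBQ pattern):

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified node representation (not the ORT NodeUnit API).
struct Node {
  std::string op_type;
};

// Walk a linear chain of ops, treating "QuantizeLinear" as optional:
//   Case-1: DequantizeLinear -> QuantizeLinear -> target
//   Case-2: DequantizeLinear -> target
// Returns true if the chain matches either form exactly.
bool MatchesLpbqChain(const std::vector<Node>& chain, const std::string& target_op) {
  size_t i = 0;
  if (i >= chain.size() || chain[i].op_type != "DequantizeLinear") return false;
  ++i;
  // Optional QuantizeLinear: absent in the Case-2 pattern.
  if (i < chain.size() && chain[i].op_type == "QuantizeLinear") ++i;
  return i < chain.size() && chain[i].op_type == target_op && i + 1 == chain.size();
}
```

Both forms fuse to the same QNN node group; the matcher simply advances past the QuantizeLinear when it is present.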

@jywu-msft jywu-msft requested a review from Copilot August 27, 2025 16:03
@jywu-msft
Member

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).


Copilot AI left a comment


Pull Request Overview

This PR adds support for the Case-2 LPBQ (Low Power Block Quantization) pattern, in which the QuantizeLinear node is omitted from the LPBQ packing pattern for Gemm and MatMul nodes. This enables keeping weights in an INT datatype to reduce model size.

Key changes:

  • Modified LPBQ fusion logic to handle optional QuantizeLinear nodes
  • Added new unit tests for both MatMul and Gemm Case-2 LPBQ patterns
  • Extended utility functions to support lookup by input name

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

File summary:
  • lpbqmatmul_fusion_without_ql_test.cc: unit test for MatMul LPBQ fusion without QuantizeLinear
  • lpbqgemm_fusion_without_ql_test.cc: unit test for Gemm LPBQ fusion without QuantizeLinear
  • utils.h/.cc: added GetParentOfInputByName utility function
  • lpbqmatmul_fusion.h/.cc: modified to support the optional QuantizeLinear node
  • lpbqgemm_fusion.h/.cc: modified to support the optional QuantizeLinear node
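The lookup-by-input-name utility resolves which node produces a given input of another node. A self-contained sketch of the idea (simplified graph types and a hypothetical free-function signature; the real helper operates on onnxruntime graph objects):

```cpp
#include <string>
#include <vector>

// Simplified stand-ins for graph types; the real utility works on
// onnxruntime Node/GraphViewer objects, not these structs.
struct SimpleNode {
  std::string name;
  std::vector<std::string> inputs;   // names of consumed values
  std::vector<std::string> outputs;  // names of produced values
};

// Return the node that produces `input_name`, which `node` consumes,
// or nullptr if the value is a graph input/initializer or not consumed.
const SimpleNode* GetParentOfInputByName(const std::vector<SimpleNode>& graph,
                                         const SimpleNode& node,
                                         const std::string& input_name) {
  // Verify the node actually consumes this input.
  bool consumes = false;
  for (const auto& in : node.inputs) {
    if (in == input_name) { consumes = true; break; }
  }
  if (!consumes) return nullptr;
  // Find the producer of that value.
  for (const auto& candidate : graph) {
    for (const auto& out : candidate.outputs) {
      if (out == input_name) return &candidate;
    }
  }
  return nullptr;
}
```

Looking up the parent by input name (rather than by a fixed input index) is what lets the fusion tolerate the pattern variant where the QuantizeLinear producer is absent.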
Comments suppressed due to low confidence (2)

onnxruntime/core/providers/qnn/builder/qnn_node_group/lpbqmatmul_fusion.cc:142

  • [nitpick] Magic number index 2 is used to access the target node unit. Consider using a named constant or enum to make the code more self-documenting and less error-prone.
  return node_units_[2];

onnxruntime/core/providers/qnn/builder/qnn_node_group/lpbqgemm_fusion.cc:187

  • [nitpick] Magic number index 4 is used to access the target node unit. Consider using a named constant or enum to make the code more self-documenting and less error-prone.
  return node_units_[4];
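The nitpick about the magic indices can be addressed with named constants, for example (illustrative names only; the PR may resolve the comment differently):

```cpp
#include <cstddef>

// Illustrative index names for the node units captured by the MatMul LPBQ
// fusion; these names are hypothetical, not taken from the actual PR code.
enum MatMulLpbqNodeUnitIndex : std::size_t {
  kScaleDql = 0,
  kWeightDql = 1,
  kTargetMatMul = 2,
};

// The accessor then reads `return node_units_[kTargetMatMul];`
// instead of `return node_units_[2];`.
constexpr std::size_t TargetNodeUnitIndex() { return kTargetMatMul; }
```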


@HectorSVC HectorSVC added the ep:QNN label (issues related to QNN execution provider) Aug 27, 2025
jywu-msft
jywu-msft previously approved these changes Sep 3, 2025

@minfhong-qti minfhong-qti left a comment


const NodeUnitIODef& per_channel_float_def = scale_dql_node_unit.Inputs()[0];

Should it be

const NodeUnitIODef& per_channel_float_def = scale_dql_node_unit.Inputs()[1]

instead?

@jywu-msft
Member

const NodeUnitIODef& per_channel_float_def = scale_dql_node_unit.Inputs()[0];

Should it be

const NodeUnitIODef& per_channel_float_def = scale_dql_node_unit.Inputs()[1]

instead?

@quic-tirupath ?

@quic-tirupath
Contributor Author

const NodeUnitIODef& per_channel_float_def = scale_dql_node_unit.Inputs()[0];

Should it be

const NodeUnitIODef& per_channel_float_def = scale_dql_node_unit.Inputs()[1]

instead?

Nope; this is correct. The DQL node unit contains only one input, input[0]; the other inputs (input[1], input[2]) are wrapped as the quantization params of input[0]. Here we first get the input[0] IODef and then read the per-channel float values from its quantization param's scale.
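The layout the author describes, where a DQL node unit exposes a single IODef whose quantization parameters carry the scale and zero point, can be modeled like this (simplified structs for illustration; the real NodeUnitIODef lives in onnxruntime and differs in detail):

```cpp
#include <optional>
#include <string>
#include <vector>

// Simplified model: the DQL's input[1] (scale) and input[2] (zero point)
// are folded into the quant_param of the single exposed input[0].
struct QuantParam {
  std::string scale_name;       // originally the DQL's input[1]
  std::string zero_point_name;  // originally the DQL's input[2]
};

struct IODef {
  std::string name;                       // the DQL's input[0]
  std::optional<QuantParam> quant_param;  // scale/zero-point wrapped here
};

struct DqlNodeUnit {
  std::vector<IODef> inputs;  // size 1: only input[0] is exposed
  const std::vector<IODef>& Inputs() const { return inputs; }
};
```

Under this model `Inputs()[0]` is the only valid index, and the per-channel float scales are reached through `Inputs()[0].quant_param`, which is why `Inputs()[1]` would be wrong here.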

quic-tirupath and others added 3 commits December 22, 2025 12:35
 - Case-2 LPBQ pattern omits the QuantizeLinear node in the LPBQ packing pattern
 - Modify LPBQ fusion logic in QNN EP implemented for Gemm and MatMul
   nodes to gracefully handle the optional QuantizeLinear node in the LPBQ
   packing pattern.
 - Add unit tests to verify Case-2 LPBQ pattern fusion for Gemm and
   MatMul nodes.
 - Fix review comments
 - Rebase the PR on tip and address the conflicts
@tirupath-qti tirupath-qti force-pushed the dev/tirupath/lpbq_case2_pattern_support branch from 7cf9b91 to b1041ce on December 22, 2025 20:37
@tirupath-qti
Contributor

Hi @edgchen1
Thanks for the review comments. I have responded to all of them and addressed many in a new commit added to this PR.

Could you please review and approve the latest version? Please also help trigger the CI jobs and merge after a successful review.

@edgchen1
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).


@tirupath-qti tirupath-qti left a comment


Addressed the review comments in a new commit added to this PR.

@edgchen1
Contributor

edgchen1 commented Jan 8, 2026

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@edgchen1 edgchen1 merged commit c1da27c into microsoft:main Jan 9, 2026
89 checks passed
alex-spacemit pushed a commit to spacemit-com/onnxruntime that referenced this pull request Jan 20, 2026
…icrosoft#25865)

### Description

 - Case-2 LPBQ pattern omits QuantizeLinear node in LPBQ packing pattern
- Modify LPBQ fusion logic in QNN EP implemented for Gemm and MatMul
nodes to gracefully handle the optional QuantizeLinear node in LPBQ
packing pattern.
- Add unit tests to verify Case-2 LPBQ pattern fusion for Gemm and
MatMul nodes.



### Motivation and Context
- The QuantizeLinear node in the LowPowerBlockQuantization encoding packing
pattern can be optional: omitting it keeps the weights in an INT datatype,
which further reduces the size of the model.

---------

Co-authored-by: tirupath-qti <tirupath@qti.qualcomm.com>

Labels

ep:QNN issues related to QNN execution provider

7 participants