[QNN EP] Add Case-2 LPBQ pattern support for Gemm and Matmul nodes#25865
Conversation
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).
Pull Request Overview
This PR adds support for the Case-2 LPBQ (Low Power Block Quantization) pattern, in which the QuantizeLinear node is optional in the LPBQ packing pattern for Gemm and MatMul nodes. This enables keeping the weights in an integer datatype, which reduces model size.
Key changes:
- Modified LPBQ fusion logic to handle optional QuantizeLinear nodes
- Added new unit tests for both MatMul and Gemm Case-2 LPBQ patterns
- Extended utility functions to support lookup by input name
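The optional-node handling described above can be sketched as follows. Note that the graph types and the `GetParentOfInputByName` helper here are simplified, hypothetical stand-ins for the QNN EP's actual `GraphViewer`/`NodeUnit` machinery, not the real signatures:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified graph model (not the real ONNX Runtime types).
struct Node {
  std::string op_type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Simplified analogue of the GetParentOfInputByName utility this PR adds:
// find the node that produces the tensor consumed under `input_name`.
const Node* GetParentOfInputByName(const std::vector<Node>& graph,
                                   const std::string& input_name) {
  for (const Node& n : graph) {
    for (const std::string& out : n.outputs) {
      if (out == input_name) return &n;
    }
  }
  return nullptr;  // the input is a graph input or an initializer
}

// Case-2 LPBQ matching sketch: walk upward from the MatMul/Gemm weight input.
// In Case-1 the weight comes from DequantizeLinear <- QuantizeLinear <- ...;
// in Case-2 the QuantizeLinear is absent, so its presence must be optional.
bool MatchLPBQWeightPattern(const std::vector<Node>& graph, const Node& target) {
  const Node* dql = GetParentOfInputByName(graph, target.inputs[1]);
  if (dql == nullptr || dql->op_type != "DequantizeLinear") return false;

  // The QuantizeLinear node is optional (Case-2 keeps weights as integers).
  const Node* maybe_ql = GetParentOfInputByName(graph, dql->inputs[0]);
  if (maybe_ql != nullptr && maybe_ql->op_type != "QuantizeLinear") return false;

  return true;  // matched Case-1 (with QL) or Case-2 (without QL)
}
```

In a Case-2 graph the weight tensor feeding the DequantizeLinear is an initializer, so the parent lookup returns null and the match still succeeds.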
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| lpbqmatmul_fusion_without_ql_test.cc | Unit test for MatMul LPBQ fusion without QuantizeLinear |
| lpbqgemm_fusion_without_ql_test.cc | Unit test for Gemm LPBQ fusion without QuantizeLinear |
| utils.h/.cc | Added GetParentOfInputByName utility function |
| lpbqmatmul_fusion.h/.cc | Modified to support optional QuantizeLinear nodes |
| lpbqgemm_fusion.h/.cc | Modified to support optional QuantizeLinear nodes |
Comments suppressed due to low confidence (2)
onnxruntime/core/providers/qnn/builder/qnn_node_group/lpbqmatmul_fusion.cc:142
- [nitpick] Magic number index 2 is used to access the target node unit. Consider using a named constant or enum to make the code more self-documenting and less error-prone.
  `return node_units_[2];`

onnxruntime/core/providers/qnn/builder/qnn_node_group/lpbqgemm_fusion.cc:187
- [nitpick] Magic number index 4 is used to access the target node unit. Consider using a named constant or enum to make the code more self-documenting and less error-prone.
  `return node_units_[4];`
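One way to address these nitpicks is a small named-index enum. The enumerator names below are illustrative, not taken from the PR; the MatMul pattern gathers three node units (target at index 2) and the Gemm pattern five (target at index 4):

```cpp
#include <cassert>
#include <cstddef>

// Illustrative fix for the magic-index nitpick; names are hypothetical.
// For the MatMul LPBQ pattern, the fused node units could be indexed as:
enum MatMulLPBQIndex : std::size_t {
  kScaleDQL = 0,       // DequantizeLinear producing the per-block scales
  kWeightDQL = 1,      // DequantizeLinear producing the weights
  kTargetNodeUnit = 2  // the MatMul target (was the bare node_units_[2])
};

// A member returning the target would then read:
//   return node_units_[kTargetNodeUnit];
```

The Gemm pattern would get its own enum ending in a `kTargetNodeUnit = 4` enumerator for its `node_units_[4]` access.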
minfhong-qti left a comment:

Should it be `const NodeUnitIODef& per_channel_float_def = scale_dql_node_unit.Inputs()[1];` instead?
Force-pushed bd28495 to 7cf9b91.
Nope; this is correct. The DQL node_unit contains only one input, input[0]; the other inputs, input[1] and input[2], are wrapped as quantization params of input[0]. Here we first get the input[0] IODef, and then get the per_channel_float values from its quantization param scale.
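The access pattern described in this reply can be sketched as follows. The struct definitions are minimal, hypothetical stand-ins that only loosely mirror the real ONNX Runtime `NodeUnitIODef` API:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Minimal stand-ins (hypothetical) for the NodeUnit structures discussed above.
struct QuantParam {
  std::string scale_name;       // initializer holding the per-channel float scales
  std::string zero_point_name;  // optional zero-point initializer
};

struct NodeUnitIODef {
  std::string name;                       // tensor name: input[0] of the DQL
  std::optional<QuantParam> quant_param;  // wraps input[1]/input[2] of the DQL
};

struct NodeUnit {
  std::vector<NodeUnitIODef> inputs;
  const std::vector<NodeUnitIODef>& Inputs() const { return inputs; }
};

// The point of the reply: the per-channel scales are not Inputs()[1] -- the
// DQL node unit exposes a single input whose quant_param carries the
// scale/zero-point tensors.
std::string GetPerChannelScaleName(const NodeUnit& scale_dql_node_unit) {
  const NodeUnitIODef& def = scale_dql_node_unit.Inputs()[0];
  assert(def.quant_param.has_value());
  return def.quant_param->scale_name;
}
```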
- Case-2 LPBQ pattern omits the QuantizeLinear node in the LPBQ packing pattern.
- Modify the LPBQ fusion logic in QNN EP implemented for Gemm and MatMul nodes to gracefully handle the optional QuantizeLinear node in the LPBQ packing pattern.
- Add unit tests to verify Case-2 LPBQ pattern fusion for Gemm and MatMul nodes.
- Address review comments.
- Rebase the PR on tip and resolve the conflicts.
Force-pushed 7cf9b91 to b1041ce.
Hi @edgchen1, could you please review and approve the latest PR version? Also, please trigger the CI jobs and merge after a successful review.
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 4 pipeline(s).
tirupath-qti left a comment:

Addressed the review comments in a new commit added to this PR.
…icrosoft#25865)

### Description
- Case-2 LPBQ pattern omits the QuantizeLinear node in the LPBQ packing pattern.
- Modify the LPBQ fusion logic in QNN EP implemented for Gemm and MatMul nodes to gracefully handle the optional QuantizeLinear node in the LPBQ packing pattern.
- Add unit tests to verify Case-2 LPBQ pattern fusion for Gemm and MatMul nodes.

### Motivation and Context
- The QuantizeLinear node in the LowPowerBlockQuantization encoding packing pattern can be optional, as omitting it keeps the weights in an integer datatype and further helps to reduce the size of the model.

Co-authored-by: tirupath-qti <tirupath@qti.qualcomm.com>