
feat(qualcomm): Qnn AOT Lowering passes #596

Merged
chenghuaWang merged 3 commits into UbiquitousLearning:main from oreomaker:qnn-aot
Jan 9, 2026

Conversation

@oreomaker (Collaborator) commented Jan 9, 2026

Please check Guidelines for Contributing.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added QNN acceleration support for negation, sigmoid, and slice operations.
    • Extended reduce operation support (max, min, mean, sum) in QNN backend.
  • Performance

    • Optimized quantization block scale configuration for improved efficiency.
    • Improved weight tensor handling in neural network operations.


- Add visitor implementations for Concat, Slice, Transpose, Reduce,
  Equal, Sigmoid, Matmul, Repeat, Softmax, and Where operations
- Implement QnnAOTNegPattern for negation operation support
- Add comprehensive reduce operation support including ReduceMax,
  ReduceMin, ReduceMean, and ReduceSum patterns
- Update LLM2QnnLoweringPass to register new operation patterns
- Include necessary header files for the new visitor implementations
…sor conversion

- Set blockwise_expansion.blockScaleBitwidth from 12 to 4 bits
- Remove unnecessary tensor conversion to UInt8
@coderabbitai (Bot, Contributor) commented Jan 9, 2026

📝 Walkthrough

This PR extends QNN AOT (Ahead-of-Time) compilation support by introducing new operation patterns for Neg, Sigmoid, Slice, and Reduce variants, updating the lowering pass to register these patterns, adjusting blockwise quantization parameters, and removing unnecessary type casting in the Linear visitor.

Changes

Cohort / File(s): Summary

  • Quantization Configuration (mllm/backends/qnn/aot/QnnWrappersAPI.cpp):
    Changed blockwise quantization blockScaleBitwidth from 12 bits to 4 bits for per-channel LPBQ block expansion.
  • Lowering Pass Registration (mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp):
    Added QNN AOT visitor headers (Concat, Slice, Transpose, Reduce, Equal, Sigmoid, Matmul, Repeat, Softmax, Where) and expanded the registered lowering pattern set with Mul, Neg, Transpose, Slice, Concat, Repeat, MatMul, Reduce variants, Equal, Where, Softmax, and Sigmoid.
  • Element-wise Operations (mllm/backends/qnn/aot/visitor/Elewise.cpp, Elewise.hpp):
    Introduced the QnnAOTNegPattern class with isMatch and rewrite implementations for QNN-accelerated negation, validating quant_recipe and graph/context attributes before constructing ElementWiseNeg nodes.
  • Linear Operations (mllm/backends/qnn/aot/visitor/Linear.cpp):
    Removed explicit uint8 casting of the weight tensor prior to FullyConnected QNN op creation.
  • Reduce Operations (mllm/backends/qnn/aot/visitor/Reduce.cpp, Reduce.hpp):
    Added a generic templated rewriteReduceOp helper and four concrete QNN AOT reduce patterns (ReduceMax, ReduceMin, ReduceMean, ReduceSum) with axes and keep_dims parameter handling, including negative dimension wrapping and parameter tensor construction.
  • Sigmoid Operations (mllm/backends/qnn/aot/visitor/Sigmoid.cpp, Sigmoid.hpp):
    Introduced the QnnAOTSigmoidPattern class with isMatch and rewrite to detect Sigmoid ops with the using_qnn attribute and construct QNN Sigmoid nodes with input/output wiring and graph registration.
  • Slice Operations (mllm/backends/qnn/aot/visitor/Slice.cpp, Slice.hpp):
    Added QnnAOTSlicePattern with isMatch and rewrite implementations that translate linalg.SliceOp to QNN StridedSlice nodes, computing begin/end masks from slice indices and constructing a ranges tensor for rank-specific slice parameters.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested reviewers

  • liang1232018
  • chenghuaWang
  • yirongjie

Poem

🐰 A rabbit hopped through patterns bright,
Adding Neg and Slice in flight,
Sigmoid, Reduce—each new delight,
QNN compiles with quantum might! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
  • Description check (⚠️ Warning): The PR description only contains the standard template with no custom narrative explaining the changes, objectives, or rationale. Resolution: replace the template placeholder with a detailed description of the changes, including which operations were added, why these changes were made, and any testing performed.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)
  • Title check (✅ Passed): The title 'feat(qualcomm): Qnn AOT Lowering passes' accurately describes the primary change, adding QNN AOT lowering support for multiple operations, which is reflected in all modified files.



@coderabbitai (Bot, Contributor) left a review comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @mllm/backends/qnn/aot/visitor/Reduce.cpp:
- Around lines 56-58: axesParam->alloc() may return nullptr, but axesData is
  dereferenced without a check. Store the result of alloc() in axesData, handle
  the null case (log an error, return, or throw), and only write axesData[0]
  and call qnn_op_node->emplaceParamTensor(axesParam) when axesData is non-null.

In @mllm/backends/qnn/aot/visitor/Slice.cpp:
- Around line 95-96: The allocation for ranges_param (via ranges_param->alloc())
is not checked beyond skipping memcpy, so if ptr is nullptr you must abort
adding the parameter and registering the node to avoid using an uninitialized
ranges tensor; update the code around ptr (the result of ranges_param->alloc())
and the subsequent std::memcpy/ranges_data usage to explicitly handle allocation
failure (e.g., log an error/return a failure status or throw) and ensure you do
NOT call emplace_param or register_node when ptr is null.
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/visitor/Sigmoid.cpp (1)

17-53: Verify the need for "quant_recipe" attribute validation.

The rewrite implementation follows the standard pattern, but unlike elementwise operations (Add, Mul, Neg) which validate the presence of a "quant_recipe" attribute early in their rewrite methods, this Sigmoid implementation omits that check. The Slice pattern also omits it, suggesting this may be intentional for certain operation types.

Please verify whether Sigmoid operations require quantization recipes and if so, add the check:

MLLM_RETURN_FALSE_IF_NOT(op->getAttr("quant_recipe"));

Based on learnings: elementwise ops typically validate "quant_recipe" in their rewrite methods.

Optional: Consider shortening the inline comment.

Lines 39-43 contain a lengthy reasoning comment about package names. While context is valuable, consider condensing it or moving detailed rationale to documentation.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68c7c6c and cbdf5d7.

📒 Files selected for processing (11)
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.hpp
  • mllm/backends/qnn/aot/visitor/Sigmoid.cpp
  • mllm/backends/qnn/aot/visitor/Sigmoid.hpp
  • mllm/backends/qnn/aot/visitor/Slice.cpp
  • mllm/backends/qnn/aot/visitor/Slice.hpp
💤 Files with no reviewable changes (1)
  • mllm/backends/qnn/aot/visitor/Linear.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/visitor/Sigmoid.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/visitor/Reduce.hpp
  • mllm/backends/qnn/aot/visitor/Slice.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Sigmoid.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.cpp
  • mllm/backends/qnn/aot/visitor/Slice.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/visitor/Sigmoid.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/visitor/Reduce.hpp
  • mllm/backends/qnn/aot/visitor/Slice.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Sigmoid.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.cpp
  • mllm/backends/qnn/aot/visitor/Slice.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/visitor/Sigmoid.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.cpp
  • mllm/backends/qnn/aot/visitor/Slice.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/visitor/Sigmoid.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/visitor/Reduce.hpp
  • mllm/backends/qnn/aot/visitor/Slice.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Sigmoid.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.cpp
  • mllm/backends/qnn/aot/visitor/Slice.cpp
🧬 Code graph analysis (4)
mllm/backends/qnn/aot/visitor/Sigmoid.cpp (3)
mllm/backends/qnn/aot/visitor/Elewise.cpp (12)
  • isMatch (13-15)
  • isMatch (13-13)
  • isMatch (48-50)
  • isMatch (48-48)
  • isMatch (83-85)
  • isMatch (83-83)
  • rewrite (17-46)
  • rewrite (17-17)
  • rewrite (52-81)
  • rewrite (52-52)
  • rewrite (87-114)
  • rewrite (87-87)
mllm/backends/qnn/aot/visitor/Slice.cpp (4)
  • isMatch (18-20)
  • isMatch (18-18)
  • rewrite (22-103)
  • rewrite (22-22)
mllm/backends/qnn/aot/visitor/Sigmoid.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Reduce.hpp (9)
mllm/backends/qnn/aot/visitor/Elewise.hpp (6)
  • op (14-14)
  • op (25-25)
  • op (36-36)
  • writer (16-16)
  • writer (27-27)
  • writer (38-38)
mllm/backends/qnn/aot/visitor/Sigmoid.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Slice.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Concat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Equal.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Repeat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Transpose.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Index.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Reduce.cpp (6)
mllm/backends/qnn/aot/visitor/Elewise.cpp (12)
  • isMatch (13-15)
  • isMatch (13-13)
  • isMatch (48-50)
  • isMatch (48-48)
  • isMatch (83-85)
  • isMatch (83-83)
  • rewrite (17-46)
  • rewrite (17-17)
  • rewrite (52-81)
  • rewrite (52-52)
  • rewrite (87-114)
  • rewrite (87-87)
mllm/backends/qnn/aot/visitor/Sigmoid.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-53)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Slice.cpp (4)
  • isMatch (18-20)
  • isMatch (18-18)
  • rewrite (22-103)
  • rewrite (22-22)
mllm/backends/qnn/aot/visitor/Concat.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-71)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Equal.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-52)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Repeat.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-84)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Slice.cpp (2)
mllm/backends/qnn/aot/visitor/Sigmoid.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Slice.hpp (2)
  • op (14-14)
  • writer (16-16)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/visitor/Elewise.hpp

[error] 36-36: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 38-38: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Reduce.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 15-15: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 23-23: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 24-24: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 32-32: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 33-33: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 41-41: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 42-42: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Slice.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Sigmoid.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Reduce.cpp

[error] 16-16: 2 adjacent parameters of 'rewriteReduceOp' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 39-39: variable 'dim' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 40-40: variable 'keep_dims' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 54-54: invalid case style for variable 'axesName'

(readability-identifier-naming,-warnings-as-errors)


[error] 55-55: invalid case style for variable 'axesParam'

(readability-identifier-naming,-warnings-as-errors)


[error] 56-56: variable 'axesData' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 56-56: invalid case style for variable 'axesData'

(readability-identifier-naming,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-android
  • GitHub Check: build-macos
🔇 Additional comments (16)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

178-178: No issues identified with blockScaleBitwidth change.

The change from 12 to 4 bits for blockScaleBitwidth in the LPBQ (Low Precision Blockwise Quantization) configuration is correct. This value represents the per-block scale factor encoding bitwidth and is appropriate for the 4-bit weight quantization scheme (w4a16o16) used throughout the codebase. The 8-bit storage type supports efficient packing of 4-bit scale values. The adjustment aligns with QNN API specifications for blockwise quantization.

mllm/backends/qnn/aot/visitor/Slice.hpp (1)

1-23: LGTM!

The header follows the established pattern structure used by other AOT visitors (Sigmoid, Equal, Concat, etc.) with isMatch, rewrite, and the static create() factory method. The op parameter naming is consistent with the codebase convention.

mllm/backends/qnn/aot/visitor/Slice.cpp (1)

55-76: LGTM for mask calculation logic.

The mask calculation correctly handles the kAll sentinel values and the special case for negative indices (line 71). The bit-mask construction for begin_mask and end_mask follows QNN StridedSlice parameter conventions.

mllm/backends/qnn/aot/visitor/Elewise.cpp (1)

83-114: LGTM!

The QnnAOTNegPattern implementation correctly follows the established pattern for elementwise operations in this file (Add, Mul). It properly handles the single input/output semantics for unary negation and maintains consistency with the existing code structure.

mllm/backends/qnn/aot/visitor/Elewise.hpp (1)

34-43: LGTM!

The QnnAOTNegPattern declaration is consistent with the existing QnnAOTAddPattern and QnnAOTMulPattern classes in the same file.

mllm/backends/qnn/aot/visitor/Reduce.cpp (2)

15-65: Good use of templating to reduce duplication.

The rewriteReduceOp template function effectively consolidates common logic for all reduce operations, properly handling negative dimension wrapping (line 43) and the keep_dims parameter.


67-97: LGTM for pattern implementations.

Each reduce pattern correctly implements isMatch with the appropriate IR op type check and delegates to rewriteReduceOp with the corresponding template arguments.

mllm/backends/qnn/aot/visitor/Reduce.hpp (1)

1-48: LGTM!

The four reduce pattern class declarations follow the established structure and correctly map to their respective OpTypes in the factory methods. The pattern is consistent with other AOT visitor headers.

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (2)

23-32: LGTM for new includes.

The new includes correctly correspond to the patterns being added to the lowering pass.


36-42: LGTM for pattern registration.

The registerPatterns call now includes all the new patterns (Neg, Transpose, Slice, Concat, Repeat, MatMul, Reduce variants, Equal, Where, Softmax, Sigmoid) alongside the existing ones.

mllm/backends/qnn/aot/visitor/Sigmoid.cpp (3)

1-10: LGTM!

The copyright header and includes are appropriate for this implementation file.


13-15: LGTM!

The isMatch implementation correctly identifies Sigmoid operations marked for QNN lowering, consistent with other AOT patterns in the codebase.


55-55: LGTM!

Namespace closing is properly commented.

mllm/backends/qnn/aot/visitor/Sigmoid.hpp (3)

1-9: LGTM!

The copyright header, include guard, and necessary includes are all properly structured.


12-21: LGTM!

The class declaration follows the established pattern for AOT operation patterns in this codebase. The static factory method correctly returns a pair containing the operation type and shared pointer.

Note on static analysis warnings: The clang-tidy warnings about the parameter name op being too short are false positives. The codebase consistently uses op as the parameter name across all AOT pattern implementations (Add, Mul, Neg, Slice, etc.), making this an intentional convention.


23-23: LGTM!

Namespace closing is properly formatted and commented.

Comment on lines +56 to +58
uint32_t* axesData = static_cast<uint32_t*>(axesParam->alloc());
axesData[0] = static_cast<uint32_t>(dim);
qnn_op_node->emplaceParamTensor(axesParam);

⚠️ Potential issue | 🟠 Major

Null pointer dereference risk.

axesParam->alloc() may return nullptr, but the code directly dereferences axesData[0] without checking. This could cause a crash.

🛠️ Proposed fix
   auto axesParam = mllm::qnn::QNNParamTensorWrapper::create("axes", axesName, QNN_DATATYPE_UINT_32, std::vector<uint32_t>{1});
   uint32_t* axesData = static_cast<uint32_t*>(axesParam->alloc());
+  if (!axesData) {
+    MLLM_ERROR("Failed to allocate memory for axes parameter");
+    return false;
+  }
   axesData[0] = static_cast<uint32_t>(dim);
   qnn_op_node->emplaceParamTensor(axesParam);

Comment on lines +95 to +96
int32_t* ptr = static_cast<int32_t*>(ranges_param->alloc());
if (ptr) { std::memcpy(ptr, ranges_data.data(), ranges_data.size() * sizeof(int32_t)); }

⚠️ Potential issue | 🟠 Major

Handle allocation failure explicitly.

If ranges_param->alloc() returns nullptr, the code skips the memcpy but continues to emplace the parameter and register the node. This could lead to an uninitialized ranges tensor being used, causing undefined behavior at QNN runtime.

🛠️ Proposed fix
   int32_t* ptr = static_cast<int32_t*>(ranges_param->alloc());
-  if (ptr) { std::memcpy(ptr, ranges_data.data(), ranges_data.size() * sizeof(int32_t)); }
+  if (!ptr) {
+    MLLM_ERROR("Failed to allocate memory for ranges parameter");
+    return false;
+  }
+  std::memcpy(ptr, ranges_data.data(), ranges_data.size() * sizeof(int32_t));
   qnn_op_node->emplaceParamTensor(ranges_param);

@chenghuaWang chenghuaWang changed the title from "Qnn aot" to "feat(qualcomm): Qnn AOT Lowering passes" on Jan 9, 2026
@chenghuaWang chenghuaWang (Collaborator) left a comment

LGTM

@chenghuaWang chenghuaWang merged commit c91c54a into UbiquitousLearning:main Jan 9, 2026
4 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Mar 22, 2026
