
feat(qualcomm): Qnn AOT Lowering passes #596

Merged
chenghuaWang merged 3 commits into UbiquitousLearning:main from oreomaker:qnn-aot
Jan 9, 2026

Conversation

@oreomaker (Collaborator) commented Jan 9, 2026

Please check Guidelines for Contributing.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added QNN acceleration support for negation, sigmoid, and slice operations.
    • Extended reduce operation support (max, min, mean, sum) in QNN backend.
  • Performance

    • Optimized quantization block scale configuration for improved efficiency.
    • Improved weight tensor handling in neural network operations.


- Add visitor implementations for Concat, Slice, Transpose, Reduce,
  Equal, Sigmoid, Matmul, Repeat, Softmax, and Where operations
- Implement QnnAOTNegPattern for negation operation support
- Add comprehensive reduce operation support including ReduceMax,
  ReduceMin, ReduceMean, and ReduceSum patterns
- Update LLM2QnnLoweringPass to register new operation patterns
- Include necessary header files for the new visitor implementations
…sor conversion

- Set blockwise_expansion.blockScaleBitwidth from 12 to 4 bits
- Remove unnecessary tensor conversion to UInt8
@coderabbitai (Bot, Contributor) commented Jan 9, 2026

📝 Walkthrough

This PR extends QNN AOT (Ahead-of-Time) compilation support by introducing new operation patterns for Neg, Sigmoid, Slice, and Reduce variants, updating the lowering pass to register these patterns, adjusting blockwise quantization parameters, and removing unnecessary type casting in the Linear visitor.

Changes

Cohort / File(s): Summary

  • Quantization Configuration (mllm/backends/qnn/aot/QnnWrappersAPI.cpp):
    Changed blockwise quantization blockScaleBitwidth from 12 bits to 4 bits for per-channel LPBQ block expansion.
  • Lowering Pass Registration (mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp):
    Added QNN AOT visitor headers (Concat, Slice, Transpose, Reduce, Equal, Sigmoid, Matmul, Repeat, Softmax, Where) and expanded the registered lowering pattern set with Mul, Neg, Transpose, Slice, Concat, Repeat, MatMul, Reduce variants, Equal, Where, Softmax, and Sigmoid.
  • Element-wise Operations (mllm/backends/qnn/aot/visitor/Elewise.cpp, Elewise.hpp):
    Introduced the QnnAOTNegPattern class with isMatch and rewrite implementations for QNN-accelerated negation, validating quant_recipe and graph/context attributes before constructing ElementWiseNeg nodes.
  • Linear Operations (mllm/backends/qnn/aot/visitor/Linear.cpp):
    Removed explicit uint8 casting of the weight tensor prior to FullyConnected QNN op creation.
  • Reduce Operations (mllm/backends/qnn/aot/visitor/Reduce.cpp, Reduce.hpp):
    Added a generic templated rewriteReduceOp helper and four concrete QNN AOT reduce patterns (ReduceMax, ReduceMin, ReduceMean, ReduceSum) with axes and keep_dims parameter handling, including negative dimension wrapping and parameter tensor construction.
  • Sigmoid Operations (mllm/backends/qnn/aot/visitor/Sigmoid.cpp, Sigmoid.hpp):
    Introduced the QnnAOTSigmoidPattern class with isMatch and rewrite to detect Sigmoid ops with the using_qnn attribute and construct QNN Sigmoid nodes with input/output wiring and graph registration.
  • Slice Operations (mllm/backends/qnn/aot/visitor/Slice.cpp, Slice.hpp):
    Added QnnAOTSlicePattern with isMatch and rewrite implementations that translate linalg.SliceOp to QNN StridedSlice nodes, computing begin/end masks from slice indices and constructing a ranges tensor for rank-specific slice parameters.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested reviewers

  • liang1232018
  • chenghuaWang
  • yirongjie

Poem

🐰 A rabbit hopped through patterns bright,
Adding Neg and Slice in flight,
Sigmoid, Reduce—each new delight,
QNN compiles with quantum might! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
  • Description check (⚠️ Warning): The PR description only contains the standard template with no custom narrative explaining the changes, objectives, or rationale. Resolution: replace the template placeholder with a detailed description of the changes, including which operations were added, why these changes were made, and any testing performed.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)
  • Title check (✅ Passed): The title 'feat(qualcomm): Qnn AOT Lowering passes' accurately describes the primary change, adding QNN AOT lowering support for multiple operations, which is reflected in all modified files.



@coderabbitai (Bot, Contributor) left a review comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @mllm/backends/qnn/aot/visitor/Reduce.cpp:
- Around lines 56-58: axesParam->alloc() may return nullptr, but axesData is
  dereferenced without a check. Store the result of alloc() in axesData, handle
  the null case (log an error, return, or throw), and only write axesData[0]
  and call qnn_op_node->emplaceParamTensor(axesParam) when axesData is non-null.

In @mllm/backends/qnn/aot/visitor/Slice.cpp:
- Around line 95-96: The allocation for ranges_param (via ranges_param->alloc())
is not checked beyond skipping memcpy, so if ptr is nullptr you must abort
adding the parameter and registering the node to avoid using an uninitialized
ranges tensor; update the code around ptr (the result of ranges_param->alloc())
and the subsequent std::memcpy/ranges_data usage to explicitly handle allocation
failure (e.g., log an error/return a failure status or throw) and ensure you do
NOT call emplace_param or register_node when ptr is null.
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/visitor/Sigmoid.cpp (1)

17-53: Verify the need for "quant_recipe" attribute validation.

The rewrite implementation follows the standard pattern, but unlike elementwise operations (Add, Mul, Neg) which validate the presence of a "quant_recipe" attribute early in their rewrite methods, this Sigmoid implementation omits that check. The Slice pattern also omits it, suggesting this may be intentional for certain operation types.

Please verify whether Sigmoid operations require quantization recipes and if so, add the check:

MLLM_RETURN_FALSE_IF_NOT(op->getAttr("quant_recipe"));

Based on learnings: elementwise ops typically validate "quant_recipe" in their rewrite methods.

Optional: Consider shortening the inline comment.

Lines 39-43 contain a lengthy reasoning comment about package names. While context is valuable, consider condensing it or moving detailed rationale to documentation.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68c7c6c and cbdf5d7.

📒 Files selected for processing (11)
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.hpp
  • mllm/backends/qnn/aot/visitor/Sigmoid.cpp
  • mllm/backends/qnn/aot/visitor/Sigmoid.hpp
  • mllm/backends/qnn/aot/visitor/Slice.cpp
  • mllm/backends/qnn/aot/visitor/Slice.hpp
💤 Files with no reviewable changes (1)
  • mllm/backends/qnn/aot/visitor/Linear.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/visitor/Sigmoid.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/visitor/Reduce.hpp
  • mllm/backends/qnn/aot/visitor/Slice.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Sigmoid.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.cpp
  • mllm/backends/qnn/aot/visitor/Slice.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/visitor/Sigmoid.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/visitor/Reduce.hpp
  • mllm/backends/qnn/aot/visitor/Slice.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Sigmoid.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.cpp
  • mllm/backends/qnn/aot/visitor/Slice.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/visitor/Sigmoid.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.cpp
  • mllm/backends/qnn/aot/visitor/Slice.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/visitor/Sigmoid.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/visitor/Reduce.hpp
  • mllm/backends/qnn/aot/visitor/Slice.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Sigmoid.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.cpp
  • mllm/backends/qnn/aot/visitor/Slice.cpp
🧬 Code graph analysis (4)
mllm/backends/qnn/aot/visitor/Sigmoid.cpp (3)
mllm/backends/qnn/aot/visitor/Elewise.cpp (12)
  • isMatch (13-15)
  • isMatch (13-13)
  • isMatch (48-50)
  • isMatch (48-48)
  • isMatch (83-85)
  • isMatch (83-83)
  • rewrite (17-46)
  • rewrite (17-17)
  • rewrite (52-81)
  • rewrite (52-52)
  • rewrite (87-114)
  • rewrite (87-87)
mllm/backends/qnn/aot/visitor/Slice.cpp (4)
  • isMatch (18-20)
  • isMatch (18-18)
  • rewrite (22-103)
  • rewrite (22-22)
mllm/backends/qnn/aot/visitor/Sigmoid.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Reduce.hpp (9)
mllm/backends/qnn/aot/visitor/Elewise.hpp (6)
  • op (14-14)
  • op (25-25)
  • op (36-36)
  • writer (16-16)
  • writer (27-27)
  • writer (38-38)
mllm/backends/qnn/aot/visitor/Sigmoid.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Slice.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Concat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Equal.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Repeat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Transpose.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Index.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Reduce.cpp (6)
mllm/backends/qnn/aot/visitor/Elewise.cpp (12)
  • isMatch (13-15)
  • isMatch (13-13)
  • isMatch (48-50)
  • isMatch (48-48)
  • isMatch (83-85)
  • isMatch (83-83)
  • rewrite (17-46)
  • rewrite (17-17)
  • rewrite (52-81)
  • rewrite (52-52)
  • rewrite (87-114)
  • rewrite (87-87)
mllm/backends/qnn/aot/visitor/Sigmoid.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-53)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Slice.cpp (4)
  • isMatch (18-20)
  • isMatch (18-18)
  • rewrite (22-103)
  • rewrite (22-22)
mllm/backends/qnn/aot/visitor/Concat.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-71)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Equal.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-52)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Repeat.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-84)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Slice.cpp (2)
mllm/backends/qnn/aot/visitor/Sigmoid.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Slice.hpp (2)
  • op (14-14)
  • writer (16-16)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/visitor/Elewise.hpp

[error] 36-36: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 38-38: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Reduce.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 15-15: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 23-23: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 24-24: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 32-32: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 33-33: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 41-41: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 42-42: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Slice.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Sigmoid.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Reduce.cpp

[error] 16-16: 2 adjacent parameters of 'rewriteReduceOp' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 39-39: variable 'dim' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 40-40: variable 'keep_dims' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 54-54: invalid case style for variable 'axesName'

(readability-identifier-naming,-warnings-as-errors)


[error] 55-55: invalid case style for variable 'axesParam'

(readability-identifier-naming,-warnings-as-errors)


[error] 56-56: variable 'axesData' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 56-56: invalid case style for variable 'axesData'

(readability-identifier-naming,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-android
  • GitHub Check: build-macos
🔇 Additional comments (16)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

178-178: No issues identified with blockScaleBitwidth change.

The change from 12 to 4 bits for blockScaleBitwidth in the LPBQ (Low Precision Blockwise Quantization) configuration is correct. This value represents the per-block scale factor encoding bitwidth and is appropriate for the 4-bit weight quantization scheme (w4a16o16) used throughout the codebase. The 8-bit storage type supports efficient packing of 4-bit scale values. The adjustment aligns with QNN API specifications for blockwise quantization.

mllm/backends/qnn/aot/visitor/Slice.hpp (1)

1-23: LGTM!

The header follows the established pattern structure used by other AOT visitors (Sigmoid, Equal, Concat, etc.) with isMatch, rewrite, and the static create() factory method. The op parameter naming is consistent with the codebase convention.

mllm/backends/qnn/aot/visitor/Slice.cpp (1)

55-76: LGTM for mask calculation logic.

The mask calculation correctly handles the kAll sentinel values and the special case for negative indices (line 71). The bit-mask construction for begin_mask and end_mask follows QNN StridedSlice parameter conventions.

mllm/backends/qnn/aot/visitor/Elewise.cpp (1)

83-114: LGTM!

The QnnAOTNegPattern implementation correctly follows the established pattern for elementwise operations in this file (Add, Mul). It properly handles the single input/output semantics for unary negation and maintains consistency with the existing code structure.

mllm/backends/qnn/aot/visitor/Elewise.hpp (1)

34-43: LGTM!

The QnnAOTNegPattern declaration is consistent with the existing QnnAOTAddPattern and QnnAOTMulPattern classes in the same file.

mllm/backends/qnn/aot/visitor/Reduce.cpp (2)

15-65: Good use of templating to reduce duplication.

The rewriteReduceOp template function effectively consolidates common logic for all reduce operations, properly handling negative dimension wrapping (line 43) and the keep_dims parameter.


67-97: LGTM for pattern implementations.

Each reduce pattern correctly implements isMatch with the appropriate IR op type check and delegates to rewriteReduceOp with the corresponding template arguments.

mllm/backends/qnn/aot/visitor/Reduce.hpp (1)

1-48: LGTM!

The four reduce pattern class declarations follow the established structure and correctly map to their respective OpTypes in the factory methods. The pattern is consistent with other AOT visitor headers.

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (2)

23-32: LGTM for new includes.

The new includes correctly correspond to the patterns being added to the lowering pass.


36-42: LGTM for pattern registration.

The registerPatterns call now includes all the new patterns (Neg, Transpose, Slice, Concat, Repeat, MatMul, Reduce variants, Equal, Where, Softmax, Sigmoid) alongside the existing ones.

mllm/backends/qnn/aot/visitor/Sigmoid.cpp (3)

1-10: LGTM!

The copyright header and includes are appropriate for this implementation file.


13-15: LGTM!

The isMatch implementation correctly identifies Sigmoid operations marked for QNN lowering, consistent with other AOT patterns in the codebase.


55-55: LGTM!

Namespace closing is properly commented.

mllm/backends/qnn/aot/visitor/Sigmoid.hpp (3)

1-9: LGTM!

The copyright header, include guard, and necessary includes are all properly structured.


12-21: LGTM!

The class declaration follows the established pattern for AOT operation patterns in this codebase. The static factory method correctly returns a pair containing the operation type and shared pointer.

Note on static analysis warnings: The clang-tidy warnings about the parameter name op being too short are false positives. The codebase consistently uses op as the parameter name across all AOT pattern implementations (Add, Mul, Neg, Slice, etc.), making this an intentional convention.


23-23: LGTM!

Namespace closing is properly formatted and commented.

Comment on lines +56 to +58
uint32_t* axesData = static_cast<uint32_t*>(axesParam->alloc());
axesData[0] = static_cast<uint32_t>(dim);
qnn_op_node->emplaceParamTensor(axesParam);

⚠️ Potential issue | 🟠 Major

Null pointer dereference risk.

axesParam->alloc() may return nullptr, but the code directly dereferences axesData[0] without checking. This could cause a crash.

🛠️ Proposed fix
   auto axesParam = mllm::qnn::QNNParamTensorWrapper::create("axes", axesName, QNN_DATATYPE_UINT_32, std::vector<uint32_t>{1});
   uint32_t* axesData = static_cast<uint32_t*>(axesParam->alloc());
+  if (!axesData) {
+    MLLM_ERROR("Failed to allocate memory for axes parameter");
+    return false;
+  }
   axesData[0] = static_cast<uint32_t>(dim);
   qnn_op_node->emplaceParamTensor(axesParam);

Comment on lines +95 to +96
int32_t* ptr = static_cast<int32_t*>(ranges_param->alloc());
if (ptr) { std::memcpy(ptr, ranges_data.data(), ranges_data.size() * sizeof(int32_t)); }

⚠️ Potential issue | 🟠 Major

Handle allocation failure explicitly.

If ranges_param->alloc() returns nullptr, the code skips the memcpy but continues to emplace the parameter and register the node. This could lead to an uninitialized ranges tensor being used, causing undefined behavior at QNN runtime.

🛠️ Proposed fix
   int32_t* ptr = static_cast<int32_t*>(ranges_param->alloc());
-  if (ptr) { std::memcpy(ptr, ranges_data.data(), ranges_data.size() * sizeof(int32_t)); }
+  if (!ptr) {
+    MLLM_ERROR("Failed to allocate memory for ranges parameter");
+    return false;
+  }
+  std::memcpy(ptr, ranges_data.data(), ranges_data.size() * sizeof(int32_t));
   qnn_op_node->emplaceParamTensor(ranges_param);

@chenghuaWang chenghuaWang changed the title from "Qnn aot" to "feat(qualcomm): Qnn AOT Lowering passes" on Jan 9, 2026
@chenghuaWang chenghuaWang (Collaborator) left a comment

LGTM

@chenghuaWang chenghuaWang merged commit c91c54a into UbiquitousLearning:main Jan 9, 2026
4 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Mar 22, 2026
