
feat(qualcomm): AOTPipeline update #585

Merged
chenghuaWang merged 1 commit into UbiquitousLearning:main from chenghuaWang:wch-main
Jan 7, 2026

Conversation

Collaborator

@chenghuaWang chenghuaWang commented Jan 7, 2026

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Enhanced validation error handling with immediate failure termination in backend operations.
  • Performance Improvements

    • Enabled advanced model compilation optimization pass for improved execution performance.
    • Optimized linear operation processing by eliminating unnecessary tensor transformations.
  • Updates

    • Adjusted logging format display for improved readability.


Contributor

coderabbitai Bot commented Jan 7, 2026

📝 Walkthrough

Walkthrough

This PR modifies the QNN backend with improvements to error handling, logging formatting, IR attribute propagation, and optimization configuration. Changes span validation logic in model building, log output formatting, AOT pipeline pass enablement, Linalg operation attribute marking, and linear layer weight tensor handling.

Changes

Cohort / File(s) Summary
Error Handling & Validation
mllm/backends/qnn/QNNModel.cpp
Duplicate tensor check now logs at INFO level instead of ERROR. Validation failures during node addition now trigger MLLM_ERROR_EXIT with kCoreError instead of continuing to resource cleanup and returning MODEL_GRAPH_ERROR.
Logging Formatting
mllm/backends/qnn/QNNUtils.hpp
Updated QNN logger callback level labels to include surrounding spaces: [ ERROR ], [ WARN ], [ INFO ], [ DEBUG ] for improved log output readability.
AOT Pipeline Configuration
mllm/backends/qnn/aot/passes/AOTPipeline.cpp
Enabled the LLM-to-QNN lowering pass in the QNN AOT pipeline by uncommenting createLLM2QnnLoweringPass() invocation when quant_recipe.llm_recipe is true.
IR Attribute Propagation
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
Added traversal over LinalgIROp nodes within recursiveVisitGraph to propagate the using_qnn attribute to Linalg operations via writer.walk&lt;ir::linalg::LinalgIROp&gt;.
Linear Layer Optimization
mllm/backends/qnn/aot/visitor/Linear.cpp
Removed runtime weight tensor reshaping before FullyConnected op. Changed keep_dims parameter from true to false. Reordered include directives.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • oreomaker
  • liang1232018

Poem

🐰 Hop! Hop! The pipeline springs to life,
With QNN passes cutting through the strife,
Linalg attributes bloom so bright,
While linear ops leap left and right,
Error logs now exit with might!

🚥 Pre-merge checks | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
  • Description check ⚠️ Warning — The pull request description is entirely missing; no description content was provided by the author despite the repository having a description template with contribution guidelines. Resolution: add a comprehensive description explaining the purpose of each change, the rationale behind the modifications, and any testing performed, referencing the repository's contribution guidelines.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 16.67%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check ❓ Inconclusive — The title 'feat(qualcomm): AOTPipeline update' partially relates to the changeset but is vague; it describes only one file change (AOTPipeline.cpp) while the PR modifies multiple files with distinct purposes (QNNModel error handling, logging format, attribute propagation, and Linear operation behavior). Resolution: revise the title to describe the primary change more specifically (e.g., 'feat(qualcomm): Enable LLM-to-QNN lowering in AOT pipeline') or use a broader umbrella term reflecting all changes (e.g., 'feat(qualcomm): QNN backend improvements and fixes').



Owner

@UbiquitousLearning UbiquitousLearning left a comment


LGTM

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @mllm/backends/qnn/QNNModel.cpp:
- Around line 164-165: The log in QNNModel::addTensorWrapper reports a duplicate
tensor with MLLM_INFO while the function returns MODEL_TENSOR_ERROR, creating
inconsistent error handling; change the log call to MLLM_ERROR (or another
error-level macro used in the codebase) so the message matches the error return
MODEL_TENSOR_ERROR and still includes tensorName, ensuring the log level
reflects the actual failure path.
- Around line 293-297: The code currently calls MLLM_ERROR_EXIT in
QNNModel::addNode when validation fails (validationStatus check), making the
subsequent freeMultiPtr(nodeParams, inputs, outputs) and return
MODEL_GRAPH_ERROR dead code; remove those unreachable lines and either revert to
the original non-fatal behavior (replace MLLM_ERROR_EXIT with MLLM_ERROR and
return MODEL_GRAPH_ERROR after calling freeMultiPtr) or, if process termination
is intentional, keep MLLM_ERROR_EXIT but delete freeMultiPtr and return
statements and add a clear comment documenting that validation failures are
fatal; ensure you modify the branch handling validationStatus and update any
related documentation/tests to reflect the chosen behavior.
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (1)

28-32: Consider using a more descriptive variable name.

The logic correctly propagates the using_qnn attribute to all LinalgIROp nodes. However, the variable name ooo is not descriptive. Consider renaming it to something more meaningful like linalg_op for better code readability.

♻️ Suggested improvement
-  writer.walk<ir::linalg::LinalgIROp>(
-      [&](ir::IRWriter& /*writer*/, const ir::linalg::LinalgIROp::ptr_t& ooo) -> ir::IRWriter::WalkResult {
-        if (!ooo->getAttr("using_qnn")) { ooo->setAttr("using_qnn", details::createTrueBoolAttr(ctx)); }
+  writer.walk<ir::linalg::LinalgIROp>(
+      [&](ir::IRWriter& /*writer*/, const ir::linalg::LinalgIROp::ptr_t& linalg_op) -> ir::IRWriter::WalkResult {
+        if (!linalg_op->getAttr("using_qnn")) { linalg_op->setAttr("using_qnn", details::createTrueBoolAttr(ctx)); }
         return ir::IRWriter::WalkResult::WALK_CONTINUE;
       });
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 545677e and 3c09a91.

📒 Files selected for processing (6)
  • README-ZH.md
  • mllm/backends/qnn/QNNModel.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/QNNModel.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/QNNUtils.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/QNNModel.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/QNNUtils.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/QNNModel.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/QNNModel.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/QNNUtils.hpp
🧠 Learnings (1)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Applied to files:

  • mllm/backends/qnn/QNNUtils.hpp
🧬 Code graph analysis (3)
mllm/backends/qnn/aot/visitor/Linear.cpp (2)
mllm/backends/qnn/QNNUtils.hpp (2)
  • create (256-258)
  • create (256-256)
pymllm/ffi/__init__.py (1)
  • create (579-582)
mllm/backends/qnn/QNNModel.cpp (1)
mllm/backends/qnn/QNNModel.hpp (4)
  • tensorName (51-52)
  • tensorName (55-56)
  • tensorName (59-59)
  • tensorName (86-86)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (6)
mllm/backends/qnn/aot/visitor/Linear.hpp (1)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (1)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (1)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (1)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
  • writer (16-16)
  • writer (27-27)
mllm/compile/ir/linalg/Op.hpp (1)
  • ctx (157-157)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-macos
  • GitHub Check: build-x86
  • GitHub Check: build-android
🔇 Additional comments (4)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)

32-32: LGTM! Verify the pass has been tested in the pipeline.

Enabling the LLM2QnnLoweringPass completes the AOT lowering pipeline for LLM applications. Since this pass was previously disabled, ensure it has been thoroughly tested in this position within the pipeline (after MarkTensorIOPass).

mllm/backends/qnn/aot/visitor/Linear.cpp (2)

5-5: Header include order change.

The reordering of DataTypes.hpp is cosmetic and does not affect functionality.


73-73: The keep_dims parameter is correctly set to false for the QNN FullyConnected operation.

The current setting aligns with QNN HTP backend expectations. The Qwen NPU model documentation indicates FullyConnected operations are expected to produce 2D output in the format [B * S, H * D], which is consistent with keep_dims=false. The weight tensor handling (conversion to kUInt8 at line 54) is separate from output shape configuration and does not interact with the keep_dims parameter.

mllm/backends/qnn/QNNUtils.hpp (1)

100-112: LGTM! Log label formatting improvement.

The addition of spaces to log level labels improves visual alignment in console output. This is a purely cosmetic change with no functional impact.

Comment on lines +164 to 165
MLLM_INFO("QNNModel::addTensorWrapper() tensor {} already exists.", tensorName);
return MODEL_TENSOR_ERROR;
⚠️ Potential issue | 🟡 Minor

Inconsistent error handling: INFO log with ERROR return.

The log level was changed from ERROR to INFO, but the function still returns MODEL_TENSOR_ERROR on Line 165. This creates an inconsistency:

  • If duplicate tensors are expected and recoverable, consider returning success or a specific non-error code
  • If this is truly an error condition (suggested by the error return), the log should remain at ERROR level
🔧 Suggested fix: Revert to ERROR logging
-    MLLM_INFO("QNNModel::addTensorWrapper() tensor {} already exists.", tensorName);
+    MLLM_ERROR("QNNModel::addTensorWrapper() tensor {} already exists.", tensorName);
     return MODEL_TENSOR_ERROR;

Comment on lines +293 to 297
MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNNModel::addNode() validation API not supported.");
} else if (validationStatus != QNN_SUCCESS) {
MLLM_ERROR("QNNModel::addNode() validating node {} failed.", storedStrings.name);
MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNNModel::addNode() validating node {} failed.", storedStrings.name);
freeMultiPtr(nodeParams, inputs, outputs);
return MODEL_GRAPH_ERROR;

⚠️ Potential issue | 🟠 Major

Dead code after ERROR_EXIT and changed error handling behavior.

Two issues:

  1. Dead code (Line 296-297): The MLLM_ERROR_EXIT on Line 295 terminates the program, making the subsequent freeMultiPtr and return MODEL_GRAPH_ERROR statements unreachable. Remove the dead code.

  2. Breaking change in error handling: Changing from MLLM_ERROR + return to MLLM_ERROR_EXIT prevents error recovery by calling code. This is a significant behavioral change that:

    • Terminates the entire process on validation failure
    • Prevents cleanup or graceful degradation
    • May surprise users expecting error codes rather than process termination

Verify that this aggressive error handling aligns with the intended behavior for validation failures.

🐛 Proposed fix: Remove dead code and document behavior change
     auto validationStatus = qnnInterface_.backendValidateOpConfig(backendHandle_, opDefinition);
     if (validationStatus == QNN_BACKEND_ERROR_NOT_SUPPORTED) {
       MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNNModel::addNode() validation API not supported.");
     } else if (validationStatus != QNN_SUCCESS) {
       MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNNModel::addNode() validating node {} failed.", storedStrings.name);
-      freeMultiPtr(nodeParams, inputs, outputs);
-      return MODEL_GRAPH_ERROR;
     }

@chenghuaWang chenghuaWang merged commit 43f2062 into UbiquitousLearning:main Jan 7, 2026
4 checks passed