
feat(qualcomm): AOTPipeline update #585

Merged
chenghuaWang merged 1 commit into UbiquitousLearning:main from chenghuaWang:wch-main
Jan 7, 2026

Conversation

Collaborator

@chenghuaWang chenghuaWang commented Jan 7, 2026

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Enhanced validation error handling with immediate failure termination in backend operations.
  • Performance Improvements

    • Enabled advanced model compilation optimization pass for improved execution performance.
    • Optimized linear operation processing by eliminating unnecessary tensor transformations.
  • Updates

    • Adjusted logging format display for improved readability.


Contributor

coderabbitai Bot commented Jan 7, 2026

📝 Walkthrough

Walkthrough

This PR modifies the QNN backend with improvements to error handling, logging formatting, IR attribute propagation, and optimization configuration. Changes span validation logic in model building, log output formatting, AOT pipeline pass enablement, Linalg operation attribute marking, and linear layer weight tensor handling.

Changes

Cohort / File(s) Summary
Error Handling & Validation
mllm/backends/qnn/QNNModel.cpp
Duplicate tensor check now logs at INFO level instead of ERROR. Validation failures during node addition now trigger MLLM_ERROR_EXIT with kCoreError instead of continuing to resource cleanup and returning MODEL_GRAPH_ERROR.
Logging Formatting
mllm/backends/qnn/QNNUtils.hpp
Updated QNN logger callback level labels to include surrounding spaces: [ ERROR ], [ WARN ], [ INFO ], [ DEBUG ] for improved log output readability.
AOT Pipeline Configuration
mllm/backends/qnn/aot/passes/AOTPipeline.cpp
Enabled the LLM-to-QNN lowering pass in the QNN AOT pipeline by uncommenting createLLM2QnnLoweringPass() invocation when quant_recipe.llm_recipe is true.
IR Attribute Propagation
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
Added traversal over LinalgIROp nodes within recursiveVisitGraph to propagate the using_qnn attribute to Linalg operations via writer.walk&lt;ir::linalg::LinalgIROp&gt;.
Linear Layer Optimization
mllm/backends/qnn/aot/visitor/Linear.cpp
Removed runtime weight tensor reshaping before FullyConnected op. Changed keep_dims parameter from true to false. Reordered include directives.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • oreomaker
  • liang1232018

Poem

🐰 Hop! Hop! The pipeline springs to life,
With QNN passes cutting through the strife,
Linalg attributes bloom so bright,
While linear ops leap left and right,
Error logs now exit with might!

🚥 Pre-merge checks | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
  • Description check ⚠️ Warning — The pull request description is entirely missing; no description content was provided by the author despite the repository having a description template with contribution guidelines. Resolution: add a comprehensive description explaining the purpose of each change, the rationale behind the modifications, and any testing performed, referencing the repository's contribution guidelines.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 16.67%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check ❓ Inconclusive — The title 'feat(qualcomm): AOTPipeline update' partially relates to the changeset but is vague; it describes only one file change (AOTPipeline.cpp) while the PR modifies multiple files with distinct purposes (QNNModel error handling, logging format, attribute propagation, and Linear operation behavior). Resolution: revise the title to describe the primary change more specifically (e.g., 'feat(qualcomm): Enable LLM-to-QNN lowering in AOT pipeline') or use a broader umbrella term reflecting all changes (e.g., 'feat(qualcomm): QNN backend improvements and fixes').



Owner

@UbiquitousLearning UbiquitousLearning left a comment


LGTM

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @mllm/backends/qnn/QNNModel.cpp:
- Around line 164-165: The log in QNNModel::addTensorWrapper reports a duplicate
tensor with MLLM_INFO while the function returns MODEL_TENSOR_ERROR, creating
inconsistent error handling; change the log call to MLLM_ERROR (or another
error-level macro used in the codebase) so the message matches the error return
MODEL_TENSOR_ERROR and still includes tensorName, ensuring the log level
reflects the actual failure path.
- Around line 293-297: The code currently calls MLLM_ERROR_EXIT in
QNNModel::addNode when validation fails (validationStatus check), making the
subsequent freeMultiPtr(nodeParams, inputs, outputs) and return
MODEL_GRAPH_ERROR dead code; remove those unreachable lines and either revert to
the original non-fatal behavior (replace MLLM_ERROR_EXIT with MLLM_ERROR and
return MODEL_GRAPH_ERROR after calling freeMultiPtr) or, if process termination
is intentional, keep MLLM_ERROR_EXIT but delete freeMultiPtr and return
statements and add a clear comment documenting that validation failures are
fatal; ensure you modify the branch handling validationStatus and update any
related documentation/tests to reflect the chosen behavior.
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (1)

28-32: Consider using a more descriptive variable name.

The logic correctly propagates the using_qnn attribute to all LinalgIROp nodes. However, the variable name ooo is not descriptive. Consider renaming it to something more meaningful like linalg_op for better code readability.

♻️ Suggested improvement
-  writer.walk<ir::linalg::LinalgIROp>(
-      [&](ir::IRWriter& /*writer*/, const ir::linalg::LinalgIROp::ptr_t& ooo) -> ir::IRWriter::WalkResult {
-        if (!ooo->getAttr("using_qnn")) { ooo->setAttr("using_qnn", details::createTrueBoolAttr(ctx)); }
+  writer.walk<ir::linalg::LinalgIROp>(
+      [&](ir::IRWriter& /*writer*/, const ir::linalg::LinalgIROp::ptr_t& linalg_op) -> ir::IRWriter::WalkResult {
+        if (!linalg_op->getAttr("using_qnn")) { linalg_op->setAttr("using_qnn", details::createTrueBoolAttr(ctx)); }
         return ir::IRWriter::WalkResult::WALK_CONTINUE;
       });
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 545677e and 3c09a91.

📒 Files selected for processing (6)
  • README-ZH.md
  • mllm/backends/qnn/QNNModel.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/QNNModel.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/QNNUtils.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/QNNModel.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/QNNUtils.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/QNNModel.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/QNNModel.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/QNNUtils.hpp
🧠 Learnings (1)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Applied to files:

  • mllm/backends/qnn/QNNUtils.hpp
🧬 Code graph analysis (3)
mllm/backends/qnn/aot/visitor/Linear.cpp (2)
mllm/backends/qnn/QNNUtils.hpp (2)
  • create (256-258)
  • create (256-256)
pymllm/ffi/__init__.py (1)
  • create (579-582)
mllm/backends/qnn/QNNModel.cpp (1)
mllm/backends/qnn/QNNModel.hpp (4)
  • tensorName (51-52)
  • tensorName (55-56)
  • tensorName (59-59)
  • tensorName (86-86)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (6)
mllm/backends/qnn/aot/visitor/Linear.hpp (1)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (1)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (1)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (1)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
  • writer (16-16)
  • writer (27-27)
mllm/compile/ir/linalg/Op.hpp (1)
  • ctx (157-157)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-macos
  • GitHub Check: build-x86
  • GitHub Check: build-android
🔇 Additional comments (4)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)

32-32: LGTM! Verify the pass has been tested in the pipeline.

Enabling the LLM2QnnLoweringPass completes the AOT lowering pipeline for LLM applications. Since this pass was previously disabled, ensure it has been thoroughly tested in this position within the pipeline (after MarkTensorIOPass).

mllm/backends/qnn/aot/visitor/Linear.cpp (2)

5-5: Header include order change.

The reordering of DataTypes.hpp is cosmetic and does not affect functionality.


73-73: The keep_dims parameter is correctly set to false for the QNN FullyConnected operation.

The current setting aligns with QNN HTP backend expectations. The Qwen NPU model documentation indicates FullyConnected operations are expected to produce 2D output in the format [B * S, H * D], which is consistent with keep_dims=false. The weight tensor handling (conversion to kUInt8 at line 54) is separate from output shape configuration and does not interact with the keep_dims parameter.

mllm/backends/qnn/QNNUtils.hpp (1)

100-112: LGTM! Log label formatting improvement.

The addition of spaces to log level labels improves visual alignment in console output. This is a purely cosmetic change with no functional impact.

Comment on lines +164 to 165
MLLM_INFO("QNNModel::addTensorWrapper() tensor {} already exists.", tensorName);
return MODEL_TENSOR_ERROR;
⚠️ Potential issue | 🟡 Minor

Inconsistent error handling: INFO log with ERROR return.

The log level was changed from ERROR to INFO, but the function still returns MODEL_TENSOR_ERROR on Line 165. This creates an inconsistency:

  • If duplicate tensors are expected and recoverable, consider returning success or a specific non-error code
  • If this is truly an error condition (suggested by the error return), the log should remain at ERROR level
🔧 Suggested fix: Revert to ERROR logging
-    MLLM_INFO("QNNModel::addTensorWrapper() tensor {} already exists.", tensorName);
+    MLLM_ERROR("QNNModel::addTensorWrapper() tensor {} already exists.", tensorName);
     return MODEL_TENSOR_ERROR;

Comment on lines +293 to 297
MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNNModel::addNode() validation API not supported.");
} else if (validationStatus != QNN_SUCCESS) {
MLLM_ERROR("QNNModel::addNode() validating node {} failed.", storedStrings.name);
MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNNModel::addNode() validating node {} failed.", storedStrings.name);
freeMultiPtr(nodeParams, inputs, outputs);
return MODEL_GRAPH_ERROR;

⚠️ Potential issue | 🟠 Major

Dead code after ERROR_EXIT and changed error handling behavior.

Two issues:

  1. Dead code (Line 296-297): The MLLM_ERROR_EXIT on Line 295 terminates the program, making the subsequent freeMultiPtr and return MODEL_GRAPH_ERROR statements unreachable. Remove the dead code.

  2. Breaking change in error handling: Changing from MLLM_ERROR + return to MLLM_ERROR_EXIT prevents error recovery by calling code. This is a significant behavioral change that:

    • Terminates the entire process on validation failure
    • Prevents cleanup or graceful degradation
    • May surprise users expecting error codes rather than process termination

Verify that this aggressive error handling aligns with the intended behavior for validation failures.

🐛 Proposed fix: Remove dead code and document behavior change
     auto validationStatus = qnnInterface_.backendValidateOpConfig(backendHandle_, opDefinition);
     if (validationStatus == QNN_BACKEND_ERROR_NOT_SUPPORTED) {
       MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNNModel::addNode() validation API not supported.");
     } else if (validationStatus != QNN_SUCCESS) {
       MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNNModel::addNode() validating node {} failed.", storedStrings.name);
-      freeMultiPtr(nodeParams, inputs, outputs);
-      return MODEL_GRAPH_ERROR;
     }

@chenghuaWang chenghuaWang merged commit 43f2062 into UbiquitousLearning:main Jan 7, 2026
4 checks passed