feat(qualcomm): AOTPipeline update #585
Conversation
📝 Walkthrough

This PR modifies the QNN backend with improvements to error handling, logging formatting, IR attribute propagation, and optimization configuration. Changes span validation logic in model building, log output formatting, AOT pipeline pass enablement, Linalg operation attribute marking, and linear layer weight tensor handling.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
🚥 Pre-merge checks | ❌ 3 failed checks (2 warnings, 1 inconclusive)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @mllm/backends/qnn/QNNModel.cpp:
- Around line 164-165: The log in QNNModel::addTensorWrapper reports a duplicate
tensor with MLLM_INFO while the function returns MODEL_TENSOR_ERROR, creating
inconsistent error handling; change the log call to MLLM_ERROR (or another
error-level macro used in the codebase) so the message matches the error return
MODEL_TENSOR_ERROR and still includes tensorName, ensuring the log level
reflects the actual failure path.
- Around line 293-297: The code currently calls MLLM_ERROR_EXIT in
QNNModel::addNode when validation fails (validationStatus check), making the
subsequent freeMultiPtr(nodeParams, inputs, outputs) and return
MODEL_GRAPH_ERROR dead code; remove those unreachable lines and either revert to
the original non-fatal behavior (replace MLLM_ERROR_EXIT with MLLM_ERROR and
return MODEL_GRAPH_ERROR after calling freeMultiPtr) or, if process termination
is intentional, keep MLLM_ERROR_EXIT but delete freeMultiPtr and return
statements and add a clear comment documenting that validation failures are
fatal; ensure you modify the branch handling validationStatus and update any
related documentation/tests to reflect the chosen behavior.
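If the non-fatal option is chosen, the shape of the recoverable path can be sketched as below. This is a simplified stand-in, not the mllm implementation: the status enums, the `fprintf` logging, and `freeNodeResources` are hypothetical substitutes for the real `MLLM_ERROR` macro and `freeMultiPtr` call.

```cpp
#include <cstdio>

// Hypothetical stand-ins for the codebase's status codes; the real
// definitions are not shown in this PR.
enum ModelError { MODEL_NO_ERROR = 0, MODEL_GRAPH_ERROR = 1 };
enum QnnStatus { QNN_SUCCESS = 0, QNN_VALIDATION_FAILED = 3 };

static bool resources_freed = false;
// Stand-in for freeMultiPtr(nodeParams, inputs, outputs).
inline void freeNodeResources() { resources_freed = true; }

// Non-fatal variant: log at error level, release resources, then return an
// error code so the caller decides whether to abort or degrade gracefully.
inline ModelError addNodeNonFatal(QnnStatus validationStatus) {
  if (validationStatus != QNN_SUCCESS) {
    std::fprintf(stderr, "QNNModel::addNode() validating node failed.\n");
    freeNodeResources();  // reachable, unlike statements after a fatal exit
    return MODEL_GRAPH_ERROR;
  }
  return MODEL_NO_ERROR;
}
```

With this shape, both the cleanup and the error return are reachable, and the decision between aborting and recovering stays with the caller.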
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (1)
28-32: Consider using a more descriptive variable name.

The logic correctly propagates the `using_qnn` attribute to all LinalgIROp nodes. However, the variable name `ooo` is not descriptive. Consider renaming it to something more meaningful like `linalg_op` for better code readability.

♻️ Suggested improvement
```diff
-  writer.walk<ir::linalg::LinalgIROp>(
-      [&](ir::IRWriter& /*writer*/, const ir::linalg::LinalgIROp::ptr_t& ooo) -> ir::IRWriter::WalkResult {
-        if (!ooo->getAttr("using_qnn")) { ooo->setAttr("using_qnn", details::createTrueBoolAttr(ctx)); }
+  writer.walk<ir::linalg::LinalgIROp>(
+      [&](ir::IRWriter& /*writer*/, const ir::linalg::LinalgIROp::ptr_t& linalg_op) -> ir::IRWriter::WalkResult {
+        if (!linalg_op->getAttr("using_qnn")) { linalg_op->setAttr("using_qnn", details::createTrueBoolAttr(ctx)); }
         return ir::IRWriter::WalkResult::WALK_CONTINUE;
       });
```
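The walk itself is a simple idempotent mark pass. Modeled with a toy node type rather than the real `ir::IRWriter` API, the check-then-set pattern it applies looks like this:

```cpp
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a LinalgIROp: attributes are presence-keyed booleans.
struct ToyOp {
  std::map<std::string, bool> attrs;
  bool getAttr(const std::string& key) const { return attrs.count(key) != 0; }
  void setAttr(const std::string& key, bool value) { attrs[key] = value; }
};

// Mark every op with "using_qnn" unless it already carries the attribute,
// mirroring the idempotent check-then-set in MarkQnnGraphPass.
inline void markUsingQnn(std::vector<ToyOp>& graph) {
  for (auto& op : graph) {
    if (!op.getAttr("using_qnn")) { op.setAttr("using_qnn", true); }
  }
}
```

Because the attribute is only set when absent, running the pass twice leaves the graph unchanged, which is why the rename is purely a readability concern.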
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- README-ZH.md
- mllm/backends/qnn/QNNModel.cpp
- mllm/backends/qnn/QNNUtils.hpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
- mllm/backends/qnn/aot/visitor/Linear.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
- mllm/backends/qnn/aot/visitor/Linear.cpp
- mllm/backends/qnn/QNNModel.cpp
- mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/QNNUtils.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
- mllm/backends/qnn/aot/visitor/Linear.cpp
- mllm/backends/qnn/QNNModel.cpp
- mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/QNNUtils.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
- mllm/backends/qnn/aot/visitor/Linear.cpp
- mllm/backends/qnn/QNNModel.cpp
- mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
- mllm/backends/qnn/aot/visitor/Linear.cpp
- mllm/backends/qnn/QNNModel.cpp
- mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/QNNUtils.hpp
🧠 Learnings (1)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Applied to files:
mllm/backends/qnn/QNNUtils.hpp
🧬 Code graph analysis (3)
mllm/backends/qnn/aot/visitor/Linear.cpp (2)
mllm/backends/qnn/QNNUtils.hpp (2)
- `create` (256-258)
- `create` (256-256)

pymllm/ffi/__init__.py (1)
- `create` (579-582)
mllm/backends/qnn/QNNModel.cpp (1)
mllm/backends/qnn/QNNModel.hpp (4)
- `tensorName` (51-52)
- `tensorName` (55-56)
- `tensorName` (59-59)
- `tensorName` (86-86)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (6)
mllm/backends/qnn/aot/visitor/Linear.hpp (1)
- `writer` (16-16)

mllm/backends/qnn/aot/visitor/Matmul.hpp (1)
- `writer` (16-16)

mllm/backends/qnn/aot/visitor/RMSNorm.hpp (1)
- `writer` (16-16)

mllm/backends/qnn/aot/visitor/Softmax.hpp (1)
- `writer` (16-16)

mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
- `writer` (16-16)
- `writer` (27-27)

mllm/compile/ir/linalg/Op.hpp (1)
- `ctx` (157-157)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-macos
- GitHub Check: build-x86
- GitHub Check: build-android
🔇 Additional comments (4)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)
32-32: LGTM! Verify the pass has been tested in the pipeline.

Enabling the LLM2QnnLoweringPass completes the AOT lowering pipeline for LLM applications. Since this pass was previously disabled, ensure it has been thoroughly tested in this position within the pipeline (after MarkTensorIOPass).
mllm/backends/qnn/aot/visitor/Linear.cpp (2)
5-5: Header include order change.

The reordering of `DataTypes.hpp` is cosmetic and does not affect functionality.
73-73: The `keep_dims` parameter is correctly set to `false` for the QNN FullyConnected operation.

The current setting aligns with QNN HTP backend expectations. The Qwen NPU model documentation indicates FullyConnected operations are expected to produce 2D output in the format `[B * S, H * D]`, which is consistent with `keep_dims=false`. The weight tensor handling (conversion to `kUInt8` at line 54) is separate from output shape configuration and does not interact with the `keep_dims` parameter.

mllm/backends/qnn/QNNUtils.hpp (1)
100-112: LGTM! Log label formatting improvement.

The addition of spaces to log level labels improves visual alignment in console output. This is a purely cosmetic change with no functional impact.
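The 2D output noted for `keep_dims=false` above reduces to a simple shape computation: the batch and sequence axes collapse into one. A minimal numeric sketch, assuming a plain row-major view of the input rather than the actual QNN tensor layout code:

```cpp
#include <array>

// With keep_dims=false, FullyConnected over an input viewed as [B, S, H*D]
// produces a 2D output [B * S, H * D]: the leading axes are folded together.
inline std::array<long, 2> fcOutputShape(long B, long S, long HD) {
  return {B * S, HD};
}
```

For example, a batch of 2 with sequence length 8 and hidden width 128 yields a [16, 128] output rather than [2, 8, 128].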
```cpp
MLLM_INFO("QNNModel::addTensorWrapper() tensor {} already exists.", tensorName);
return MODEL_TENSOR_ERROR;
```
Inconsistent error handling: INFO log with ERROR return.
The log level was changed from ERROR to INFO, but the function still returns MODEL_TENSOR_ERROR on Line 165. This creates an inconsistency:
- If duplicate tensors are expected and recoverable, consider returning success or a specific non-error code
- If this is truly an error condition (suggested by the error return), the log should remain at ERROR level
🔧 Suggested fix: Revert to ERROR logging

```diff
- MLLM_INFO("QNNModel::addTensorWrapper() tensor {} already exists.", tensorName);
+ MLLM_ERROR("QNNModel::addTensorWrapper() tensor {} already exists.", tensorName);
  return MODEL_TENSOR_ERROR;
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
MLLM_ERROR("QNNModel::addTensorWrapper() tensor {} already exists.", tensorName);
return MODEL_TENSOR_ERROR;
```
```diff
   MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNNModel::addNode() validation API not supported.");
 } else if (validationStatus != QNN_SUCCESS) {
-  MLLM_ERROR("QNNModel::addNode() validating node {} failed.", storedStrings.name);
+  MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNNModel::addNode() validating node {} failed.", storedStrings.name);
   freeMultiPtr(nodeParams, inputs, outputs);
   return MODEL_GRAPH_ERROR;
```
Dead code after ERROR_EXIT and changed error handling behavior.

Two issues:

1. Dead code (Lines 296-297): The `MLLM_ERROR_EXIT` on Line 295 terminates the program, making the subsequent `freeMultiPtr` and `return MODEL_GRAPH_ERROR` statements unreachable. Remove the dead code.
2. Breaking change in error handling: Changing from `MLLM_ERROR` plus a return to `MLLM_ERROR_EXIT` prevents error recovery by calling code. This is a significant behavioral change that:
   - Terminates the entire process on validation failure
   - Prevents cleanup or graceful degradation
   - May surprise users expecting error codes rather than process termination

Verify that this aggressive error handling aligns with the intended behavior for validation failures.
🐛 Proposed fix: Remove dead code and document behavior change

```diff
 auto validationStatus = qnnInterface_.backendValidateOpConfig(backendHandle_, opDefinition);
 if (validationStatus == QNN_BACKEND_ERROR_NOT_SUPPORTED) {
   MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNNModel::addNode() validation API not supported.");
 } else if (validationStatus != QNN_SUCCESS) {
   MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNNModel::addNode() validating node {} failed.", storedStrings.name);
-  freeMultiPtr(nodeParams, inputs, outputs);
-  return MODEL_GRAPH_ERROR;
 }
```
Summary by CodeRabbit
Release Notes
Bug Fixes
Performance Improvements
Updates
✏️ Tip: You can customize this high-level summary in your review settings.