fix: LPBQ return shape follow qnn spec #595
chenghuaWang merged 3 commits into UbiquitousLearning:main from
Conversation
📝 Walkthrough

This PR introduces shape canonicalization for quantized tensors in QNN AOT compilation. It adds a new LPBQ canonicalization IR pass for tensor reshaping, modifies the Qwen3 model MLP/Attention components with explicit size tracking, adjusts CPU linear operation shape handling, updates quantization bitwidth parameters, and removes unnecessary type conversions in weight handling.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.cpp:
- Around lines 13-15: The file contains a duplicate include of mllm/engine/Context.hpp; keep the first #include "mllm/engine/Context.hpp" and delete the second occurrence so the header is included only once.
🧹 Nitpick comments (4)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)
398-421: Good defensive error handling for missing KV caches.

The explicit error handling with descriptive messages is appropriate for AOT compilation, where KV caches must be provided.
Consider removing the redundant comments (lines 403-405, 416-418) since the throw statements are self-explanatory:

♻️ Optional cleanup

```diff
 if (input.count(past_key_name)) {
   kv_caches.push_back(input.at(past_key_name));
 } else {
-  // If KV cache doesn't exist, we need to handle this case
-  // For now, we'll create empty tensors or handle it appropriately
-  // This might need adjustment based on your initialization logic
   throw std::runtime_error("Missing KV cache for layer " + std::to_string(i));
 }
```

mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.hpp (1)
11-18: Consider explicitly deleting copy/move operations for clarity.

The static analysis flags that the class defines a destructor but not the other special member functions. If the base class ir::Pass already handles this (common for polymorphic pass types), this is benign. However, being explicit improves clarity and suppresses warnings.

♻️ Suggested explicit deletion

```diff
 class LPBQCanonicalizePass final : public ir::Pass {
  public:
   LPBQCanonicalizePass() = default;
+  LPBQCanonicalizePass(const LPBQCanonicalizePass&) = delete;
+  LPBQCanonicalizePass& operator=(const LPBQCanonicalizePass&) = delete;
+  LPBQCanonicalizePass(LPBQCanonicalizePass&&) = delete;
+  LPBQCanonicalizePass& operator=(LPBQCanonicalizePass&&) = delete;
   ~LPBQCanonicalizePass() override = default;

   uint8_t run(const ir::node_ptr_t& op) override;
 };
```

mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.cpp (1)
70-70: Use a descriptive variable name.

cc_help_me is non-descriptive. Consider renaming to consumer_op or similar for clarity.

```diff
-  for (auto cc_help_me : consumer_ops) {
-    MLLM_RT_ASSERT(cc_help_me->isa_<ir::Op>());
+  for (auto consumer_op : consumer_ops) {
+    MLLM_RT_ASSERT(consumer_op->isa_<ir::Op>());
```

mllm/backends/cpu/ops/LinearOp.cpp (1)
361-366: Add explanatory comment for shape canonicalization.

The conditional removal of the leading dimension when o_shape[0] == 1 for LPBQ types implements shape canonicalization per the QNN LPBQ specification, and it is safe: the input is guaranteed to have at least 2 dimensions by the upstream assertion at line 105. A comment explaining the spec requirement would improve code clarity:

```diff
 case aops::LinearImplTypes::kQNN_LPBQ_w4a16o16_G32:
 case aops::LinearImplTypes::kQNN_LPBQ_w4a16o16_G64: {
+  // Canonicalize shape to match QNN LPBQ spec: squeeze leading dimension when batch=1
   if (o_shape[0] == 1) { o_shape.erase(o_shape.begin()); }
   o_dtype = kUInt16PerTensorAsy;
   break;
```
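To make the squeeze behavior concrete, here is a minimal standalone sketch of the canonicalization described above, using plain std::vector rather than the mllm tensor types:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Standalone illustration of the LPBQ shape canonicalization discussed above:
// when the leading (batch) dimension is 1, it is squeezed away so the output
// shape matches what the QNN LPBQ spec expects.
std::vector<int64_t> canonicalizeLpbqShape(std::vector<int64_t> o_shape) {
  assert(o_shape.size() >= 2);  // mirrors the upstream rank assertion
  if (o_shape[0] == 1) { o_shape.erase(o_shape.begin()); }
  return o_shape;
}

int main() {
  auto squeezed = canonicalizeLpbqShape({1, 128, 1024});
  assert((squeezed == std::vector<int64_t>{128, 1024}));

  auto unchanged = canonicalizeLpbqShape({4, 128, 1024});
  assert((unchanged == std::vector<int64_t>{4, 128, 1024}));
  return 0;
}
```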
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
- mllm/backends/cpu/ops/LinearOp.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.cpp
- mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.hpp
- mllm/backends/qnn/aot/visitor/Linear.cpp
💤 Files with no reviewable changes (1)
- mllm/backends/qnn/aot/visitor/Linear.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x80–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x80–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
- mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.hpp
- mllm/backends/cpu/ops/LinearOp.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
- mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.hpp
- mllm/backends/cpu/ops/LinearOp.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
- mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.hpp
- mllm/backends/cpu/ops/LinearOp.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
- mllm/backends/cpu/ops/LinearOp.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.cpp
🧬 Code graph analysis (2)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (4)
mllm/core/aops/ViewOp.hpp (3)
- inputs (24-24), inputs (26-26), inputs (28-28)

mllm/nn/Module.hpp (14)
- inputs (174-174), inputs (178-178), inputs (182-182), inputs (184-184), inputs (186-186), inputs (216-220), inputs (216-216), inputs (240-244), inputs (240-240), inputs (263-267), args (148-170), args (148-148), reg (83-115), reg (83-83)

mllm/nn/Functional.hpp (1)
- sigmoid (163-163)

mllm/nn/Functional.cpp (2)
- sigmoid (210-213), sigmoid (210-210)
mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.cpp (1)
mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.hpp (1)
- op (17-17)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.hpp
[error] 11-11: class 'LPBQCanonicalizePass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
[error] 134-134: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 135-135: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 144-144: variable name 'o' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.cpp
[error] 15-15: duplicate include
(readability-duplicate-include,-warnings-as-errors)
[error] 23-23: 2 adjacent parameters of 'visitCallGraph' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 31-31: 2 adjacent parameters of 'visitSubGraph' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-android
- GitHub Check: build-x86
- GitHub Check: build-macos
🔇 Additional comments (7)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (4)
120-131: LGTM! Member variables added for shape canonicalization.

The addition of hidden_size_ and intermediate_size_ to cache configuration values for use in the view() operations is appropriate. Initialization from cfg after linear layer registration is correct, since these values are only used in forward().
137-145: Shape canonicalization for QNN spec compliance looks correct.

The view() operations ensure tensors maintain the expected 3D shape {batch, seq, feature} after linear projections. The hardcoded batch size of 1 is typical for QNN AOT compilation.

Please verify that the batch size constraint of 1 is documented or enforced at the model entry point, to avoid runtime shape mismatches if this model is invoked with different batch sizes.
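For reference, a minimal standalone sketch of how a view({1, -1, hidden_size_}) target shape resolves its -1 placeholder so the element count is preserved. This is plain C++ for illustration, not the mllm Tensor API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustration of view-shape resolution: the single -1 dimension is inferred
// from the total element count divided by the product of the known dims.
std::vector<int64_t> resolveViewShape(int64_t num_elements, std::vector<int64_t> target) {
  int64_t known = 1;
  int inferred_idx = -1;
  for (int i = 0; i < static_cast<int>(target.size()); ++i) {
    if (target[i] == -1) {
      inferred_idx = i;  // at most one -1 is allowed
    } else {
      known *= target[i];
    }
  }
  if (inferred_idx >= 0) {
    assert(num_elements % known == 0);
    target[inferred_idx] = num_elements / known;
  }
  return target;
}

int main() {
  // A {seq=128, hidden=1024} tensor viewed as {1, -1, 1024} -> {1, 128, 1024}.
  auto shape = resolveViewShape(128 * 1024, {1, -1, 1024});
  assert((shape == std::vector<int64_t>{1, 128, 1024}));
  return 0;
}
```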
180-183: Linear layer registrations correctly updated with implementation type.

The addition of cfg.linear_impl_type as a parameter to all Linear layer registrations is consistent with the pattern used in Qwen3MLP (lines 126, 128, 129) and the lm_head_ registration (line 384).
266-269: LGTM! Attention output shape canonicalization.

The view({1, -1, hidden_size_}) after o_proj_ ensures the attention output maintains the expected 3D shape, consistent with the MLP pattern. The hidden_size_ value correctly matches the o_proj_ output dimension.

mllm/backends/qnn/aot/passes/LPBQCanonicalizePass.cpp (2)
91-112: LGTM for the pass entry point logic.

The run method correctly validates the ModuleOp input, finds the main CallGraphOp entry point, and delegates to visitCallGraph. The assertion at line 102 enforces a single-entry-point invariant, which is appropriate for this pass structure.
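As a rough illustration of that single-entry-point invariant, the lookup amounts to asserting that exactly one node matches the main symbol. The structs below are toy stand-ins, not the real mllm ir::ModuleOp / ir::CallGraphOp API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for the IR node types referenced in the review.
struct ToyOp {
  std::string kind;    // e.g. "CallGraphOp"
  std::string symbol;  // e.g. "main"
};

// Returns the unique main CallGraphOp, asserting the invariant holds.
const ToyOp* findMainCallGraph(const std::vector<ToyOp>& module_ops) {
  const ToyOp* entry = nullptr;
  for (const auto& op : module_ops) {
    if (op.kind == "CallGraphOp" && op.symbol == "main") {
      assert(entry == nullptr && "expected exactly one main CallGraphOp");
      entry = &op;
    }
  }
  assert(entry != nullptr && "module has no main CallGraphOp");
  return entry;
}

int main() {
  std::vector<ToyOp> ops = {{"CallGraphOp", "main"}, {"CallGraphOp", "layer_0"}};
  assert(findMainCallGraph(ops)->symbol == "main");
  return 0;
}
```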
23-29: Traversal logic looks correct.

The visitCallGraph function properly resolves the subgraph symbol and delegates to visitSubGraph. The null-check assertion provides good safety.
174-182: LPBQ blockwise quantization configuration is correct for 4-bit weight quantization.

The blockScaleBitwidth = 4 setting is consistent with the 4-bit weight quantization scheme used throughout the codebase (e.g., the kQNN_LPBQ_w4a16o16_G32/G64 types). The combination with QNN_BLOCKWISE_EXPANSION_BITWIDTH_SCALE_STORAGE_8 and blocksScale8 reflects the standard pattern of packing two 4-bit scale values into each 8-bit storage byte, which is explicitly documented in PTQPass.cpp ("using uint8 to store uint4").
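As a generic illustration of packing two 4-bit values into one byte, here is a minimal sketch. It shows the packing idea the review mentions; the actual QNN/PTQPass.cpp storage layout may differ:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Generic nibble-packing sketch: two 4-bit values share one uint8_t byte,
// low nibble first. Illustration only; not the mllm or QNN implementation.
std::vector<uint8_t> packU4(const std::vector<uint8_t>& vals) {
  std::vector<uint8_t> packed((vals.size() + 1) / 2, 0);
  for (size_t i = 0; i < vals.size(); ++i) {
    assert(vals[i] <= 0x0F);  // each value must fit in 4 bits
    packed[i / 2] |= static_cast<uint8_t>(vals[i] << ((i % 2) * 4));
  }
  return packed;
}

uint8_t unpackU4(const std::vector<uint8_t>& packed, size_t i) {
  return (packed[i / 2] >> ((i % 2) * 4)) & 0x0F;
}

int main() {
  std::vector<uint8_t> scales = {0x3, 0xA, 0x7};
  auto packed = packU4(scales);  // 3 values -> 2 bytes
  assert(packed.size() == 2);
  for (size_t i = 0; i < scales.size(); ++i) { assert(unpackU4(packed, i) == scales[i]); }
  return 0;
}
```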
```cpp
#include "mllm/engine/Context.hpp"
#include "mllm/utils/Common.hpp"
#include "mllm/engine/Context.hpp"
```
Remove duplicate include.
mllm/engine/Context.hpp is included twice (lines 13 and 15).
🔧 Proposed fix
```diff
 #include "mllm/core/aops/ViewOp.hpp"
 #include "mllm/engine/Context.hpp"
 #include "mllm/utils/Common.hpp"
-#include "mllm/engine/Context.hpp"
```
🧰 Tools
🪛 Clang (14.0.6)
[error] 15-15: duplicate include
(readability-duplicate-include,-warnings-as-errors)
```cpp
  // Find all operators that eats original o, and set them with new_o
  auto consumer_ops = o->consumerOps();
  for (auto cc_help_me : consumer_ops) {
    MLLM_RT_ASSERT(cc_help_me->isa_<ir::Op>());
    auto& inputs = cc_help_me->inputs();
    auto& outputs = cc_help_me->outputs();
  }
}
```
Incomplete consumer rewiring logic: the pass will not function correctly.
The loop iterates over consumer_ops and obtains their inputs() and outputs(), but never actually replaces references to o with new_o. This means the ViewOp is created but downstream operations will still consume the reshaped tensor o instead of the restored-shape new_o, breaking the intended canonicalization.
🐛 Proposed fix to complete the rewiring
```diff
   // Find all operators that eats original o, and set them with new_o
   auto consumer_ops = o->consumerOps();
   for (auto cc_help_me : consumer_ops) {
+    // Skip the newly created ViewOp
+    if (cc_help_me == view_op) continue;
+
     MLLM_RT_ASSERT(cc_help_me->isa_<ir::Op>());
     auto& inputs = cc_help_me->inputs();
-    auto& outputs = cc_help_me->outputs();
+    for (size_t i = 0; i < inputs.size(); ++i) {
+      if (inputs[i] == o) {
+        cc_help_me->replaceInput(i, new_o);
+      }
+    }
   }
```

Note: The exact API for replacing inputs depends on the ir::Op interface. Please verify that replaceInput or an equivalent method exists and adjust accordingly.
Committable suggestion skipped: line range outside the PR's diff.
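To illustrate the rewiring pattern the proposed fix aims for, here is a self-contained toy graph example using hand-rolled structs, not the mllm IR:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy value/op graph illustrating consumer rewiring: every consumer input
// that still points at the old value `o` is redirected to `new_o`, except
// for the op that must keep reading `o` (the newly inserted ViewOp).
struct Value;
struct Op {
  std::string name;
  std::vector<Value*> inputs;
};
struct Value {
  std::string name;
  std::vector<Op*> consumers;
};

void rewireConsumers(Value* o, Value* new_o, Op* skip /* e.g. the new ViewOp */) {
  for (Op* consumer : o->consumers) {
    if (consumer == skip) continue;  // the ViewOp must keep consuming `o`
    for (auto& in : consumer->inputs) {
      if (in == o) in = new_o;
    }
  }
}

int main() {
  Value o{"o", {}}, new_o{"new_o", {}};
  Op view{"view", {&o}}, matmul{"matmul", {&o}};
  o.consumers = {&view, &matmul};

  rewireConsumers(&o, &new_o, &view);
  assert(view.inputs[0] == &o);        // skipped: still consumes o
  assert(matmul.inputs[0] == &new_o);  // rewired to new_o
  return 0;
}
```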
Summary by CodeRabbit
New Features
Bug Fixes
Refactor