fix(qualcomm): LM Head Merge pass #601
Conversation
📝 Walkthrough

This PR updates the Qwen3 model architecture to store the hidden size as a member and use it during the forward pass.

Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~15 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @mllm/backends/cpu/ops/Conv2DOp.cpp:
- Around line 64-67: The forward() implementation in Conv2DOp.cpp is missing
handlers for aops::Conv2DOpImplType::kQNN_LPBQ_w4a16o16_G32 and
kQNN_LPBQ_w4a16o16_G64 (load() accepts them), so add explicit case branches in
the inner switch on options_.impl_type in forward() that perform the same
packed-weight convolution logic as the corresponding loaded representation (or
delegate to the existing kDefault path if appropriate); alternatively, add a
default case that raises/logs a clear error (e.g., throw or processLogger.error)
when an unsupported impl_type is encountered so execution cannot silently fall
through. Ensure you reference options_.impl_type, the enum values
kQNN_LPBQ_w4a16o16_G32 and kQNN_LPBQ_w4a16o16_G64, and the forward() and load()
functions when making the change.
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (1)
40-47: Broadened walker scope may match unintended operations.

The change from `LinearOp` to `LinalgIROp` broadens the walker's scope to match all linalg IR operations, not just linear operations. If `LinalgIROp` is a base class with multiple derived types (e.g., `LinearOp`, `MatMulOp`, etc.), this walker will now iterate over all of them. The current logic relies on `getAOp()->getName()` returning `"lm_head"` or `"lm_head_out"` to filter, which should still work correctly if the naming is unique. However:

- The variable name `linear_op` is now misleading; consider renaming it to `linalg_op` for clarity.
- Verify this broader matching is intentional and won't accidentally match other ops with similar names.

Suggested naming fix:

```diff
-  ir::linalg::LinalgIROp::ptr_t llm_head_op = nullptr;
+  ir::linalg::LinalgIROp::ptr_t lm_head_op = nullptr;
   writer.walk<ir::linalg::LinalgIROp>(
-      [&](ir::IRWriter& /*writer*/, const ir::linalg::LinalgIROp::ptr_t& linear_op) -> ir::IRWriter::WalkResult {
-        auto name = linear_op->getAOp()->getName();
-        if (name == "lm_head" || name == "lm_head_out") { llm_head_op = linear_op; }
+      [&](ir::IRWriter& /*writer*/, const ir::linalg::LinalgIROp::ptr_t& linalg_op) -> ir::IRWriter::WalkResult {
+        auto name = linalg_op->getAOp()->getName();
+        if (name == "lm_head" || name == "lm_head_out") { lm_head_op = linalg_op; }
       return ir::IRWriter::WalkResult::WALK_CONTINUE;
     });
```
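To make the concern concrete, here is a toy sketch of a walk over the base class where only the name filter keeps the match narrow. The types below are stand-ins for illustration, not the real mllm IR classes:

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins for the IR hierarchy discussed above (not the real classes).
struct LinalgIROp {
  explicit LinalgIROp(std::string n) : name(std::move(n)) {}
  virtual ~LinalgIROp() = default;
  std::string name;
};
struct LinearOp : LinalgIROp { using LinalgIROp::LinalgIROp; };
struct MatMulOp : LinalgIROp { using LinalgIROp::LinalgIROp; };

// Walking the base class visits every derived op, so the name check is the
// only thing keeping the match narrow; unique "lm_head" naming is load-bearing.
inline std::shared_ptr<LinalgIROp> findLMHead(
    const std::vector<std::shared_ptr<LinalgIROp>>& ops) {
  for (const auto& op : ops) {
    if (op->name == "lm_head" || op->name == "lm_head_out") { return op; }
  }
  return nullptr;
}
```

If any non-linear op ever carried one of those names, the walker would now match it too, which is why verifying naming uniqueness matters alongside the rename.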
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
- mllm/backends/cpu/ops/Conv2DOp.cpp
- mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
- mllm/backends/cpu/ops/Conv2DOp.cpp
- mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
- mllm/backends/cpu/ops/Conv2DOp.cpp
- mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
- mllm/backends/cpu/ops/Conv2DOp.cpp
- mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
- mllm/backends/cpu/ops/Conv2DOp.cpp
- mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-x86
- GitHub Check: build-android
- GitHub Check: build-macos
🔇 Additional comments (1)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)
332-332: LGTM! Consistent pattern for storing configuration values.

The addition of `hidden_size_` follows the established pattern in `Qwen3MLP` and `Qwen3Attention`, avoiding repeated access to configuration during forward passes. The reshape at line 377 to `{1, 1, -1, hidden_size_}` correctly prepares the tensor for the downstream `Conv2D` layer (`lm_head_`), which expects input shape `[N, H=1, W=Seq, In]`.

Also applies to: 339-339, 377-377
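As a side note on the reshape above, the `-1` entry in `{1, 1, -1, hidden_size_}` stands for the inferred sequence dimension. A small illustrative helper (not mllm API) showing how such a placeholder is typically resolved:

```cpp
#include <array>

// Illustrative helper (not mllm API): resolve the -1 "infer this dimension"
// entry in a reshape target such as {1, 1, -1, hidden_size}, given the total
// element count of the tensor being reshaped.
inline std::array<long, 4> resolveReshape(std::array<long, 4> shape, long total) {
  long known = 1;
  int infer = -1;
  for (int i = 0; i < 4; ++i) {
    if (shape[i] == -1) {
      infer = i;  // remember which axis to infer
    } else {
      known *= shape[i];
    }
  }
  if (infer >= 0) { shape[infer] = total / known; }
  return shape;
}
```

For a sequence of 8 tokens with `hidden_size` 1024, `{1, 1, -1, 1024}` resolves to `{1, 1, 8, 1024}`, matching the `[N, H=1, W=Seq, In]` layout the `Conv2D` lm_head expects.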
```cpp
case aops::Conv2DOpImplType::kQNN_LPBQ_w4a16o16_G32:
case aops::Conv2DOpImplType::kQNN_LPBQ_w4a16o16_G64: {
  break;
}
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# First, find the Conv2DOp.cpp file
find . -name "Conv2DOp.cpp" -type f
```

Repository: UbiquitousLearning/mllm

Length of output: 133

🏁 Script executed:

```shell
#!/bin/bash
# Read Conv2DOp.cpp to understand the structure around lines 64-67 and the forward() method
cat -n "mllm/backends/cpu/ops/Conv2DOp.cpp" | head -200
```

Length of output: 8642

🏁 Script executed:

```shell
#!/bin/bash
# Search for usage of the new impl types throughout the codebase
rg -n "kQNN_LPBQ_w4a16o16_G32|kQNN_LPBQ_w4a16o16_G64" --type cpp -A 2 -B 2
```

Length of output: 3988

🏁 Script executed:

```shell
#!/bin/bash
# Search for the enum definition of Conv2DOpImplType to understand all available types
rg -n "Conv2DOpImplType" --type cpp --type hpp | head -30
```

Length of output: 94

🏁 Script executed:

```shell
#!/bin/bash
# Check if there's a header file that defines this enum
find . -name "*.hpp" -o -name "*.h" | xargs grep -l "Conv2DOpImplType" 2>/dev/null
```

Length of output: 175

🏁 Script executed:

```shell
#!/bin/bash
# Check LinearOp forward() to see if it handles these impl types
rg -n "void.*forward\|LinearImplTypes::" mllm/backends/cpu/ops/LinearOp.cpp -A 5 | head -100
```

Length of output: 49

🏁 Script executed:

```shell
#!/bin/bash
# Search for backend implementations of Conv2DOp to see if there are QNN-specific versions
find . -name "*.cpp" -o -name "*.hpp" | xargs grep -l "Conv2DOp.*forward" 2>/dev/null | head -20
```

Length of output: 133

🏁 Script executed:

```shell
#!/bin/bash
# Check if these impl types are used anywhere in CPU backend context (not QNN AOT)
rg -n "kQNN_LPBQ_w4a16o16" --type cpp -B 5 -A 5 | grep -A 5 -B 5 "cpu"
```

Length of output: 2180

🏁 Script executed:

```shell
#!/bin/bash
# Look at LinearOp.cpp forward() method to understand how it handles impl types
sed -n '380,550p' mllm/backends/cpu/ops/LinearOp.cpp | cat -n
```

Length of output: 49

🏁 Script executed:

```shell
#!/bin/bash
# Check the full forward() signature in LinearOp to see if it switches on impl_type
rg -n "void.*forward" mllm/backends/cpu/ops/LinearOp.cpp -A 50 | head -80
```

Length of output: 2239

🏁 Script executed:

```shell
#!/bin/bash
# Search for QNN backend implementations
find . -path "*/backends/qnn/*" -name "*.cpp" -o -name "*.hpp" 2>/dev/null | head -10
```

Length of output: 594

🏁 Script executed:

```shell
#!/bin/bash
# Check if Conv2DOp is used in QNN context or if forward() is only for CPU
rg -n "Conv2DOp::forward\|conv2d.*forward" --type cpp -B 3 -A 3
```

Length of output: 49

🏁 Script executed:

```shell
#!/bin/bash
# Get the exact code of Conv2DOp.cpp forward() method impl_type switch to see if there's a default case
sed -n '92,180p' mllm/backends/cpu/ops/Conv2DOp.cpp | cat -n
```

Length of output: 4639

🏁 Script executed:

```shell
#!/bin/bash
# Check if LinearOp.cpp forward() has similar issue - look at full impl_type switch
sed -n '100,350p' mllm/backends/cpu/ops/LinearOp.cpp | grep -A 200 "switch (impl_type)"
```

Length of output: 9815

🏁 Script executed:

```shell
#!/bin/bash
# Search for any documentation or comments about QNN impl types being AOT-only
rg -n "AOT|QNN.*forward\|QNN.*CPU" --type cpp --type hpp -B 2 -A 2 | head -50
```

Length of output: 94

🏁 Script executed:

```shell
#!/bin/bash
# Look for any QNN backend or special handling that bypasses CPU forward()
find . -name "*qnn*" -o -name "*QNN*" | grep -E "\.(cpp|hpp|h)$" | head -20
```

Length of output: 828

🏁 Script executed:

```shell
#!/bin/bash
# Check if there's a base class that might handle these impl types differently
rg -n "class.*Conv2DOp\|Conv2DOpImplType" mllm/core/aops/Conv2DOp.hpp -A 5 -B 2
```

Length of output: 49

🏁 Script executed:

```shell
#!/bin/bash
# Find Conv2DOp.hpp to see the enum definition
find . -name "Conv2DOp.hpp" -type f -exec cat -n {} \;
```

Length of output: 3566

🏁 Script executed:

```shell
#!/bin/bash
# Check if there's QNN Conv2D implementation
find . -path "*/qnn/*" -name "*Conv2D*"
```

Length of output: 152

🏁 Script executed:

```shell
#!/bin/bash
# Check if LinearOp has the same impl types and how it handles them in forward()
rg -n "kQNN_LPBQ_w4a16o16" mllm/backends/cpu/ops/LinearOp.cpp -B 5 -A 10
```

Length of output: 570

🏁 Script executed:

```shell
#!/bin/bash
# Check Conv2D QNN implementation to see if it handles the forward() there
head -100 mllm/backends/qnn/aot/visitor/Conv2D.cpp
```

Length of output: 3936

🏁 Script executed:

```shell
#!/bin/bash
# Check how LinearOp's forward() handles the QNN impl types - does it error or silently skip?
sed -n '100,180p' mllm/backends/cpu/ops/LinearOp.cpp | grep -A 30 "switch (impl_type)"
```

Length of output: 1608

🏁 Script executed:

```shell
#!/bin/bash
# Check if Conv2DOp impl_type switch in forward() has a default case
sed -n '92,82p' mllm/backends/cpu/ops/Conv2DOp.cpp | tail -20
```

Length of output: 93

🏁 Script executed:

```shell
#!/bin/bash
# Verify - look at the exact closing of impl_type switch in Conv2DOp forward()
sed -n '172,175p' mllm/backends/cpu/ops/Conv2DOp.cpp | cat -n
```

Length of output: 131

🏁 Script executed:

```shell
#!/bin/bash
# Confirm there's NO default case in Conv2DOp's impl_type switch
sed -n '92,180p' mllm/backends/cpu/ops/Conv2DOp.cpp | grep -n "default"
```

Length of output: 112

🏁 Script executed:

```shell
#!/bin/bash
# Check if there's a compile-time guard or runtime check that prevents forward() from being called with QNN types
rg -n "impl_type\|kQNN_LPBQ" mllm/core/aops/Conv2DOp.cpp -B 2 -A 2 | head -60
```

Length of output: 49

🏁 Script executed:

```shell
#!/bin/bash
# Final confirmation: show the exact structure of the impl_type switch in Conv2DOp forward()
sed -n '92,180p' mllm/backends/cpu/ops/Conv2DOp.cpp | cat -n | grep -A 1 "impl_type\|^[[:space:]]*[0-9]*[[:space:]]*}"
```

Length of output: 726
Missing forward() implementation and error handling for new impl types.
The load() method (lines 64-67) accepts kQNN_LPBQ_w4a16o16_G32 and kQNN_LPBQ_w4a16o16_G64 without performing weight packing. However, the forward() method has no case handlers for these impl types—the inner switch (options_.impl_type) at line 94 only handles kDefault. If forward() is called with either of these impl types, execution silently exits the switch without performing any computation.
Unlike LinearOp (which has a default case with error handling), Conv2DOp provides no safety guard. Either add corresponding case handlers in forward() to implement the operation or add a default case that explicitly errors if an unsupported impl type is encountered.
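A minimal sketch of the defensive `default` branch this comment asks for. The enum below is an illustrative stand-in (the real `aops::Conv2DOpImplType` lives in `mllm/core/aops/Conv2DOp.hpp` and has more members), and the error mechanism is one possible choice:

```cpp
#include <stdexcept>
#include <string>

// Illustrative stand-in for aops::Conv2DOpImplType (not the real enum).
enum class Conv2DOpImplType { kDefault, kQNN_LPBQ_w4a16o16_G32, kQNN_LPBQ_w4a16o16_G64 };

// Guarded dispatch: every impl type is either handled explicitly or rejected
// loudly, so forward() can never silently fall through and compute nothing.
inline std::string dispatchForward(Conv2DOpImplType impl_type) {
  switch (impl_type) {
    case Conv2DOpImplType::kDefault:
      return "ran default conv2d path";
    case Conv2DOpImplType::kQNN_LPBQ_w4a16o16_G32:
    case Conv2DOpImplType::kQNN_LPBQ_w4a16o16_G64:
      // Accepted by load() but not executable on the CPU backend; fail fast.
      throw std::runtime_error("Conv2DOp::forward: unsupported impl_type");
    default:
      throw std::runtime_error("Conv2DOp::forward: unknown impl_type");
  }
}
```

The same shape works whether the project prefers exceptions or a logging macro; the point is that the QNN-only cases surface an explicit error instead of exiting the switch silently.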
Summary by CodeRabbit
New Features
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.