
fix(qualcomm): LM Head Merge pass #601

Merged

chenghuaWang merged 1 commit into UbiquitousLearning:main from chenghuaWang:wch-main on Jan 12, 2026

Conversation

@chenghuaWang
Collaborator

@chenghuaWang chenghuaWang commented Jan 12, 2026

Summary by CodeRabbit

  • New Features

    • Added support for two new quantization formats for Conv2D operations (4-bit weights with 16-bit activations/outputs).
  • Refactor

    • Optimized internal tensor handling in Qwen3 model processing.
    • Improved LLM head graph optimization for better performance.


@coderabbitai
Contributor

coderabbitai Bot commented Jan 12, 2026

📝 Walkthrough

This PR updates the Qwen3 model architecture to store and use hidden_size_ for tensor reshaping, adds support for two quantized Conv2D operation types in CPU backend, and refactors LLM head detection in QNN AOT passes to use LinalgIROp instead of LinearOp.

Changes

  • Qwen3 Model Architecture (examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp)
    Added a hidden_size_ member variable to the Qwen3Text class, initialized from the config and used to reshape tensors via view after normalization. Removed the now-redundant explicit reshape in Qwen3ForCausalLM after the LLM forward pass.
  • CPU Conv2D Operations (mllm/backends/cpu/ops/Conv2DOp.cpp)
    Added branches for two quantized Conv2D operation types (kQNN_LPBQ_w4a16o16_G32 and kQNN_LPBQ_w4a16o16_G64) that break without performing weight packing, replacing the previous NYI error handling.
  • QNN AOT Optimization Pass (mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp)
    Replaced LinearOp with LinalgIROp in LLM head detection and the IR walker traversal, keeping the functional logic while updating the operation-type abstraction.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Possibly related PRs

  • #595: Directly related modifications to Qwen3 model components with hidden_size_ member and tensor reshape adjustments
  • #577: Related QNN AOT pass refactoring replacing LinearOp with LinalgIROp and updating IR walkers

Suggested reviewers

  • oreomaker
  • liang1232018
  • yirongjie

Poem

🐰 Hidden sizes now stored with care,
Tensors reshape through the air,
Conv quantized with grace so fine,
LinalgIROp ops align,
Model flows better everywhere!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check ⚠️ Warning: no pull request description was provided, leaving the required context and rationale for the changes unexplained. Resolution: add a detailed description explaining the motivation, technical details of the fix, and any testing performed, following the repository's contribution guidelines.
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)

  • Title check ✅ Passed: the title clearly identifies the primary fix, the LM Head Merge pass in Qualcomm-related code, which aligns with the changes in MergeLLMHeadIntoMainGraphPass.cpp.


✨ Finishing touches
  • 📝 Generate docstrings


Owner

@UbiquitousLearning UbiquitousLearning left a comment


LGTM

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @mllm/backends/cpu/ops/Conv2DOp.cpp:
- Around line 64-67: The forward() implementation in Conv2DOp.cpp is missing
handlers for aops::Conv2DOpImplType::kQNN_LPBQ_w4a16o16_G32 and
kQNN_LPBQ_w4a16o16_G64 (load() accepts them), so add explicit case branches in
the inner switch on options_.impl_type in forward() that perform the same
packed-weight convolution logic as the corresponding loaded representation (or
delegate to the existing kDefault path if appropriate); alternatively, add a
default case that raises/logs a clear error (e.g., throw or processLogger.error)
when an unsupported impl_type is encountered so execution cannot silently fall
through. Ensure you reference options_.impl_type, the enum values
kQNN_LPBQ_w4a16o16_G32 and kQNN_LPBQ_w4a16o16_G64, and the forward() and load()
functions when making the change.
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (1)

40-47: Broadened walker scope may match unintended operations.

The change from LinearOp to LinalgIROp broadens the walker's scope to match all linalg IR operations, not just linear operations. If LinalgIROp is a base class with multiple derived types (e.g., LinearOp, MatMulOp, etc.), this walker will now iterate over all of them.

The current logic relies on getAOp()->getName() returning "lm_head" or "lm_head_out" to filter, which should still work correctly if the naming is unique. However:

  1. The variable name linear_op is now misleading—consider renaming to linalg_op for clarity.
  2. Verify this broader matching is intentional and won't accidentally match other ops with similar names.
Suggested naming fix
-  ir::linalg::LinalgIROp::ptr_t llm_head_op = nullptr;
+  ir::linalg::LinalgIROp::ptr_t lm_head_op = nullptr;

   writer.walk<ir::linalg::LinalgIROp>(
-      [&](ir::IRWriter& /*writer*/, const ir::linalg::LinalgIROp::ptr_t& linear_op) -> ir::IRWriter::WalkResult {
-        auto name = linear_op->getAOp()->getName();
-        if (name == "lm_head" || name == "lm_head_out") { llm_head_op = linear_op; }
+      [&](ir::IRWriter& /*writer*/, const ir::linalg::LinalgIROp::ptr_t& linalg_op) -> ir::IRWriter::WalkResult {
+        auto name = linalg_op->getAOp()->getName();
+        if (name == "lm_head" || name == "lm_head_out") { lm_head_op = linalg_op; }
         return ir::IRWriter::WalkResult::WALK_CONTINUE;
       });
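As a standalone sketch (using toy stand-in types, not the real mllm IR classes), the pattern under discussion looks like this: walking the base class visits every derived op, so the name check is what keeps the selection precise.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the mllm IR hierarchy, for illustration only.
struct LinalgIROp {
  std::string name;
  virtual ~LinalgIROp() = default;
};
struct LinearOp : LinalgIROp {};
struct MatMulOp : LinalgIROp {};

// Visit all linalg-level ops and pick out the LM head by name, mirroring the
// walker callback in MergeLLMHeadIntoMainGraphPass.
std::shared_ptr<LinalgIROp> find_lm_head(const std::vector<std::shared_ptr<LinalgIROp>>& ops) {
  std::shared_ptr<LinalgIROp> hit;
  for (const auto& op : ops) {
    if (op->name == "lm_head" || op->name == "lm_head_out") { hit = op; }
  }
  return hit;
}
```

The filter stays correct only as long as no other op in the module is named lm_head or lm_head_out, which is exactly the caveat the nitpick raises.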
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44a7736 and 320c267.

📒 Files selected for processing (3)
  • examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
  • mllm/backends/cpu/ops/Conv2DOp.cpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/cpu/ops/Conv2DOp.cpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/cpu/ops/Conv2DOp.cpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/cpu/ops/Conv2DOp.cpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/cpu/ops/Conv2DOp.cpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-android
  • GitHub Check: build-macos
🔇 Additional comments (1)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)

332-332: LGTM! Consistent pattern for storing configuration values.

The addition of hidden_size_ follows the established pattern in Qwen3MLP and Qwen3Attention, avoiding repeated access to configuration during forward passes. The reshape at line 377 to {1, 1, -1, hidden_size_} correctly prepares the tensor for the downstream Conv2D layer (lm_head_) which expects input shape [N, H=1, W=Seq, In].

Also applies to: 339-339, 377-377
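The shape inference behind a {1, 1, -1, hidden_size_} view can be sketched independently of the mllm Tensor API (resolve_view and its types are hypothetical helpers, not the real interface):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not the mllm Tensor API: resolve a view shape in which
// a single -1 dimension is inferred from the total element count, as in the
// {1, 1, -1, hidden_size_} reshape feeding the Conv2D lm_head.
std::vector<int64_t> resolve_view(int64_t numel, std::vector<int64_t> dims) {
  int64_t known = 1;
  int infer = -1;
  for (int i = 0; i < static_cast<int>(dims.size()); ++i) {
    if (dims[i] == -1) {
      infer = i;  // at most one inferred dimension is allowed
    } else {
      known *= dims[i];
    }
  }
  if (infer >= 0) { dims[infer] = numel / known; }
  return dims;
}
```

For hidden_size_ = 8 and a 16-element activation, {1, 1, -1, 8} resolves to {1, 1, 2, 8}, i.e. [N, H=1, W=Seq, In] with Seq inferred.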

Comment on lines +64 to +67
case aops::Conv2DOpImplType::kQNN_LPBQ_w4a16o16_G32:
case aops::Conv2DOpImplType::kQNN_LPBQ_w4a16o16_G64: {
break;
}
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, find the Conv2DOp.cpp file
find . -name "Conv2DOp.cpp" -type f

Repository: UbiquitousLearning/mllm

Length of output: 133


🏁 Script executed:

#!/bin/bash
# Read Conv2DOp.cpp to understand the structure around lines 64-67 and the forward() method
cat -n "mllm/backends/cpu/ops/Conv2DOp.cpp" | head -200

Repository: UbiquitousLearning/mllm

Length of output: 8642


🏁 Script executed:

#!/bin/bash
# Search for usage of the new impl types throughout the codebase
rg -n "kQNN_LPBQ_w4a16o16_G32|kQNN_LPBQ_w4a16o16_G64" --type cpp -A 2 -B 2

Repository: UbiquitousLearning/mllm

Length of output: 3988


🏁 Script executed:

#!/bin/bash
# Search for the enum definition of Conv2DOpImplType to understand all available types
rg -n "Conv2DOpImplType" --type cpp --type hpp | head -30

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

#!/bin/bash
# Check if there's a header file that defines this enum
find . -name "*.hpp" -o -name "*.h" | xargs grep -l "Conv2DOpImplType" 2>/dev/null

Repository: UbiquitousLearning/mllm

Length of output: 175


🏁 Script executed:

#!/bin/bash
# Check LinearOp forward() to see if it handles these impl types
rg -n "void.*forward\|LinearImplTypes::" mllm/backends/cpu/ops/LinearOp.cpp -A 5 | head -100

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Search for backend implementations of Conv2DOp to see if there are QNN-specific versions
find . -name "*.cpp" -o -name "*.hpp" | xargs grep -l "Conv2DOp.*forward" 2>/dev/null | head -20

Repository: UbiquitousLearning/mllm

Length of output: 133


🏁 Script executed:

#!/bin/bash
# Check if these impl types are used anywhere in CPU backend context (not QNN AOT)
rg -n "kQNN_LPBQ_w4a16o16" --type cpp -B 5 -A 5 | grep -A 5 -B 5 "cpu"

Repository: UbiquitousLearning/mllm

Length of output: 2180


🏁 Script executed:

#!/bin/bash
# Look at LinearOp.cpp forward() method to understand how it handles impl types
sed -n '380,550p' mllm/backends/cpu/ops/LinearOp.cpp | cat -n

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Check the full forward() signature in LinearOp to see if it switches on impl_type
rg -n "void.*forward" mllm/backends/cpu/ops/LinearOp.cpp -A 50 | head -80

Repository: UbiquitousLearning/mllm

Length of output: 2239


🏁 Script executed:

#!/bin/bash
# Search for QNN backend implementations
find . -path "*/backends/qnn/*" -name "*.cpp" -o -name "*.hpp" 2>/dev/null | head -10

Repository: UbiquitousLearning/mllm

Length of output: 594


🏁 Script executed:

#!/bin/bash
# Check if Conv2DOp is used in QNN context or if forward() is only for CPU
rg -n "Conv2DOp::forward\|conv2d.*forward" --type cpp -B 3 -A 3

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Get the exact code of Conv2DOp.cpp forward() method impl_type switch to see if there's a default case
sed -n '92,180p' mllm/backends/cpu/ops/Conv2DOp.cpp | cat -n

Repository: UbiquitousLearning/mllm

Length of output: 4639


🏁 Script executed:

#!/bin/bash
# Check if LinearOp.cpp forward() has similar issue - look at full impl_type switch
sed -n '100,350p' mllm/backends/cpu/ops/LinearOp.cpp | grep -A 200 "switch (impl_type)"

Repository: UbiquitousLearning/mllm

Length of output: 9815


🏁 Script executed:

#!/bin/bash
# Search for any documentation or comments about QNN impl types being AOT-only
rg -n "AOT|QNN.*forward\|QNN.*CPU" --type cpp --type hpp -B 2 -A 2 | head -50

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

#!/bin/bash
# Look for any QNN backend or special handling that bypasses CPU forward()
find . -name "*qnn*" -o -name "*QNN*" | grep -E "\.(cpp|hpp|h)$" | head -20

Repository: UbiquitousLearning/mllm

Length of output: 828


🏁 Script executed:

#!/bin/bash
# Check if there's a base class that might handle these impl types differently
rg -n "class.*Conv2DOp\|Conv2DOpImplType" mllm/core/aops/Conv2DOp.hpp -A 5 -B 2

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Find Conv2DOp.hpp to see the enum definition
find . -name "Conv2DOp.hpp" -type f -exec cat -n {} \;

Repository: UbiquitousLearning/mllm

Length of output: 3566


🏁 Script executed:

#!/bin/bash
# Check if there's QNN Conv2D implementation
find . -path "*/qnn/*" -name "*Conv2D*"

Repository: UbiquitousLearning/mllm

Length of output: 152


🏁 Script executed:

#!/bin/bash
# Check if LinearOp has the same impl types and how it handles them in forward()
rg -n "kQNN_LPBQ_w4a16o16" mllm/backends/cpu/ops/LinearOp.cpp -B 5 -A 10

Repository: UbiquitousLearning/mllm

Length of output: 570


🏁 Script executed:

#!/bin/bash
# Check Conv2D QNN implementation to see if it handles the forward() there
head -100 mllm/backends/qnn/aot/visitor/Conv2D.cpp

Repository: UbiquitousLearning/mllm

Length of output: 3936


🏁 Script executed:

#!/bin/bash
# Check how LinearOp's forward() handles the QNN impl types - does it error or silently skip?
sed -n '100,180p' mllm/backends/cpu/ops/LinearOp.cpp | grep -A 30 "switch (impl_type)"

Repository: UbiquitousLearning/mllm

Length of output: 1608


🏁 Script executed:

#!/bin/bash
# Check if Conv2DOp impl_type switch in forward() has a default case
sed -n '92,82p' mllm/backends/cpu/ops/Conv2DOp.cpp | tail -20

Repository: UbiquitousLearning/mllm

Length of output: 93


🏁 Script executed:

#!/bin/bash
# Verify - look at the exact closing of impl_type switch in Conv2DOp forward()
sed -n '172,175p' mllm/backends/cpu/ops/Conv2DOp.cpp | cat -n

Repository: UbiquitousLearning/mllm

Length of output: 131


🏁 Script executed:

#!/bin/bash
# Confirm there's NO default case in Conv2DOp's impl_type switch
sed -n '92,180p' mllm/backends/cpu/ops/Conv2DOp.cpp | grep -n "default"

Repository: UbiquitousLearning/mllm

Length of output: 112


🏁 Script executed:

#!/bin/bash
# Check if there's a compile-time guard or runtime check that prevents forward() from being called with QNN types
rg -n "impl_type\|kQNN_LPBQ" mllm/core/aops/Conv2DOp.cpp -B 2 -A 2 | head -60

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Final confirmation: show the exact structure of the impl_type switch in Conv2DOp forward()
sed -n '92,180p' mllm/backends/cpu/ops/Conv2DOp.cpp | cat -n | grep -A 1 "impl_type\|^[[:space:]]*[0-9]*[[:space:]]*}"

Repository: UbiquitousLearning/mllm

Length of output: 726


Missing forward() implementation and error handling for new impl types.

The load() method (lines 64-67) accepts kQNN_LPBQ_w4a16o16_G32 and kQNN_LPBQ_w4a16o16_G64 without performing weight packing. However, the forward() method has no case handlers for these impl types—the inner switch (options_.impl_type) at line 94 only handles kDefault. If forward() is called with either of these impl types, execution silently exits the switch without performing any computation.

Unlike LinearOp (which has a default case with error handling), Conv2DOp provides no safety guard. Either add corresponding case handlers in forward() to implement the operation or add a default case that explicitly errors if an unsupported impl type is encountered.
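One way to add the guard this comment suggests is a default case that fails loudly. Below is a hedged sketch with toy names (dispatch_forward and the string return are not the real Conv2DOp::forward signature):

```cpp
#include <stdexcept>
#include <string>

// Mirrors a subset of the Conv2DOpImplType values discussed above, for illustration.
enum class Conv2DOpImplType { kDefault, kQNN_LPBQ_w4a16o16_G32, kQNN_LPBQ_w4a16o16_G64 };

// Hypothetical dispatcher sketching the suggested fix: unsupported impl types
// raise a clear error instead of silently falling out of the switch.
std::string dispatch_forward(Conv2DOpImplType impl_type) {
  switch (impl_type) {
    case Conv2DOpImplType::kDefault:
      return "conv2d_default";  // stands in for the real convolution path
    default:
      throw std::runtime_error("Conv2DOp::forward: unsupported impl_type");
  }
}
```

With this guard, a Conv2DOp loaded with one of the QNN LPBQ types fails fast at the first forward() call rather than producing an uninitialized output tensor.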


@chenghuaWang chenghuaWang merged commit 0eb79bb into UbiquitousLearning:main Jan 12, 2026
4 checks passed
