
fix(qualcomm): LM Head Merge pass #601

Merged

chenghuaWang merged 1 commit into UbiquitousLearning:main from chenghuaWang:wch-main on Jan 12, 2026

Conversation

@chenghuaWang
Collaborator

@chenghuaWang chenghuaWang commented Jan 12, 2026

Summary by CodeRabbit

  • New Features

    • Added support for two new quantization formats for Conv2D operations (4-bit weights with 16-bit activations/outputs).
  • Refactor

    • Optimized internal tensor handling in Qwen3 model processing.
    • Improved LLM head graph optimization for better performance.


@coderabbitai
Contributor

coderabbitai Bot commented Jan 12, 2026

📝 Walkthrough

This PR updates the Qwen3 model architecture to store and use hidden_size_ for tensor reshaping, adds support for two quantized Conv2D operation types in CPU backend, and refactors LLM head detection in QNN AOT passes to use LinalgIROp instead of LinearOp.

Changes

  • Qwen3 Model Architecture (examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp)
    Added a hidden_size_ member variable to the Qwen3Text class, initialized from the config and used to reshape tensors via view after normalization. Removed the now-redundant explicit reshape in Qwen3ForCausalLM after the LLM forward pass.
  • CPU Conv2D Operations (mllm/backends/cpu/ops/Conv2DOp.cpp)
    Added branches for two quantized Conv2D operation types (kQNN_LPBQ_w4a16o16_G32 and kQNN_LPBQ_w4a16o16_G64) that break without performing weight packing, replacing the previous NYI error handling.
  • QNN AOT Optimization Pass (mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp)
    Replaced LinearOp with LinalgIROp in LLM head detection and the IR walker traversal, keeping the functional logic while updating the operation-type abstraction.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Possibly related PRs

  • #595: Directly related modifications to Qwen3 model components with hidden_size_ member and tensor reshape adjustments
  • #577: Related QNN AOT pass refactoring replacing LinearOp with LinalgIROp and updating IR walkers

Suggested reviewers

  • oreomaker
  • liang1232018
  • yirongjie

Poem

🐰 Hidden sizes now stored with care,
Tensors reshape through the air,
Conv quantized with grace so fine,
LinalgIROp ops align,
Model flows better everywhere!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check ⚠️ Warning: no pull request description was provided, leaving the required context and rationale for the changes unexplained. Resolution: add a detailed description explaining the motivation, technical details of the fix, and any testing performed, following the repository's contribution guidelines.
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)

  • Title check ✅ Passed: the title clearly identifies the primary fix, the LM Head Merge pass in Qualcomm-related code, which aligns with the changes in MergeLLMHeadIntoMainGraphPass.cpp.


✨ Finishing touches
  • 📝 Generate docstrings


Owner

@UbiquitousLearning UbiquitousLearning left a comment


LGTM

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @mllm/backends/cpu/ops/Conv2DOp.cpp:
- Around line 64-67: The forward() implementation in Conv2DOp.cpp is missing
handlers for aops::Conv2DOpImplType::kQNN_LPBQ_w4a16o16_G32 and
kQNN_LPBQ_w4a16o16_G64 (load() accepts them), so add explicit case branches in
the inner switch on options_.impl_type in forward() that perform the same
packed-weight convolution logic as the corresponding loaded representation (or
delegate to the existing kDefault path if appropriate); alternatively, add a
default case that raises/logs a clear error (e.g., throw or processLogger.error)
when an unsupported impl_type is encountered so execution cannot silently fall
through. Ensure you reference options_.impl_type, the enum values
kQNN_LPBQ_w4a16o16_G32 and kQNN_LPBQ_w4a16o16_G64, and the forward() and load()
functions when making the change.
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (1)

40-47: Broadened walker scope may match unintended operations.

The change from LinearOp to LinalgIROp broadens the walker's scope to match all linalg IR operations, not just linear operations. If LinalgIROp is a base class with multiple derived types (e.g., LinearOp, MatMulOp, etc.), this walker will now iterate over all of them.

The current logic relies on getAOp()->getName() returning "lm_head" or "lm_head_out" to filter, which should still work correctly if the naming is unique. However:

  1. The variable name linear_op is now misleading—consider renaming to linalg_op for clarity.
  2. Verify this broader matching is intentional and won't accidentally match other ops with similar names.
Suggested naming fix
-  ir::linalg::LinalgIROp::ptr_t llm_head_op = nullptr;
+  ir::linalg::LinalgIROp::ptr_t lm_head_op = nullptr;

   writer.walk<ir::linalg::LinalgIROp>(
-      [&](ir::IRWriter& /*writer*/, const ir::linalg::LinalgIROp::ptr_t& linear_op) -> ir::IRWriter::WalkResult {
-        auto name = linear_op->getAOp()->getName();
-        if (name == "lm_head" || name == "lm_head_out") { llm_head_op = linear_op; }
+      [&](ir::IRWriter& /*writer*/, const ir::linalg::LinalgIROp::ptr_t& linalg_op) -> ir::IRWriter::WalkResult {
+        auto name = linalg_op->getAOp()->getName();
+        if (name == "lm_head" || name == "lm_head_out") { lm_head_op = linalg_op; }
         return ir::IRWriter::WalkResult::WALK_CONTINUE;
       });
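As a standalone sketch (using toy stand-in types, not the real mllm IR classes), the pattern under discussion looks like this: walking the base class visits every derived op, so the name check is what keeps the selection precise.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the mllm IR hierarchy, for illustration only.
struct LinalgIROp {
  std::string name;
  virtual ~LinalgIROp() = default;
};
struct LinearOp : LinalgIROp {};
struct MatMulOp : LinalgIROp {};

// Visit all linalg-level ops and pick out the LM head by name, mirroring the
// walker callback in MergeLLMHeadIntoMainGraphPass.
std::shared_ptr<LinalgIROp> find_lm_head(const std::vector<std::shared_ptr<LinalgIROp>>& ops) {
  std::shared_ptr<LinalgIROp> hit;
  for (const auto& op : ops) {
    if (op->name == "lm_head" || op->name == "lm_head_out") { hit = op; }
  }
  return hit;
}
```

The filter stays correct only as long as no other op in the module is named lm_head or lm_head_out, which is exactly the caveat the nitpick raises.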
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44a7736 and 320c267.

📒 Files selected for processing (3)
  • examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
  • mllm/backends/cpu/ops/Conv2DOp.cpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/cpu/ops/Conv2DOp.cpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/cpu/ops/Conv2DOp.cpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/cpu/ops/Conv2DOp.cpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/cpu/ops/Conv2DOp.cpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-android
  • GitHub Check: build-macos
🔇 Additional comments (1)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)

332-332: LGTM! Consistent pattern for storing configuration values.

The addition of hidden_size_ follows the established pattern in Qwen3MLP and Qwen3Attention, avoiding repeated access to configuration during forward passes. The reshape at line 377 to {1, 1, -1, hidden_size_} correctly prepares the tensor for the downstream Conv2D layer (lm_head_) which expects input shape [N, H=1, W=Seq, In].

Also applies to: 339-339, 377-377
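The shape inference behind a {1, 1, -1, hidden_size_} view can be sketched independently of the mllm Tensor API (resolve_view and its types are hypothetical helpers, not the real interface):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not the mllm Tensor API: resolve a view shape in which
// a single -1 dimension is inferred from the total element count, as in the
// {1, 1, -1, hidden_size_} reshape feeding the Conv2D lm_head.
std::vector<int64_t> resolve_view(int64_t numel, std::vector<int64_t> dims) {
  int64_t known = 1;
  int infer = -1;
  for (int i = 0; i < static_cast<int>(dims.size()); ++i) {
    if (dims[i] == -1) {
      infer = i;  // at most one inferred dimension is allowed
    } else {
      known *= dims[i];
    }
  }
  if (infer >= 0) { dims[infer] = numel / known; }
  return dims;
}
```

For hidden_size_ = 8 and a 16-element activation, {1, 1, -1, 8} resolves to {1, 1, 2, 8}, i.e. [N, H=1, W=Seq, In] with Seq inferred.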

Comment on lines +64 to +67
case aops::Conv2DOpImplType::kQNN_LPBQ_w4a16o16_G32:
case aops::Conv2DOpImplType::kQNN_LPBQ_w4a16o16_G64: {
break;
}
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, find the Conv2DOp.cpp file
find . -name "Conv2DOp.cpp" -type f

Repository: UbiquitousLearning/mllm

Length of output: 133


🏁 Script executed:

#!/bin/bash
# Read Conv2DOp.cpp to understand the structure around lines 64-67 and the forward() method
cat -n "mllm/backends/cpu/ops/Conv2DOp.cpp" | head -200

Repository: UbiquitousLearning/mllm

Length of output: 8642


🏁 Script executed:

#!/bin/bash
# Search for usage of the new impl types throughout the codebase
rg -n "kQNN_LPBQ_w4a16o16_G32|kQNN_LPBQ_w4a16o16_G64" --type cpp -A 2 -B 2

Repository: UbiquitousLearning/mllm

Length of output: 3988


🏁 Script executed:

#!/bin/bash
# Search for the enum definition of Conv2DOpImplType to understand all available types
rg -n "Conv2DOpImplType" --type cpp --type hpp | head -30

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

#!/bin/bash
# Check if there's a header file that defines this enum
find . -name "*.hpp" -o -name "*.h" | xargs grep -l "Conv2DOpImplType" 2>/dev/null

Repository: UbiquitousLearning/mllm

Length of output: 175


🏁 Script executed:

#!/bin/bash
# Check LinearOp forward() to see if it handles these impl types
rg -n "void.*forward\|LinearImplTypes::" mllm/backends/cpu/ops/LinearOp.cpp -A 5 | head -100

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Search for backend implementations of Conv2DOp to see if there are QNN-specific versions
find . -name "*.cpp" -o -name "*.hpp" | xargs grep -l "Conv2DOp.*forward" 2>/dev/null | head -20

Repository: UbiquitousLearning/mllm

Length of output: 133


🏁 Script executed:

#!/bin/bash
# Check if these impl types are used anywhere in CPU backend context (not QNN AOT)
rg -n "kQNN_LPBQ_w4a16o16" --type cpp -B 5 -A 5 | grep -A 5 -B 5 "cpu"

Repository: UbiquitousLearning/mllm

Length of output: 2180


🏁 Script executed:

#!/bin/bash
# Look at LinearOp.cpp forward() method to understand how it handles impl types
sed -n '380,550p' mllm/backends/cpu/ops/LinearOp.cpp | cat -n

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Check the full forward() signature in LinearOp to see if it switches on impl_type
rg -n "void.*forward" mllm/backends/cpu/ops/LinearOp.cpp -A 50 | head -80

Repository: UbiquitousLearning/mllm

Length of output: 2239


🏁 Script executed:

#!/bin/bash
# Search for QNN backend implementations
find . -path "*/backends/qnn/*" -name "*.cpp" -o -name "*.hpp" 2>/dev/null | head -10

Repository: UbiquitousLearning/mllm

Length of output: 594


🏁 Script executed:

#!/bin/bash
# Check if Conv2DOp is used in QNN context or if forward() is only for CPU
rg -n "Conv2DOp::forward\|conv2d.*forward" --type cpp -B 3 -A 3

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Get the exact code of Conv2DOp.cpp forward() method impl_type switch to see if there's a default case
sed -n '92,180p' mllm/backends/cpu/ops/Conv2DOp.cpp | cat -n

Repository: UbiquitousLearning/mllm

Length of output: 4639


🏁 Script executed:

#!/bin/bash
# Check if LinearOp.cpp forward() has similar issue - look at full impl_type switch
sed -n '100,350p' mllm/backends/cpu/ops/LinearOp.cpp | grep -A 200 "switch (impl_type)"

Repository: UbiquitousLearning/mllm

Length of output: 9815


🏁 Script executed:

#!/bin/bash
# Search for any documentation or comments about QNN impl types being AOT-only
rg -n "AOT|QNN.*forward\|QNN.*CPU" --type cpp --type hpp -B 2 -A 2 | head -50

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

#!/bin/bash
# Look for any QNN backend or special handling that bypasses CPU forward()
find . -name "*qnn*" -o -name "*QNN*" | grep -E "\.(cpp|hpp|h)$" | head -20

Repository: UbiquitousLearning/mllm

Length of output: 828


🏁 Script executed:

#!/bin/bash
# Check if there's a base class that might handle these impl types differently
rg -n "class.*Conv2DOp\|Conv2DOpImplType" mllm/core/aops/Conv2DOp.hpp -A 5 -B 2

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Find Conv2DOp.hpp to see the enum definition
find . -name "Conv2DOp.hpp" -type f -exec cat -n {} \;

Repository: UbiquitousLearning/mllm

Length of output: 3566


🏁 Script executed:

#!/bin/bash
# Check if there's QNN Conv2D implementation
find . -path "*/qnn/*" -name "*Conv2D*"

Repository: UbiquitousLearning/mllm

Length of output: 152


🏁 Script executed:

#!/bin/bash
# Check if LinearOp has the same impl types and how it handles them in forward()
rg -n "kQNN_LPBQ_w4a16o16" mllm/backends/cpu/ops/LinearOp.cpp -B 5 -A 10

Repository: UbiquitousLearning/mllm

Length of output: 570


🏁 Script executed:

#!/bin/bash
# Check Conv2D QNN implementation to see if it handles the forward() there
head -100 mllm/backends/qnn/aot/visitor/Conv2D.cpp

Repository: UbiquitousLearning/mllm

Length of output: 3936


🏁 Script executed:

#!/bin/bash
# Check how LinearOp's forward() handles the QNN impl types - does it error or silently skip?
sed -n '100,180p' mllm/backends/cpu/ops/LinearOp.cpp | grep -A 30 "switch (impl_type)"

Repository: UbiquitousLearning/mllm

Length of output: 1608


🏁 Script executed:

#!/bin/bash
# Check if Conv2DOp impl_type switch in forward() has a default case
sed -n '92,82p' mllm/backends/cpu/ops/Conv2DOp.cpp | tail -20

Repository: UbiquitousLearning/mllm

Length of output: 93


🏁 Script executed:

#!/bin/bash
# Verify - look at the exact closing of impl_type switch in Conv2DOp forward()
sed -n '172,175p' mllm/backends/cpu/ops/Conv2DOp.cpp | cat -n

Repository: UbiquitousLearning/mllm

Length of output: 131


🏁 Script executed:

#!/bin/bash
# Confirm there's NO default case in Conv2DOp's impl_type switch
sed -n '92,180p' mllm/backends/cpu/ops/Conv2DOp.cpp | grep -n "default"

Repository: UbiquitousLearning/mllm

Length of output: 112


🏁 Script executed:

#!/bin/bash
# Check if there's a compile-time guard or runtime check that prevents forward() from being called with QNN types
rg -n "impl_type\|kQNN_LPBQ" mllm/core/aops/Conv2DOp.cpp -B 2 -A 2 | head -60

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Final confirmation: show the exact structure of the impl_type switch in Conv2DOp forward()
sed -n '92,180p' mllm/backends/cpu/ops/Conv2DOp.cpp | cat -n | grep -A 1 "impl_type\|^[[:space:]]*[0-9]*[[:space:]]*}"

Repository: UbiquitousLearning/mllm

Length of output: 726


Missing forward() implementation and error handling for new impl types.

The load() method (lines 64-67) accepts kQNN_LPBQ_w4a16o16_G32 and kQNN_LPBQ_w4a16o16_G64 without performing weight packing. However, the forward() method has no case handlers for these impl types—the inner switch (options_.impl_type) at line 94 only handles kDefault. If forward() is called with either of these impl types, execution silently exits the switch without performing any computation.

Unlike LinearOp (which has a default case with error handling), Conv2DOp provides no safety guard. Either add corresponding case handlers in forward() to implement the operation or add a default case that explicitly errors if an unsupported impl type is encountered.
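One way to add the guard this comment suggests is a default case that fails loudly. Below is a hedged sketch with toy names (dispatch_forward and the string return are not the real Conv2DOp::forward signature):

```cpp
#include <stdexcept>
#include <string>

// Mirrors a subset of the Conv2DOpImplType values discussed above, for illustration.
enum class Conv2DOpImplType { kDefault, kQNN_LPBQ_w4a16o16_G32, kQNN_LPBQ_w4a16o16_G64 };

// Hypothetical dispatcher sketching the suggested fix: unsupported impl types
// raise a clear error instead of silently falling out of the switch.
std::string dispatch_forward(Conv2DOpImplType impl_type) {
  switch (impl_type) {
    case Conv2DOpImplType::kDefault:
      return "conv2d_default";  // stands in for the real convolution path
    default:
      throw std::runtime_error("Conv2DOp::forward: unsupported impl_type");
  }
}
```

With this guard, a Conv2DOp loaded with one of the QNN LPBQ types fails fast at the first forward() call rather than producing an uninitialized output tensor.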


@chenghuaWang chenghuaWang merged commit 0eb79bb into UbiquitousLearning:main Jan 12, 2026
4 checks passed
