feat(Qnn AOT): Add MarkTensorIO pass and related changes for QNN AOT pipeline #569
Conversation
- Introduced MarkTensorIO pass to tag tensor inputs and outputs in the CallGraphOp.
- Updated AOTPipeline to include the new MarkTensorIO pass.
- Created MarkTensorIO implementation and header files.
- Modified OpNamingPass to ensure proper namespace usage.
- Added base pattern class for quantization recipes and implemented matching logic for Add operations.
- Enhanced LinalgIRQuantizationAnnotationAttr to include a dump method for better debugging.
- Introduced UUID management for quantization specifications to ensure unique identification.
- Cleaned up TensorValue dump method by removing unnecessary constant attribute printing.
Walkthrough

Adds a MarkTensorIOPass to the QNN AOT lowering pipeline, introduces a quant-recipe pattern framework and an Add quant-recipe pattern, adds UUID tracking and textual dump for quantization specs, adjusts example KV-cache quantization for value_states, and renames a few namespaces for AOT consistency.

Changes
Sequence Diagram(s)

sequenceDiagram
autonumber
actor CLI as Compiler frontend
participant AOT as AOTPipeline
participant Pass as MarkTensorIOPass
participant IR as IR (Module/CallGraphOp)
CLI->>AOT: invoke lowering
AOT->>Pass: run createMarkTensorIOPass()
Pass->>IR: locate ModuleOp → CallGraphOp
alt llm_recipe enabled
Pass->>IR: mark inputs as qnn_graph_inputs
Pass->>IR: mark outputs as qnn_graph_outputs
else llm_recipe disabled
Pass-->>AOT: emit config error
end
Pass-->>AOT: pass completed
AOT-->>CLI: continue lowering
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
mllm/compile/ir/linalg/Attribute.hpp (2)
243-247: Missing UUID assignment in `QuantizationSpecAsymPerBlock::create()`.

All other `QuantizationSpec` subclass `create()` methods assign a UUID, but this one does not. This inconsistency will result in uninitialized or zero UUIDs, breaking uniqueness guarantees.

🔎 Proposed fix

  static inline ptr_t create() {
    auto spec = std::make_shared<QuantizationSpecAsymPerBlock>();
    spec->type = QuantizationSpecType::kAsymPerBlock;
+   spec->uuid = QuantizationSpecUUIDGiver::getInstance().getUUID();
    return spec;
  }
278-282: Missing UUID assignment in `QuantizationSpecLPBQ::create()`.

Similar to `QuantizationSpecAsymPerBlock`, this `create()` method does not assign a UUID, creating an inconsistency with other subclasses.

🔎 Proposed fix

  static inline ptr_t create() {
    auto spec = std::make_shared<QuantizationSpecLPBQ>();
    spec->type = QuantizationSpecType::kLPBQ;
+   spec->uuid = QuantizationSpecUUIDGiver::getInstance().getUUID();
    return spec;
  }
🧹 Nitpick comments (4)
mllm/compile/ir/linalg/Attribute.hpp (1)
58-62: Consider initializing struct fields for safety.

The `type` and `uuid` fields are uninitialized in a default-constructed `QuantizationSpec`. While subclasses use factory methods that set these fields, adding default initialization improves safety and eliminates the static analysis warning.

🔎 Proposed initialization

  struct QuantizationSpec {
    using ptr_t = std::shared_ptr<QuantizationSpec>;
-   QuantizationSpecType type;
-   uint64_t uuid;
+   QuantizationSpecType type = QuantizationSpecType::kNone;
+   uint64_t uuid = 0;
  };

mllm/compile/ir/linalg/Attribute.cpp (1)
44-64: Improve comma logic to handle empty collections.

The current implementation adds unconditional commas at lines 50 and 57, which can produce malformed output (leading commas or double commas) when `inputs`, `outputs`, or `weights` are empty. Consider tracking whether any content was printed and adding commas conditionally.

🔎 Example approach with conditional comma logic

+ bool first = true;
  for (int i = 0; i < annotation_.inputs.size(); ++i) {
+   if (!first) { p.comma(); }
    p.print("inputs_" + std::to_string(i));
    p.colon();
    p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
-   if (i < annotation_.inputs.size() - 1) { p.comma(); }
+   first = false;
  }
- p.comma();
  for (int i = 0; i < annotation_.outputs.size(); ++i) {
+   if (!first) { p.comma(); }
    p.print("outputs_" + std::to_string(i));
    p.colon();
    p.print("{}", gen_quant_spec_str(annotation_.outputs[i]));
-   if (i < annotation_.outputs.size() - 1) { p.comma(); }
+   first = false;
  }
- p.comma();
  // Similar pattern for weights...

mllm/backends/qnn/aot/visitor/Elewise.cpp (1)
38-41: TODO: Implement the rewrite logic.

The `rewrite` method is a placeholder that returns `true` without performing any transformation. If this pattern is invoked during the AOT lowering pipeline, it will incorrectly signal success despite doing nothing.

Do you want me to help generate the implementation skeleton for this method, or would you prefer to open an issue to track this task?
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (1)
38-65: Consider extracting the configuration example.

The 28-line inline JSON configuration example reduces readability. Consider moving it to:
- A dedicated configuration documentation file
- The file header comment
- Referenced external documentation
This would improve code maintainability while preserving the valuable example.
🔎 Proposed refactor
- // Visit all graphs and assign names to unnamed operations
- // {
- //   "target_machine": {
- //     "htp_arch": "V81",
- //     "htp_chipset": "SM8850",
- //     "htp_try_best_performance": "HtpBurst",
- //     "htp_security_pd_session": "HtpSignedPd",
- //     "htp_vtcm_capability_in_mb": 8
- //   },
- //   "graph_on_qnn": [
- //     "model"
- //   ],
- //   "op_on_qnn": [
- //     "lm_head"
- //   ],
- //   "quant_recipe": {
- //     "llm_recipe": true,
- //     "builtin_qwen3_recipe": {
- //       "linear": "w4a16-lpbq",
- //       "kv_cache": {
- //         "key": "int8-per-tensor",
- //         "value": "int8-per-tensor"
- //       }
- //     }
- //   }
- // }
- //
- // "llm_recipe": true must be setted!
+ // Validate llm_recipe configuration
+ // See examples/qwen3_qnn_aot/qnn_aot_cfg.json for configuration reference
+ // NOTE: "llm_recipe": true must be set!
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
- examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
- examples/qwen3_qnn_aot/qnn_aot_cfg.json
- examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
- mllm/backends/qnn/aot/passes/OpNamingPass.cpp
- mllm/backends/qnn/aot/passes/OpNamingPass.hpp
- mllm/backends/qnn/aot/visitor/Base.hpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/backends/qnn/aot/visitor/Elewise.hpp
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/compile/ir/linalg/Attribute.hpp
- mllm/compile/ir/tensor/Value.cpp
💤 Files with no reviewable changes (1)
- mllm/compile/ir/tensor/Value.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
- mllm/backends/qnn/aot/passes/OpNamingPass.hpp
- mllm/backends/qnn/aot/visitor/Elewise.hpp
- mllm/backends/qnn/aot/passes/OpNamingPass.cpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/aot/visitor/Base.hpp
- mllm/compile/ir/linalg/Attribute.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
- mllm/backends/qnn/aot/passes/OpNamingPass.hpp
- mllm/backends/qnn/aot/visitor/Elewise.hpp
- mllm/backends/qnn/aot/passes/OpNamingPass.cpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/aot/visitor/Base.hpp
- mllm/compile/ir/linalg/Attribute.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
- mllm/backends/qnn/aot/passes/OpNamingPass.hpp
- mllm/backends/qnn/aot/visitor/Elewise.hpp
- mllm/backends/qnn/aot/passes/OpNamingPass.cpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/aot/visitor/Base.hpp
- mllm/compile/ir/linalg/Attribute.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
- mllm/backends/qnn/aot/passes/OpNamingPass.cpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
🧬 Code graph analysis (4)
mllm/compile/ir/linalg/Attribute.cpp (2)
mllm/compile/ir/tensor/Value.cpp (2)
- dump (39-75)
- dump (39-39)

mllm/compile/ir/linalg/Attribute.hpp (1)

- p (303-303)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
- MarkQnnGraphPass (9-22)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (2)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (2)
- run (65-86)
- run (65-65)

mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)

- op (17-17)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (2)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (2)
- createMarkTensorIOPass (114-114)
- createMarkTensorIOPass (114-114)

mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)

- createMarkTensorIOPass (20-20)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/visitor/Elewise.hpp
[error] 27-27: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
[error] 11-11: class 'MarkTensorIOPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
mllm/backends/qnn/aot/visitor/Base.hpp
[error] 30-30: method 'isMatch' can be made static
(readability-convert-member-functions-to-static,-warnings-as-errors)
[error] 30-30: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 32-32: method 'rewrite' can be made static
(readability-convert-member-functions-to-static,-warnings-as-errors)
mllm/compile/ir/linalg/Attribute.hpp
[error] 41-41: class 'QuantizationSpecUUIDGiver' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
[error] 58-58: constructor does not initialize these fields: type, uuid
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 303-303: parameter name 'p' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-x86
- GitHub Check: build-macos
- GitHub Check: build-android
🔇 Additional comments (17)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (1)
15-15: LGTM! Namespace change is consistent with the header.

The namespace relocation from `mllm::qnn` to `mllm::qnn::aot` is correctly applied in the implementation file, with the closing comment properly updated. The functional logic remains unchanged, and the change is consistent with the corresponding header file.

Also applies to: 90-90
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
9-9: Namespace relocation is correct and all references are properly updated.

The namespace change from `mllm::qnn` to `mllm::qnn::aot` is syntactically correct, the closing comment is properly updated, and all usages of `OpNamingPass` are already referencing the new namespace. No orphaned references to the old namespace exist.

mllm/compile/ir/linalg/Attribute.cpp (2)
3-3: LGTM!

The `<sstream>` include is necessary for the `std::stringstream` used in the `dump()` implementation.
23-40: LGTM!

The lambda correctly generates a string representation of `QuantizationSpec` with type name and UUID.

examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)
179-179: LGTM! Symmetric quantization for value_states.

The quantization of `value_states` to `kInt8PerTensorSym` correctly mirrors the `key_states` quantization pattern and aligns with the KV cache quantization configuration ("value": "int8-per-tensor").

mllm/backends/qnn/aot/visitor/Base.hpp (1)
26-33: LGTM! Base pattern class follows established design.

The `QnnAOTQuantRecipeBasePattern` correctly provides default no-op implementations for `isMatch` and `rewrite`, following the same structure as `QnnAOTBasePattern`. The static analysis hints suggesting to make methods static or rename the parameter are false positives; the methods must remain virtual for polymorphic pattern matching.

mllm/backends/qnn/aot/visitor/Elewise.hpp (1)
25-34: LGTM! Pattern class declaration follows established conventions.

The `QnnAOTAddQuantRecipePattern` class correctly derives from `QnnAOTQuantRecipeBasePattern` and provides a factory method for pattern registration. The structure is consistent with the existing `QnnAOTAddPattern` design.

mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)
4-4: LGTM! Clean integration of MarkTensorIO pass.

The `MarkTensorIOPass` is appropriately added to the pipeline after graph marking and operation naming passes, ensuring that prerequisite transformations are complete before IO tagging.

Also applies to: 17-17
examples/qwen3_qnn_aot/qnn_aot_cfg.json (1)
16-16: LGTM! Configuration enables MarkTensorIO pass.

The `llm_recipe` flag is correctly added and will satisfy the validation requirement in `MarkTensorIOPass::run` (MarkTensorIO.cpp lines 68-73).

mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
11-18: LGTM! Pass declaration follows established pattern.

The `MarkTensorIOPass` class structure correctly mirrors `MarkQnnGraphPass` (from relevant code snippets). The static analysis hint about special member functions is a false positive; pass classes are managed via `shared_ptr` and should not be copied or moved directly.

mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (6)
18-26: LGTM! Standard pass initialization.

The function correctly retrieves the AOT configuration, validates the top-level operation type, and creates an IRWriter for graph traversal, following the established pattern from other passes.
28-36: LGTM! CallGraphOp discovery follows established pattern.

The walker correctly locates the main `CallGraphOp` and asserts uniqueness, matching the pattern used in `OpNamingPass::run` (from relevant snippets).
66-73: LGTM! Comprehensive configuration validation.

The validation logic thoroughly checks for the required `llm_recipe` flag with clear error messaging. The multi-stage validation (exists, is boolean, is true) handles edge cases gracefully.
75-86: LGTM! Input tagging logic correctly enforces graph boundaries.

The code properly identifies and tags graph inputs, with the `prevOp()` check ensuring that only true graph boundary inputs (not internal derived values) are tagged.
114-114: LGTM! Factory function follows standard pattern.

The factory function correctly instantiates and returns the pass as a `shared_ptr`.
88-108: Verify LinearOp identity validation in the lm_head special case.

The output tagging includes special handling for the `lm_head` scenario (lines 93-100), where a graph output feeds both a `ReturnOp` and a `LinearOp`. The code correctly assumes the output order based on IR construction (in Module.cpp, the `LinearOp` outputs are connected before `ReturnOp` is created), which the comments confirm.

However, the code lacks validation that the `LinearOp` is specifically the "lm_head" operation. Currently, it tags any `LinearOp` output as `qnn_graph_outputs` if the pattern matches, without verifying the operation's name or symbol. Consider:
- Adding a check for the LinearOp's name/symbol to ensure it is indeed "lm_head"
- Documenting why matching any LinearOp is sufficient, or
- Refining the pattern to be more specific if needed
The fallback case (line 102) handles other graph topologies appropriately.
mllm/backends/qnn/aot/visitor/Elewise.cpp (1)
34-36: The identical `isMatch` implementations appear intentional but lack discriminating logic if both patterns are used together.

Both `QnnAOTAddPattern::isMatch` (lines 10-12) and `QnnAOTAddQuantRecipePattern::isMatch` (lines 34-36) are identical, matching `AddOp` operations with the `"using_qnn"` attribute. While the patterns inherit from separate base classes (`QnnAOTBasePattern` vs `QnnAOTQuantRecipeBasePattern`) with different virtual methods (`addNode()` vs `rewrite()`), the identical matching logic means both would match the same operations if registered in the same pattern pass. Currently, these patterns are not instantiated or used in any pass; however, when they are integrated, clarify whether:
- Only one pattern should match these operations (add discriminating logic)
- Both patterns intentionally process the same operations in sequence (document the execution order)
- Different pattern passes will use these separately (document which pass uses which pattern)
for (int i = 0; i < annotation_.outputs.size(); ++i) {
  p.print("outputs_" + std::to_string(i));
  p.colon();
  p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
  if (i < annotation_.inputs.size() - 1) { p.comma(); }
}
Critical bug: outputs section prints inputs data.
Lines 54-55 reference annotation_.inputs[i] and annotation_.inputs.size() instead of the corresponding outputs members. This causes the outputs section to print input data incorrectly.
🔎 Proposed fix
for (int i = 0; i < annotation_.outputs.size(); ++i) {
p.print("outputs_" + std::to_string(i));
p.colon();
- p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
- if (i < annotation_.inputs.size() - 1) { p.comma(); }
+ p.print("{}", gen_quant_spec_str(annotation_.outputs[i]));
+ if (i < annotation_.outputs.size() - 1) { p.comma(); }
}

🤖 Prompt for AI Agents
In mllm/compile/ir/linalg/Attribute.cpp around lines 51 to 56, the outputs loop
incorrectly references annotation_.inputs when generating output strings and
when checking the loop-end; change the gen_quant_spec_str call to use
annotation_.outputs[i] and update the size check to annotation_.outputs.size()
(and any related index/size usages in that block) so the outputs section prints
the correct output data and comma logic is based on outputs.
for (int i = 0; i < annotation_.weights.size(); ++i) {
  p.print("weight_" + std::to_string(i));
  p.colon();
  p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
  if (i < annotation_.inputs.size() - 1) { p.comma(); }
}
Critical bug: incorrect iteration over unordered_map.
The annotation_.weights member is an unordered_map<string, QuantizationSpec::ptr_t>, not a vector. You cannot iterate it with integer indices. Additionally, line 61 incorrectly references annotation_.inputs[i] instead of the weight value, and line 62 checks the wrong size.
🔎 Proposed fix using range-based for loop
- for (int i = 0; i < annotation_.weights.size(); ++i) {
- p.print("weight_" + std::to_string(i));
+ int weight_idx = 0;
+ for (const auto& [name, spec] : annotation_.weights) {
+ p.print("weight_" + name);
p.colon();
- p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
- if (i < annotation_.inputs.size() - 1) { p.comma(); }
+ p.print("{}", gen_quant_spec_str(spec));
+ if (weight_idx < annotation_.weights.size() - 1) { p.comma(); }
+ weight_idx++;
}

🤖 Prompt for AI Agents
In mllm/compile/ir/linalg/Attribute.cpp around lines 58 to 63, the code
incorrectly treats annotation_.weights (an unordered_map<string,
QuantizationSpec::ptr_t>) like a vector: it indexes by integer, references
annotation_.inputs[i], and compares to annotation_.inputs.size(); change to
iterate the unordered_map with a range-based loop (for (const auto& kv :
annotation_.weights) or using iterator), use kv.first for the weight name and
kv.second as the QuantizationSpec passed to gen_quant_spec_str, and determine
comma placement by either tracking an index/counter versus
annotation_.weights.size() or by using an iterator and checking std::next(it) !=
end to only append commas between elements.
class QuantizationSpecUUIDGiver {
 public:
  static QuantizationSpecUUIDGiver& getInstance() {
    static QuantizationSpecUUIDGiver instance;
    return instance;
  }

  uint64_t getUUID() { return next_uuid_++; }

 private:
  QuantizationSpecUUIDGiver() = default;
  QuantizationSpecUUIDGiver(const QuantizationSpecUUIDGiver&) = delete;  // NOLINT
  QuantizationSpecUUIDGiver& operator=(const QuantizationSpecUUIDGiver&) = delete;  // NOLINT

  uint64_t next_uuid_ = 0;
};
Thread-safety issue: getUUID() lacks synchronization.
The next_uuid_++ operation in getUUID() is not atomic and lacks synchronization. If multiple threads call this method concurrently, it can result in race conditions, duplicate UUIDs, or lost increments.
🔎 Proposed fix using std::atomic
+#include <atomic>
+
class QuantizationSpecUUIDGiver {
public:
static QuantizationSpecUUIDGiver& getInstance() {
static QuantizationSpecUUIDGiver instance;
return instance;
}
uint64_t getUUID() { return next_uuid_++; }
private:
QuantizationSpecUUIDGiver() = default;
QuantizationSpecUUIDGiver(const QuantizationSpecUUIDGiver&) = delete; // NOLINT
QuantizationSpecUUIDGiver& operator=(const QuantizationSpecUUIDGiver&) = delete; // NOLINT
- uint64_t next_uuid_ = 0;
+ std::atomic<uint64_t> next_uuid_{0};
};

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#include <atomic>

class QuantizationSpecUUIDGiver {
 public:
  static QuantizationSpecUUIDGiver& getInstance() {
    static QuantizationSpecUUIDGiver instance;
    return instance;
  }

  uint64_t getUUID() { return next_uuid_++; }

 private:
  QuantizationSpecUUIDGiver() = default;
  QuantizationSpecUUIDGiver(const QuantizationSpecUUIDGiver&) = delete;  // NOLINT
  QuantizationSpecUUIDGiver& operator=(const QuantizationSpecUUIDGiver&) = delete;  // NOLINT

  std::atomic<uint64_t> next_uuid_{0};
};
🧰 Tools
🪛 Clang (14.0.6)
[error] 41-41: class 'QuantizationSpecUUIDGiver' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm/compile/ir/linalg/Attribute.hpp around lines 41-56, the getUUID() method
increments next_uuid_ without synchronization leading to race conditions; change
next_uuid_ to std::atomic<uint64_t> and implement getUUID() to return
next_uuid_.fetch_add(1, std::memory_order_relaxed) (or stronger ordering if
required), and add the required #include <atomic> at the top of the file so
concurrent calls are atomic and thread-safe.
Actionable comments posted: 4
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (1)
45-71: Consider moving the large JSON example to documentation.

While the inline JSON example is helpful for understanding the expected config format, it's quite verbose (27 lines). Consider moving it to the header file documentation, a separate markdown file, or condensing it to show only the relevant parts for this pass.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- examples/qwen3_qnn_aot/compile.cpp
- examples/qwen3_qnn_aot/qnn_aot_cfg.json
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/core/Tensor.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/qwen3_qnn_aot/qnn_aot_cfg.json
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/core/Tensor.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/core/Tensor.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/core/Tensor.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/core/Tensor.cpp
🧬 Code graph analysis (3)
examples/qwen3_qnn_aot/compile.cpp (1)
mllm/core/Tensor.cpp (2)
- zeros (97-101)
- zeros (97-97)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (3)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (2)
- run (65-86)
- run (65-65)

mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)

- op (17-17)

mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)

- op (17-17)
mllm/core/Tensor.cpp (2)
mllm/core/DataTypes.cpp (2)
- nameOfType (107-154)
- nameOfType (107-107)

mllm/core/DataTypes.hpp (1)

- nameOfType (713-713)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-android
- GitHub Check: build-x86
- GitHub Check: build-macos
🔇 Additional comments (6)

mllm/core/Tensor.cpp (1)

370-370: LGTM! Improved error message.
Including the actual type name in the error message via nameOfType(dtype()) significantly improves the debugging experience when unsupported types are encountered.

mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (5)

1-15: LGTM!
The includes are well organized and appropriate for this pass implementation.

22-26: LGTM!
The validation ensures the pass operates under supported conditions, and the error message clearly explains the constraint.

72-79: LGTM!
The llm_recipe validation is thorough, with clear, actionable error messages.

81-92: LGTM!
The input tagging logic correctly validates tensor inputs and applies the appropriate attribute.

117-120: LGTM!
The return statement and factory function follow standard patterns for pass implementations.
examples/qwen3_qnn_aot/compile.cpp (lines 59-60):

    auto sequence = mllm::Tensor::zeros({1, N}, mllm::kInt64);
    auto causal_mask = mllm::Tensor::zeros({1, 1, N, CL}, mllm::kUInt16);
Change causal_mask dtype from kUInt16 to kFloat32 (line 60).
The causal_mask dtype is inconsistent with the rest of the codebase: every other causal_mask initialization, across LLaMA, Qwen, and custom ops, uses kFloat32. The kUInt16 choice appears to be an error and should be corrected to match the established pattern.
🤖 Prompt for AI Agents
In examples/qwen3_qnn_aot/compile.cpp around lines 59-60, the causal_mask is
created with dtype mllm::kUInt16 which is inconsistent; change the tensor
initialization to use mllm::kFloat32 instead so the causal_mask matches other
models (LLaMA, Qwen, custom ops) and maintains dtype consistency across the
codebase.
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp:

      return ir::IRWriter::WalkResult::WALK_CONTINUE;
    });

    // Visit all graphs and assign names to unnamed operations
Incorrect comment copied from OpNamingPass.
The comment "Visit all graphs and assign names to unnamed operations" is incorrect for this pass, which marks tensor inputs and outputs. This appears to be copy-pasted from OpNamingPass.cpp without updating.
🔎 Proposed fix

- // Visit all graphs and assign names to unnamed operations
+ // Mark tensor inputs and outputs for the main CallGraphOp

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/MarkTensorIO.cpp around line 44, the comment
"Visit all graphs and assign names to unnamed operations" is incorrect and was
copy-pasted from OpNamingPass; update the comment to accurately describe this
pass's purpose (marking tensor inputs and outputs), e.g., replace the line with
a concise description such as "Visit graphs and mark tensor inputs and outputs"
or similar wording that reflects the pass behavior.
    // Tag the outputs to call_graph to "qnn_graph_outputs"
    {
      auto outputs = call_main_graph_op->outputs();
      for (auto o : outputs) {
        if (o->isa_<ir::tensor::TensorValue>()) {
          // That means after this graph, we have a lm_head!
          // o 's outputs is cf.Return and lm_head(linear op)
          if (!o->outputs().empty() && o->outputs().size() == 2 && o->outputs().front()->isa_<ir::cf::ReturnOp>()
              && (*std::next(o->outputs().begin()))->isa_<ir::linalg::LinearOp>()) {
            auto lm_head_ir = (*std::next(o->outputs().begin()))->cast_<ir::linalg::LinearOp>();
            if (lm_head_ir->outputs().front()->outputs().empty()) {
              lm_head_ir->outputs().front()->setAttr("qnn_graph_outputs", getCtx()->create<ir::BoolAttr>(true));
            }
          } else {
            o->setAttr("qnn_graph_outputs", getCtx()->create<ir::BoolAttr>(true));
          }
        } else {
          MLLM_ERROR_EXIT(ExitCode::kCoreError, "Expect pure tensor value outputs to graph.CallGraphOp");
        }
      }
    }
🛠️ Refactor suggestion | 🟠 Major
Clarify the lm_head special case logic.
The output tagging logic contains a complex conditional (lines 101-106) that handles a special lm_head scenario. The logic is difficult to follow and lacks explanatory comments:

- The condition at lines 101-102 uses both front() and std::next() to access elements, which is not immediately clear.
- The check at line 104 (lm_head_ir->outputs().front()->outputs().empty()) lacks an explanation of its purpose.
- The overall lm_head special case would benefit from a comment explaining why this handling is needed.
🔎 Proposed improvements
Add explanatory comments and consider using intermediate variables for clarity:
// Tag the outputs to call_graph to "qnn_graph_outputs"
{
auto outputs = call_main_graph_op->outputs();
for (auto o : outputs) {
if (o->isa_<ir::tensor::TensorValue>()) {
- // That means after this graph, we have a lm_head!
- // o 's outputs is cf.Return and lm_head(linear op)
+ // Special case: if this graph output feeds both a ReturnOp and a lm_head LinearOp,
+ // mark the lm_head's output as the graph output instead (for ops on QNN)
if (!o->outputs().empty() && o->outputs().size() == 2 && o->outputs().front()->isa_<ir::cf::ReturnOp>()
&& (*std::next(o->outputs().begin()))->isa_<ir::linalg::LinearOp>()) {
auto lm_head_ir = (*std::next(o->outputs().begin()))->cast_<ir::linalg::LinearOp>();
+ // Only tag lm_head output if it's the final output (no further consumers)
if (lm_head_ir->outputs().front()->outputs().empty()) {
lm_head_ir->outputs().front()->setAttr("qnn_graph_outputs", getCtx()->create<ir::BoolAttr>(true));
}
} else {
o->setAttr("qnn_graph_outputs", getCtx()->create<ir::BoolAttr>(true));
          }

Or use intermediate variables for better readability:
if (!o->outputs().empty() && o->outputs().size() == 2) {
+ auto output_consumers = o->outputs();
+ auto first_consumer = output_consumers.front();
+ auto second_consumer = *std::next(output_consumers.begin());
+
- if (o->outputs().front()->isa_<ir::cf::ReturnOp>()
- && (*std::next(o->outputs().begin()))->isa_<ir::linalg::LinearOp>()) {
+ if (first_consumer->isa_<ir::cf::ReturnOp>() && second_consumer->isa_<ir::linalg::LinearOp>()) {
- auto lm_head_ir = (*std::next(o->outputs().begin()))->cast_<ir::linalg::LinearOp>();
+ auto lm_head_ir = second_consumer->cast_<ir::linalg::LinearOp>();

mllm/core/Tensor.cpp (equal(float v)):

      case kInt16: *(rhs_tensor.ptr<int16_t>()) = v; break;
      case kInt8: *(rhs_tensor.ptr<int8_t>()) = v; break;
      default: NYI("Type is not supported"); break;
      case kUInt16: *(rhs_tensor.ptr<uint16_t>()) = v; break;
Add kUInt16 support to scalar operators (+, -, *, /) to match equal() method support.
The equal(float v) method now supports kUInt16, but the scalar arithmetic operators (operator+, operator-, operator*, operator/) do not. Since kUInt16 is a supported type used throughout the codebase, users will encounter NYI("Type is not supported") runtime failures if they attempt scalar arithmetic operations on uint16 tensors. Either add kUInt16 handling to all scalar operators to match equal(), or remove it from equal() if uint16 scalar operations are intentionally unsupported.
Additionally, improve error messages in scalar operators to include the unsupported type name like equal() does: use NYI("Type is not supported {}", nameOfType(dtype())) instead of NYI("Type is not supported").
🤖 Prompt for AI Agents
In mllm/core/Tensor.cpp around line 369, the scalar arithmetic operators (+, -,
*, /) lack kUInt16 handling and their NYI errors lack the dtype name; update
each scalar operator's switch to add a kUInt16 case mirroring equal(float v)
(cast/use uint16_t appropriately when assigning results) and replace NYI("Type
is not supported") with NYI("Type is not supported {}", nameOfType(dtype())) so
unsupported-type errors include the type name.
Summary by CodeRabbit
New Features
Refactor
Chores
Bug Fixes