
feat(Qnn AOT): Add MarkTensorIO pass and related changes for QNN AOT pipeline #569

Merged
chenghuaWang merged 4 commits into UbiquitousLearning:main from chenghuaWang:wch-main
Dec 23, 2025

Conversation

@chenghuaWang
Collaborator

@chenghuaWang chenghuaWang commented Dec 23, 2025

  • Introduced MarkTensorIO pass to tag tensor inputs and outputs in the CallGraphOp.
  • Updated AOTPipeline to include the new MarkTensorIO pass.
  • Created MarkTensorIO implementation and header files.
  • Modified OpNamingPass to ensure proper namespace usage.
  • Added base pattern class for quantization recipes and implemented matching logic for Add operations.
  • Enhanced LinalgIRQuantizationAnnotationAttr to include a dump method for better debugging.
  • Introduced UUID management for quantization specifications to ensure unique identification.
  • Cleaned up TensorValue dump method by removing unnecessary constant attribute printing.
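A minimal sketch of the I/O-tagging idea behind the MarkTensorIO pass, using hypothetical stand-in types (the `TensorValue`/`CallGraph` structs and the attribute-bag layout below are illustrative, not mllm's real IR API):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real IR types; the actual pass works
// on mllm's CallGraphOp, not on these structs.
struct TensorValue {
  std::string name;
  std::map<std::string, bool> attrs;  // illustrative attribute bag
};

struct CallGraph {
  std::vector<TensorValue*> inputs;
  std::vector<TensorValue*> outputs;
};

// Tag every graph input/output so later lowering stages can recognize
// the QNN graph boundary, mirroring the attribute names the PR uses:
// inputs get "qnn_graph_inputs", outputs get "qnn_graph_outputs".
void markTensorIO(CallGraph& graph) {
  for (auto* in : graph.inputs) { in->attrs["qnn_graph_inputs"] = true; }
  for (auto* out : graph.outputs) { out->attrs["qnn_graph_outputs"] = true; }
}
```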

Summary by CodeRabbit

  • New Features

    • Added LLM recipe config support and tensor I/O marking in the compilation pipeline for improved AOT handling.
  • Refactor

    • Moved compilation pass namespaces into the AOT submodule for clearer organization.
  • Chores

    • Improved quantization handling (including pre-quantization of value states) and added UUID-tracked quantization specs with richer debug printing.
    • Minor visitor/pattern scaffolding and explicit tensor datatype initializations.
  • Bug Fixes

    • Broadened tensor equality support for additional dtypes.


chenghuaWang and others added 3 commits December 23, 2025 09:30
- Introduced MarkTensorIO pass to tag tensor inputs and outputs in the CallGraphOp.
- Updated AOTPipeline to include the new MarkTensorIO pass.
- Created MarkTensorIO implementation and header files.
- Modified OpNamingPass to ensure proper namespace usage.
- Added base pattern class for quantization recipes and implemented matching logic for Add operations.
- Enhanced LinalgIRQuantizationAnnotationAttr to include a dump method for better debugging.
- Introduced UUID management for quantization specifications to ensure unique identification.
- Cleaned up TensorValue dump method by removing unnecessary constant attribute printing.
@coderabbitai
Contributor

coderabbitai Bot commented Dec 23, 2025

Walkthrough

Adds a MarkTensorIOPass to the QNN AOT lowering pipeline, introduces a quant-recipe pattern framework and an Add quant-recipe pattern, adds UUID tracking and textual dump for quantization specs, adjusts example KV-cache quantization for value_states, and renames a few namespaces for AOT consistency.

Changes

  • Qwen3 AOT example — examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp, examples/qwen3_qnn_aot/qnn_aot_cfg.json, examples/qwen3_qnn_aot/compile.cpp
    value_states is now quantized to kInt8PerTensorSym before KV concatenation; example tensors are initialized with explicit dtypes; the example config enables "split_graph": 1 and "quant_recipe": {"llm_recipe": true}.
  • MarkTensorIO pass & pipeline integration — mllm/backends/qnn/aot/passes/MarkTensorIO.hpp, mllm/backends/qnn/aot/passes/MarkTensorIO.cpp, mllm/backends/qnn/aot/passes/AOTPipeline.cpp
    New MarkTensorIOPass added and injected into the AOT lowering pipeline; it validates quant_recipe.llm_recipe, locates the main CallGraphOp, and tags graph inputs/outputs with qnn_graph_inputs / qnn_graph_outputs.
  • Quant-recipe pattern framework (AOT visitor) — mllm/backends/qnn/aot/visitor/Base.hpp, mllm/backends/qnn/aot/visitor/Elewise.hpp, mllm/backends/qnn/aot/visitor/Elewise.cpp
    Added the QnnAOTQuantRecipeBasePattern base class and QnnAOTAddQuantRecipePattern with isMatch (AddOp with using_qnn) and a placeholder rewrite.
  • Namespace reorganization — mllm/backends/qnn/aot/passes/OpNamingPass.hpp, mllm/backends/qnn/aot/passes/OpNamingPass.cpp
    Moved OpNamingPass into the mllm::qnn::aot namespace; no functional changes.
  • Quantization spec UUIDs & dump — mllm/compile/ir/linalg/Attribute.hpp, mllm/compile/ir/linalg/Attribute.cpp
    Added the QuantizationSpecUUIDGiver singleton and a uuid field on QuantizationSpec; UUIDs are assigned in the spec creators; implemented LinalgIRQuantizationAnnotationAttr::dump(IRPrinter&) to print specs with UUIDs.
  • IR / Tensor adjustments — mllm/compile/ir/tensor/Value.cpp, mllm/core/Tensor.cpp
    Removed emission of the "constant" attribute from TensorValue::dump(); extended Tensor::equal(float) to support a kUInt16 RHS and improved error messaging.
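The match/rewrite shape of the quant-recipe pattern framework can be sketched in isolation. The `Op` struct below is a hypothetical stand-in for mllm's IR ops, and `rewrite` is deliberately a placeholder, just as in the reviewed code:

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical op stand-in; the real patterns operate on mllm IR ops.
struct Op {
  std::string kind;
  std::set<std::string> attrs;
  bool has(const std::string& attr) const { return attrs.count(attr) != 0; }
};

// Base pattern with default no-op hooks, mirroring the described
// QnnAOTQuantRecipeBasePattern design.
struct QuantRecipeBasePattern {
  virtual ~QuantRecipeBasePattern() = default;
  virtual bool isMatch(const Op& op) const { return false; }
  virtual bool rewrite(Op& op) const { return false; }
};

// Matches AddOp tagged with "using_qnn", as the walkthrough describes.
struct AddQuantRecipePattern : QuantRecipeBasePattern {
  bool isMatch(const Op& op) const override {
    return op.kind == "AddOp" && op.has("using_qnn");
  }
  // TODO: real quantization-recipe rewrite; placeholder like the PR's.
  bool rewrite(Op&) const override { return true; }
};
```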

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor CLI as Compiler frontend
  participant AOT as AOTPipeline
  participant Pass as MarkTensorIOPass
  participant IR as IR (Module/CallGraphOp)

  CLI->>AOT: invoke lowering
  AOT->>Pass: run createMarkTensorIOPass()
  Pass->>IR: locate ModuleOp → CallGraphOp
  alt llm_recipe enabled
    Pass->>IR: mark inputs as qnn_graph_inputs
    Pass->>IR: mark outputs as qnn_graph_outputs
  else llm_recipe disabled
    Pass-->>AOT: emit config error
  end
  Pass-->>AOT: pass completed
  AOT-->>CLI: continue lowering

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • liang1232018
  • yirongjie
  • oreomaker

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 5.88%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding the MarkTensorIO pass to the QNN AOT pipeline with related supporting changes.
Description check ✅ Passed The description provides a comprehensive bullet-point summary of all changes made, covering the main MarkTensorIO pass addition and supporting modifications across multiple files.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
mllm/compile/ir/linalg/Attribute.hpp (2)

243-247: Missing UUID assignment in QuantizationSpecAsymPerBlock::create().

All other QuantizationSpec subclass create() methods assign a UUID, but this one does not. This inconsistency will result in uninitialized or zero UUIDs, breaking uniqueness guarantees.

🔎 Proposed fix
   static inline ptr_t create() {
     auto spec = std::make_shared<QuantizationSpecAsymPerBlock>();
     spec->type = QuantizationSpecType::kAsymPerBlock;
+    spec->uuid = QuantizationSpecUUIDGiver::getInstance().getUUID();
     return spec;
   }

278-282: Missing UUID assignment in QuantizationSpecLPBQ::create().

Similar to QuantizationSpecAsymPerBlock, this create() method does not assign a UUID, creating an inconsistency with other subclasses.

🔎 Proposed fix
   static inline ptr_t create() {
     auto spec = std::make_shared<QuantizationSpecLPBQ>();
     spec->type = QuantizationSpecType::kLPBQ;
+    spec->uuid = QuantizationSpecUUIDGiver::getInstance().getUUID();
     return spec;
   }
🧹 Nitpick comments (4)
mllm/compile/ir/linalg/Attribute.hpp (1)

58-62: Consider initializing struct fields for safety.

The type and uuid fields are uninitialized in a default-constructed QuantizationSpec. While subclasses use factory methods that set these fields, adding default initialization improves safety and eliminates the static analysis warning.

🔎 Proposed initialization
 struct QuantizationSpec {
   using ptr_t = std::shared_ptr<QuantizationSpec>;
-  QuantizationSpecType type;
-  uint64_t uuid;
+  QuantizationSpecType type = QuantizationSpecType::kNone;
+  uint64_t uuid = 0;
 };
mllm/compile/ir/linalg/Attribute.cpp (1)

44-64: Improve comma logic to handle empty collections.

The current implementation adds unconditional commas at lines 50 and 57, which can produce malformed output (leading commas or double commas) when inputs, outputs, or weights are empty. Consider tracking whether any content was printed and adding commas conditionally.

🔎 Example approach with conditional comma logic
+  bool first = true;
   for (int i = 0; i < annotation_.inputs.size(); ++i) {
+    if (!first) { p.comma(); }
     p.print("inputs_" + std::to_string(i));
     p.colon();
     p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
-    if (i < annotation_.inputs.size() - 1) { p.comma(); }
+    first = false;
   }
-  p.comma();
   for (int i = 0; i < annotation_.outputs.size(); ++i) {
+    if (!first) { p.comma(); }
     p.print("outputs_" + std::to_string(i));
     p.colon();
     p.print("{}", gen_quant_spec_str(annotation_.outputs[i]));
-    if (i < annotation_.outputs.size() - 1) { p.comma(); }
+    first = false;
   }
-  p.comma();
   // Similar pattern for weights...
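The conditional-comma idea suggested above can be illustrated stand-alone (plain std::string in place of the IRPrinter, which is not reproduced here): a single "first element" flag carried across all sections guarantees no leading, trailing, or doubled separators even when some sections are empty.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Join two labeled sections with ", ", emitting a separator only
// before non-first elements, so empty sections add nothing.
std::string joinSections(const std::vector<std::string>& inputs,
                         const std::vector<std::string>& outputs) {
  std::string out;
  bool first = true;
  for (const auto& s : inputs) {
    if (!first) { out += ", "; }
    out += s;
    first = false;
  }
  for (const auto& s : outputs) {
    if (!first) { out += ", "; }
    out += s;
    first = false;
  }
  return out;
}
```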
mllm/backends/qnn/aot/visitor/Elewise.cpp (1)

38-41: TODO: Implement the rewrite logic.

The rewrite method is a placeholder that returns true without performing any transformation. If this pattern is invoked during the AOT lowering pipeline, it will incorrectly signal success despite doing nothing.

Do you want me to help generate the implementation skeleton for this method, or would you prefer to open an issue to track this task?

mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (1)

38-65: Consider extracting the configuration example.

The 28-line inline JSON configuration example reduces readability. Consider moving it to:

  • A dedicated configuration documentation file
  • The file header comment
  • Referenced external documentation

This would improve code maintainability while preserving the valuable example.

🔎 Proposed refactor
-  // Visit all graphs and assign names to unnamed operations
-  //   {
-  //     "target_machine": {
-  //         "htp_arch": "V81",
-  //         "htp_chipset": "SM8850",
-  //         "htp_try_best_performance": "HtpBurst",
-  //         "htp_security_pd_session": "HtpSignedPd",
-  //         "htp_vtcm_capability_in_mb": 8
-  //     },
-  //     "graph_on_qnn": [
-  //         "model"
-  //     ],
-  //     "op_on_qnn": [
-  //         "lm_head"
-  //     ],
-  //     "quant_recipe": {
-  //         "llm_recipe": true,
-  //         "builtin_qwen3_recipe": {
-  //             "linear": "w4a16-lpbq",
-  //             "kv_cache": {
-  //                 "key": "int8-per-tensor",
-  //                 "value": "int8-per-tensor"
-  //             }
-  //         }
-  //     }
-  // }
-  //
-  // "llm_recipe": true must be setted!
+  // Validate llm_recipe configuration
+  // See examples/qwen3_qnn_aot/qnn_aot_cfg.json for configuration reference
+  // NOTE: "llm_recipe": true must be set!
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b414f6 and ac94c95.

📒 Files selected for processing (14)
  • examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
  • examples/qwen3_qnn_aot/qnn_aot_cfg.json
  • examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
  • mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.hpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/compile/ir/tensor/Value.cpp
💤 Files with no reviewable changes (1)
  • mllm/compile/ir/tensor/Value.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/passes/OpNamingPass.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
  • mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/passes/OpNamingPass.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
  • mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/passes/OpNamingPass.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
  • mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
🧬 Code graph analysis (4)
mllm/compile/ir/linalg/Attribute.cpp (2)
mllm/compile/ir/tensor/Value.cpp (2)
  • dump (39-75)
  • dump (39-39)
mllm/compile/ir/linalg/Attribute.hpp (1)
  • p (303-303)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • MarkQnnGraphPass (9-22)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (2)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (2)
  • run (65-86)
  • run (65-65)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (2)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (2)
  • createMarkTensorIOPass (114-114)
  • createMarkTensorIOPass (114-114)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
  • createMarkTensorIOPass (20-20)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/visitor/Elewise.hpp

[error] 27-27: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/passes/MarkTensorIO.hpp

[error] 11-11: class 'MarkTensorIOPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Base.hpp

[error] 30-30: method 'isMatch' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 30-30: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 32-32: method 'rewrite' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)

mllm/compile/ir/linalg/Attribute.hpp

[error] 41-41: class 'QuantizationSpecUUIDGiver' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 58-58: constructor does not initialize these fields: type, uuid

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 303-303: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-macos
  • GitHub Check: build-android
🔇 Additional comments (17)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (1)

15-15: LGTM! Namespace change is consistent with the header.

The namespace relocation from mllm::qnn to mllm::qnn::aot is correctly applied in the implementation file, with the closing comment properly updated. The functional logic remains unchanged, and the change is consistent with the corresponding header file.

Also applies to: 90-90

mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)

9-9: Namespace relocation is correct and all references are properly updated.

The namespace change from mllm::qnn to mllm::qnn::aot is syntactically correct, the closing comment is properly updated, and all usages of OpNamingPass are already referencing the new namespace. No orphaned references to the old namespace exist.

mllm/compile/ir/linalg/Attribute.cpp (2)

3-3: LGTM!

The <sstream> include is necessary for the std::stringstream used in the dump() implementation.


23-40: LGTM!

The lambda correctly generates a string representation of QuantizationSpec with type name and UUID.

examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)

179-179: LGTM! Symmetric quantization for value_states.

The quantization of value_states to kInt8PerTensorSym correctly mirrors the key_states quantization pattern and aligns with the KV cache quantization configuration ("value": "int8-per-tensor").

mllm/backends/qnn/aot/visitor/Base.hpp (1)

26-33: LGTM! Base pattern class follows established design.

The QnnAOTQuantRecipeBasePattern correctly provides default no-op implementations for isMatch and rewrite, following the same structure as QnnAOTBasePattern. The static analysis hints suggesting to make methods static or rename the parameter are false positives—methods must remain virtual for polymorphic pattern matching.

mllm/backends/qnn/aot/visitor/Elewise.hpp (1)

25-34: LGTM! Pattern class declaration follows established conventions.

The QnnAOTAddQuantRecipePattern class correctly derives from QnnAOTQuantRecipeBasePattern and provides a factory method for pattern registration. The structure is consistent with the existing QnnAOTAddPattern design.

mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)

4-4: LGTM! Clean integration of MarkTensorIO pass.

The MarkTensorIOPass is appropriately added to the pipeline after graph marking and operation naming passes, ensuring that prerequisite transformations are complete before IO tagging.

Also applies to: 17-17

examples/qwen3_qnn_aot/qnn_aot_cfg.json (1)

16-16: LGTM! Configuration enables MarkTensorIO pass.

The llm_recipe flag is correctly added and will satisfy the validation requirement in MarkTensorIOPass::run (MarkTensorIO.cpp lines 68-73).

mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)

11-18: LGTM! Pass declaration follows established pattern.

The MarkTensorIOPass class structure correctly mirrors MarkQnnGraphPass (from relevant code snippets). The static analysis hint about special member functions is a false positive—pass classes are managed via shared_ptr and should not be copied or moved directly.

mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (6)

18-26: LGTM! Standard pass initialization.

The function correctly retrieves the AOT configuration, validates the top-level operation type, and creates an IRWriter for graph traversal, following the established pattern from other passes.


28-36: LGTM! CallGraphOp discovery follows established pattern.

The walker correctly locates the main CallGraphOp and asserts uniqueness, matching the pattern used in OpNamingPass::run (from relevant snippets).


66-73: LGTM! Comprehensive configuration validation.

The validation logic thoroughly checks for the required llm_recipe flag with clear error messaging. The multi-stage validation (exists, is boolean, is true) handles edge cases gracefully.


75-86: LGTM! Input tagging logic correctly enforces graph boundaries.

The code properly identifies and tags graph inputs, with the prevOp() check ensuring that only true graph boundary inputs (not internal derived values) are tagged.


114-114: LGTM! Factory function follows standard pattern.

The factory function correctly instantiates and returns the pass as a shared_ptr.


88-108: Verify LinearOp identity validation in the lm_head special case.

The output tagging includes special handling for the lm_head scenario (lines 93-100), where a graph output feeds both a ReturnOp and a LinearOp. The code correctly assumes the output order based on IR construction (in Module.cpp, the LinearOp outputs are connected before ReturnOp is created), which the comments confirm.

However, the code lacks validation that the LinearOp is specifically the "lm_head" operation. Currently, it tags any LinearOp output as qnn_graph_outputs if the pattern matches, without verifying the operation's name or symbol. Consider:

  • Adding a check for the LinearOp's name/symbol to ensure it is indeed "lm_head"
  • Documenting why matching any LinearOp is sufficient, or
  • Refining the pattern to be more specific if needed

The fallback case (line 102) handles other graph topologies appropriately.

mllm/backends/qnn/aot/visitor/Elewise.cpp (1)

34-36: The identical isMatch implementations appear intentional but lack discriminating logic if both patterns are used together.

Both QnnAOTAddPattern::isMatch (lines 10-12) and QnnAOTAddQuantRecipePattern::isMatch (lines 34-36) are identical, matching AddOp operations with the "using_qnn" attribute. While the patterns inherit from separate base classes (QnnAOTBasePattern vs QnnAOTQuantRecipeBasePattern) with different virtual methods—addNode() vs rewrite()—the identical matching logic means both would match the same operations if registered in the same pattern pass. Currently, these patterns are not instantiated or used in any pass; however, when they are integrated, clarify whether:

  • Only one pattern should match these operations (add discriminating logic)
  • Both patterns intentionally process the same operations in sequence (document the execution order)
  • Different pattern passes will use these separately (document which pass uses which pattern)

Comment on lines +51 to +56
for (int i = 0; i < annotation_.outputs.size(); ++i) {
p.print("outputs_" + std::to_string(i));
p.colon();
p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
if (i < annotation_.inputs.size() - 1) { p.comma(); }
}

⚠️ Potential issue | 🔴 Critical

Critical bug: outputs section prints inputs data.

Lines 54-55 reference annotation_.inputs[i] and annotation_.inputs.size() instead of the corresponding outputs members. This causes the outputs section to print input data incorrectly.

🔎 Proposed fix
   for (int i = 0; i < annotation_.outputs.size(); ++i) {
     p.print("outputs_" + std::to_string(i));
     p.colon();
-    p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
-    if (i < annotation_.inputs.size() - 1) { p.comma(); }
+    p.print("{}", gen_quant_spec_str(annotation_.outputs[i]));
+    if (i < annotation_.outputs.size() - 1) { p.comma(); }
   }
🤖 Prompt for AI Agents
In mllm/compile/ir/linalg/Attribute.cpp around lines 51 to 56, the outputs loop
incorrectly references annotation_.inputs when generating output strings and
when checking the loop-end; change the gen_quant_spec_str call to use
annotation_.outputs[i] and update the size check to annotation_.outputs.size()
(and any related index/size usages in that block) so the outputs section prints
the correct output data and comma logic is based on outputs.

Comment on lines +58 to +63
for (int i = 0; i < annotation_.weights.size(); ++i) {
p.print("weight_" + std::to_string(i));
p.colon();
p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
if (i < annotation_.inputs.size() - 1) { p.comma(); }
}

⚠️ Potential issue | 🔴 Critical

Critical bug: incorrect iteration over unordered_map.

The annotation_.weights member is an unordered_map<string, QuantizationSpec::ptr_t>, not a vector. You cannot iterate it with integer indices. Additionally, line 61 incorrectly references annotation_.inputs[i] instead of the weight value, and line 62 checks the wrong size.

🔎 Proposed fix using range-based for loop
-  for (int i = 0; i < annotation_.weights.size(); ++i) {
-    p.print("weight_" + std::to_string(i));
+  int weight_idx = 0;
+  for (const auto& [name, spec] : annotation_.weights) {
+    p.print("weight_" + name);
     p.colon();
-    p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
-    if (i < annotation_.inputs.size() - 1) { p.comma(); }
+    p.print("{}", gen_quant_spec_str(spec));
+    if (weight_idx < annotation_.weights.size() - 1) { p.comma(); }
+    weight_idx++;
   }
🤖 Prompt for AI Agents
In mllm/compile/ir/linalg/Attribute.cpp around lines 58 to 63, the code
incorrectly treats annotation_.weights (an unordered_map<string,
QuantizationSpec::ptr_t>) like a vector: it indexes by integer, references
annotation_.inputs[i], and compares to annotation_.inputs.size(); change to
iterate the unordered_map with a range-based loop (for (const auto& kv :
annotation_.weights) or using iterator), use kv.first for the weight name and
kv.second as the QuantizationSpec passed to gen_quant_spec_str, and determine
comma placement by either tracking an index/counter versus
annotation_.weights.size() or by using an iterator and checking std::next(it) !=
end to only append commas between elements.
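The iterator-based approach described above can be shown in a stand-alone sketch (plain strings in place of QuantizationSpec::ptr_t): std::next(it) != end() places commas only between elements of the unordered_map.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <unordered_map>

// Emit "name:spec" pairs from an unordered_map, with ", " only between
// elements. Note that unordered_map iteration order is unspecified.
std::string dumpWeights(
    const std::unordered_map<std::string, std::string>& weights) {
  std::string out;
  for (auto it = weights.begin(); it != weights.end(); ++it) {
    out += it->first + ":" + it->second;
    if (std::next(it) != weights.end()) { out += ", "; }
  }
  return out;
}
```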

Comment on lines +41 to +56
class QuantizationSpecUUIDGiver {
public:
static QuantizationSpecUUIDGiver& getInstance() {
static QuantizationSpecUUIDGiver instance;
return instance;
}

uint64_t getUUID() { return next_uuid_++; }

private:
QuantizationSpecUUIDGiver() = default;
QuantizationSpecUUIDGiver(const QuantizationSpecUUIDGiver&) = delete; // NOLINT
QuantizationSpecUUIDGiver& operator=(const QuantizationSpecUUIDGiver&) = delete; // NOLINT

uint64_t next_uuid_ = 0;
};

⚠️ Potential issue | 🔴 Critical

Thread-safety issue: getUUID() lacks synchronization.

The next_uuid_++ operation in getUUID() is not atomic and lacks synchronization. If multiple threads call this method concurrently, it can result in race conditions, duplicate UUIDs, or lost increments.

🔎 Proposed fix using std::atomic
+#include <atomic>
+
 class QuantizationSpecUUIDGiver {
  public:
   static QuantizationSpecUUIDGiver& getInstance() {
     static QuantizationSpecUUIDGiver instance;
     return instance;
   }
 
   uint64_t getUUID() { return next_uuid_++; }
 
  private:
   QuantizationSpecUUIDGiver() = default;
   QuantizationSpecUUIDGiver(const QuantizationSpecUUIDGiver&) = delete;             // NOLINT
   QuantizationSpecUUIDGiver& operator=(const QuantizationSpecUUIDGiver&) = delete;  // NOLINT
 
-  uint64_t next_uuid_ = 0;
+  std::atomic<uint64_t> next_uuid_{0};
 };
🧰 Tools
🪛 Clang (14.0.6)

[error] 41-41: class 'QuantizationSpecUUIDGiver' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/compile/ir/linalg/Attribute.hpp around lines 41-56, the getUUID() method
increments next_uuid_ without synchronization leading to race conditions; change
next_uuid_ to std::atomic<uint64_t> and implement getUUID() to return
next_uuid_.fetch_add(1, std::memory_order_relaxed) (or stronger ordering if
required), and add the required #include <atomic> at the top of the file so
concurrent calls are atomic and thread-safe.
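
The proposed fix can be exercised with a small standalone sketch. Everything mllm-specific is omitted here, and `stress_uuids` is a hypothetical test driver added only to demonstrate that concurrent callers never lose an increment once the counter is a `std::atomic`:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Thread-safe variant of the singleton: fetch_add makes each increment
// atomic, so concurrent callers never observe duplicate or lost UUIDs.
class QuantizationSpecUUIDGiver {
 public:
  static QuantizationSpecUUIDGiver& getInstance() {
    static QuantizationSpecUUIDGiver instance;  // C++11 guarantees thread-safe init
    return instance;
  }

  uint64_t getUUID() { return next_uuid_.fetch_add(1, std::memory_order_relaxed); }

 private:
  QuantizationSpecUUIDGiver() = default;
  std::atomic<uint64_t> next_uuid_{0};
};

// Hypothetical driver: hammer the singleton from several threads, then
// return the next UUID, which equals the total number already issued.
uint64_t stress_uuids(int threads, int per_thread) {
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        (void)QuantizationSpecUUIDGiver::getInstance().getUUID();
      }
    });
  }
  for (auto& th : pool) th.join();
  return QuantizationSpecUUIDGiver::getInstance().getUUID();
}
```

With a plain `uint64_t` counter the same driver would intermittently report fewer UUIDs than were requested, because `next_uuid_++` is a non-atomic read-modify-write.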

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (1)

45-71: Consider moving the large JSON example to documentation.

While the inline JSON example is helpful for understanding the expected config format, it's quite verbose (27 lines). Consider moving it to the header file documentation, a separate markdown file, or condensing it to show only the relevant parts for this pass.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac94c95 and 64e8a41.

📒 Files selected for processing (4)
  • examples/qwen3_qnn_aot/compile.cpp
  • examples/qwen3_qnn_aot/qnn_aot_cfg.json
  • mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
  • mllm/core/Tensor.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/qwen3_qnn_aot/qnn_aot_cfg.json
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
  • mllm/core/Tensor.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
  • mllm/core/Tensor.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
  • mllm/core/Tensor.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
  • mllm/core/Tensor.cpp
🧬 Code graph analysis (3)
examples/qwen3_qnn_aot/compile.cpp (1)
mllm/core/Tensor.cpp (2)
  • zeros (97-101)
  • zeros (97-97)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (3)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (2)
  • run (65-86)
  • run (65-65)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • op (17-17)
mllm/core/Tensor.cpp (2)
mllm/core/DataTypes.cpp (2)
  • nameOfType (107-154)
  • nameOfType (107-107)
mllm/core/DataTypes.hpp (1)
  • nameOfType (713-713)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-android
  • GitHub Check: build-x86
  • GitHub Check: build-macos
🔇 Additional comments (6)
mllm/core/Tensor.cpp (1)

370-370: LGTM! Improved error message.

Including the actual type name in the error message using nameOfType(dtype()) significantly improves debugging experience when unsupported types are encountered.

mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (5)

1-15: LGTM!

The includes are well-organized and appropriate for this pass implementation.


22-26: LGTM!

The validation ensures the pass operates under supported conditions, and the error message clearly explains the constraint.


72-79: LGTM!

The llm_recipe validation is thorough with clear, actionable error messages.


81-92: LGTM!

The input tagging logic correctly validates tensor inputs and applies the appropriate attribute.


117-120: LGTM!

The return statement and factory function follow standard patterns for pass implementation.

Comment on lines +59 to +60
auto sequence = mllm::Tensor::zeros({1, N}, mllm::kInt64);
auto causal_mask = mllm::Tensor::zeros({1, 1, N, CL}, mllm::kUInt16);
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's verify the exact content of lines 59-60
cat -n examples/qwen3_qnn_aot/compile.cpp | sed -n '55,70p'

Repository: UbiquitousLearning/mllm

Length of output: 837


🏁 Script executed:

# Search for causal_mask usage patterns across the codebase
rg -n 'causal_mask.*Tensor' --type=cpp -C2

Repository: UbiquitousLearning/mllm

Length of output: 16711


🏁 Script executed:

# Search for mask dtype specifications in QNN backend and AOT code
rg -n 'Tensor::(zeros|empty|ones).*k(UInt|Int|Float)' --type=cpp -g 'backends/qnn/**' -C1

Repository: UbiquitousLearning/mllm

Length of output: 215


🏁 Script executed:

# Look for attention mask or causal mask related code in the entire repo
rg -n 'causal.*mask' --type=cpp -g '**/*.cpp' | head -20

Repository: UbiquitousLearning/mllm

Length of output: 2644


🏁 Script executed:

# Check if kUInt16 is used elsewhere for similar purposes
rg -n 'kUInt16' --type=cpp

Repository: UbiquitousLearning/mllm

Length of output: 2868


Change causal_mask dtype from kUInt16 to kFloat32 (line 60).

The causal_mask dtype is inconsistent with all other models in the codebase. Every causal_mask initialization—across LLaMA, Qwen, and custom ops—uses kFloat32. The kUInt16 choice appears to be an error and should be corrected to align with established patterns.

🤖 Prompt for AI Agents
In examples/qwen3_qnn_aot/compile.cpp around lines 59-60, the causal_mask is
created with dtype mllm::kUInt16 which is inconsistent; change the tensor
initialization to use mllm::kFloat32 instead so the causal_mask matches other
models (LLaMA, Qwen, custom ops) and maintains dtype consistency across the
codebase.

return ir::IRWriter::WalkResult::WALK_CONTINUE;
});

// Visit all graphs and assign names to unnamed operations
Contributor

⚠️ Potential issue | 🟡 Minor

Incorrect comment copied from OpNamingPass.

The comment "Visit all graphs and assign names to unnamed operations" is incorrect for this pass, which marks tensor inputs and outputs. This appears to be copy-pasted from OpNamingPass.cpp without updating.

🔎 Proposed fix
-  // Visit all graphs and assign names to unnamed operations
+  // Mark tensor inputs and outputs for the main CallGraphOp
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/MarkTensorIO.cpp around line 44, the comment
"Visit all graphs and assign names to unnamed operations" is incorrect and was
copy-pasted from OpNamingPass; update the comment to accurately describe this
pass's purpose (marking tensor inputs and outputs), e.g., replace the line with
a concise description such as "Visit graphs and mark tensor inputs and outputs"
or similar wording that reflects the pass behavior.

Comment on lines +94 to +114
  // Tag the outputs to call_graph to "qnn_graph_outputs"
  {
    auto outputs = call_main_graph_op->outputs();
    for (auto o : outputs) {
      if (o->isa_<ir::tensor::TensorValue>()) {
        // That means after this graph, we have a lm_head!
        // o 's outputs is cf.Return and lm_head(linear op)
        if (!o->outputs().empty() && o->outputs().size() == 2 && o->outputs().front()->isa_<ir::cf::ReturnOp>()
            && (*std::next(o->outputs().begin()))->isa_<ir::linalg::LinearOp>()) {
          auto lm_head_ir = (*std::next(o->outputs().begin()))->cast_<ir::linalg::LinearOp>();
          if (lm_head_ir->outputs().front()->outputs().empty()) {
            lm_head_ir->outputs().front()->setAttr("qnn_graph_outputs", getCtx()->create<ir::BoolAttr>(true));
          }
        } else {
          o->setAttr("qnn_graph_outputs", getCtx()->create<ir::BoolAttr>(true));
        }
      } else {
        MLLM_ERROR_EXIT(ExitCode::kCoreError, "Expect pure tensor value outputs to graph.CallGraphOp");
      }
    }
  }
Contributor

🛠️ Refactor suggestion | 🟠 Major

Clarify the lm_head special case logic.

The output tagging logic contains a complex conditional (lines 101-106) that handles a special lm_head scenario. The logic is difficult to follow and lacks explanatory comments:

  1. The condition at lines 101-102 uses both front() and std::next() to access elements, which is not immediately clear.
  2. The check at line 104 (lm_head_ir->outputs().front()->outputs().empty()) lacks explanation of its purpose.
  3. The overall lm_head special case would benefit from a comment explaining why this handling is needed.
🔎 Proposed improvements

Add explanatory comments and consider using intermediate variables for clarity:

     // Tag the outputs to call_graph to "qnn_graph_outputs"
     {
       auto outputs = call_main_graph_op->outputs();
       for (auto o : outputs) {
         if (o->isa_<ir::tensor::TensorValue>()) {
-          // That means after this graph, we have a lm_head!
-          // o 's outputs is cf.Return and lm_head(linear op)
+          // Special case: if this graph output feeds both a ReturnOp and a lm_head LinearOp,
+          // mark the lm_head's output as the graph output instead (for ops on QNN)
           if (!o->outputs().empty() && o->outputs().size() == 2 && o->outputs().front()->isa_<ir::cf::ReturnOp>()
               && (*std::next(o->outputs().begin()))->isa_<ir::linalg::LinearOp>()) {
             auto lm_head_ir = (*std::next(o->outputs().begin()))->cast_<ir::linalg::LinearOp>();
+            // Only tag lm_head output if it's the final output (no further consumers)
             if (lm_head_ir->outputs().front()->outputs().empty()) {
               lm_head_ir->outputs().front()->setAttr("qnn_graph_outputs", getCtx()->create<ir::BoolAttr>(true));
             }
           } else {
             o->setAttr("qnn_graph_outputs", getCtx()->create<ir::BoolAttr>(true));
           }

Or use intermediate variables for better readability:

           if (!o->outputs().empty() && o->outputs().size() == 2) {
+            auto output_consumers = o->outputs();
+            auto first_consumer = output_consumers.front();
+            auto second_consumer = *std::next(output_consumers.begin());
+            
-            if (o->outputs().front()->isa_<ir::cf::ReturnOp>()
-                && (*std::next(o->outputs().begin()))->isa_<ir::linalg::LinearOp>()) {
+            if (first_consumer->isa_<ir::cf::ReturnOp>() && second_consumer->isa_<ir::linalg::LinearOp>()) {
-              auto lm_head_ir = (*std::next(o->outputs().begin()))->cast_<ir::linalg::LinearOp>();
+              auto lm_head_ir = second_consumer->cast_<ir::linalg::LinearOp>();

Comment thread mllm/core/Tensor.cpp
      case kInt16: *(rhs_tensor.ptr<int16_t>()) = v; break;
      case kInt8: *(rhs_tensor.ptr<int8_t>()) = v; break;
      case kUInt16: *(rhs_tensor.ptr<uint16_t>()) = v; break;
      default: NYI("Type is not supported"); break;
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "Tensor.cpp" -type f 2>/dev/null | head -5

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# First, let's understand the file structure
git ls-files | grep -i tensor | head -20

Repository: UbiquitousLearning/mllm

Length of output: 882


🏁 Script executed:

# Read the Tensor.cpp file to examine the relevant lines
wc -l mllm/core/Tensor.cpp

Repository: UbiquitousLearning/mllm

Length of output: 91


🏁 Script executed:

# Look at the equal(float v) method around line 369
sed -n '357,377p' mllm/core/Tensor.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1058


🏁 Script executed:

# Look at the scalar operators (lines 148-232)
sed -n '148,180p' mllm/core/Tensor.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1661


🏁 Script executed:

# Search for kUInt16 support across all scalar operator methods
rg -n 'case kUInt16' mllm/core/Tensor.cpp

Repository: UbiquitousLearning/mllm

Length of output: 132


🏁 Script executed:

# Check the complete pattern of type handling in scalar operators
sed -n '190,232p' mllm/core/Tensor.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1982


🏁 Script executed:

rg -n 'kUInt16' mllm/ --type cpp

Repository: UbiquitousLearning/mllm

Length of output: 2758


🏁 Script executed:

# Check if there are any comments or documentation about kUInt16 support
rg -n 'UInt16' mllm/ --type cpp -B2 -A2 | head -50

Repository: UbiquitousLearning/mllm

Length of output: 4216


🏁 Script executed:

# Look at the type enum definition to understand kUInt16
fd 'DataTypes' mllm/ --type f

Repository: UbiquitousLearning/mllm

Length of output: 114


🏁 Script executed:

# Check the complete operator+ implementation to compare
sed -n '148,185p' mllm/core/Tensor.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1819


Add kUInt16 support to scalar operators (+, -, *, /) to match equal() method support.

The equal(float v) method now supports kUInt16, but the scalar arithmetic operators (operator+, operator-, operator*, operator/) do not. Since kUInt16 is a supported type used throughout the codebase, users will encounter NYI("Type is not supported") runtime failures if they attempt scalar arithmetic operations on uint16 tensors. Either add kUInt16 handling to all scalar operators to match equal(), or remove it from equal() if uint16 scalar operations are intentionally unsupported.

Additionally, improve error messages in scalar operators to include the unsupported type name like equal() does: use NYI("Type is not supported {}", nameOfType(dtype())) instead of NYI("Type is not supported").

🤖 Prompt for AI Agents
In mllm/core/Tensor.cpp around line 369, the scalar arithmetic operators (+, -,
*, /) lack kUInt16 handling and their NYI errors lack the dtype name; update
each scalar operator's switch to add a kUInt16 case mirroring equal(float v)
(cast/use uint16_t appropriately when assigning results) and replace NYI("Type
is not supported") with NYI("Type is not supported {}", nameOfType(dtype())) so
unsupported-type errors include the type name.
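
A standalone sketch of what the suggested change could look like. The enum and `nameOfType` below are hypothetical stand-ins; the real definitions live in mllm/core/DataTypes.hpp and may differ:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for mllm's dtype enum and nameOfType().
enum DType { kFloat32, kInt32, kInt16, kUInt16, kInt8 };

std::string nameOfType(DType t) {
  switch (t) {
    case kFloat32: return "Float32";
    case kInt32: return "Int32";
    case kInt16: return "Int16";
    case kUInt16: return "UInt16";
    case kInt8: return "Int8";
  }
  return "Unknown";
}

// Scalar store mirroring what the operator switches would become:
// every dtype that equal() accepts has a case here, and the failure
// path names the offending dtype instead of a bare "not supported".
void store_scalar(void* ptr, DType dtype, float v) {
  switch (dtype) {
    case kFloat32: *static_cast<float*>(ptr) = v; break;
    case kInt32: *static_cast<int32_t*>(ptr) = static_cast<int32_t>(v); break;
    case kInt16: *static_cast<int16_t*>(ptr) = static_cast<int16_t>(v); break;
    case kUInt16: *static_cast<uint16_t*>(ptr) = static_cast<uint16_t>(v); break;
    case kInt8: *static_cast<int8_t*>(ptr) = static_cast<int8_t>(v); break;
    default: throw std::runtime_error("Type is not supported: " + nameOfType(dtype));
  }
}
```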

Owner

@UbiquitousLearning UbiquitousLearning left a comment

LGTM

@chenghuaWang chenghuaWang merged commit d4cbe15 into UbiquitousLearning:main Dec 23, 2025
4 checks passed
2 participants