
feat(Qnn AOT): AOT and AOT Runtime. Qwen3 AOT Mode. #567

Merged
chenghuaWang merged 11 commits into UbiquitousLearning:main from chenghuaWang:wch-main on Dec 23, 2025

Conversation

Collaborator

@chenghuaWang commented Dec 23, 2025

  • Introduced new data types for per-tensor and per-channel quantization, including int8 and int16 types.
  • Updated the LinearOp class to handle new linear implementation types: kQNN_LPBQ_w4a16o16_G32 and kQNN_LPBQ_w4a16o16_G64.
  • Enhanced the DataTypes utility functions to accommodate the new quantization types.
  • Modified Tensor operations to support addition and multiplication with int16 per-tensor symmetric types.
  • Updated the Module class to handle vector arguments for tensors and AnyValue types.

Summary by CodeRabbit

  • New Features

    • QNN AOT pipeline and tooling added for Qwen3 AOT compilation, config files, and example CLI
    • New Equal and Where tensor ops with public APIs and nn.functional.where
  • Improvements

    • Expanded quantization types (int8/int16 per-tensor/per-channel) and related dtype support
    • CPU backend: added comparison/where implementations and op registration
    • IR/pass utilities to mark QNN graphs and name ops for deterministic compilation


chenghuaWang and others added 10 commits December 19, 2025 10:47
- Implemented CPUEqualOp and CPUWhereOp classes for element-wise comparison and conditional selection.
- Added corresponding factory classes for both operations.
- Integrated Equal and Where operations into the CPUBackend.
- Updated Tensor class to include methods for equal comparisons with both Tensor and float inputs.
- Enhanced the aops layer with EqualOp and WhereOp definitions, including forward and reshape methods.
- Registered new operations in the IR and updated RTTI generation scripts.
- Modified Functional API to expose where operation for user convenience.
- Adjusted build configuration to use RelWithDebInfo for better debugging support.
…ementations

- Introduced new data types for per-tensor and per-channel quantization, including int8 and int16 types.
- Updated the LinearOp class to handle new linear implementation types: kQNN_LPBQ_w4a16o16_G32 and kQNN_LPBQ_w4a16o16_G64.
- Enhanced the DataTypes utility functions to accommodate the new quantization types.
- Modified Tensor operations to support addition and multiplication with int16 per-tensor symmetric types.
- Updated the Module class to handle vector arguments for tensors and AnyValue types.
Contributor

coderabbitai Bot commented Dec 23, 2025

Walkthrough

Adds Qwen3 QNN AOT example and model implementation, introduces QNN AOT backend infrastructure (parsing, env/wrappers, AOT passes, pipeline, visitors), registers CPU Equal/Where ops with CPU factories and Tensor::equal API, extends quantization data types and LinearImpl variants, and updates build/config files.

Changes

Cohort / File(s) Summary
Qwen3 QNN AOT Example
examples/CMakeLists.txt, examples/qwen3_qnn_aot/CMakeLists.txt, examples/qwen3_qnn_aot/compile.cpp, examples/qwen3_qnn_aot/config_1.7B.json, examples/qwen3_qnn_aot/qnn_aot_cfg.json, examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
Adds a new qwen3 QNN AOT example target and executable, model config and AOT config JSONs, and a Qwen3 model implementation with RoPE, attention, MLP, decoder, KV-cache and trace flow; compile.cpp orchestrates model load, trace, QNN AOT env, and lowering pipeline. Also comments out two example subdirs in examples/CMakeLists.txt.
QNN AOT Backend: core infra & API
mllm/backends/qnn/aot/QnnWrappersAPI.hpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp, mllm/backends/qnn/aot/QnnTargetMachineParser.hpp, mllm/backends/qnn/aot/QnnTargetMachineParser.cpp, mllm/backends/qnn/QNNUtils.hpp, mllm/backends/qnn/Register.cpp
Adds QnnAOT param/tensor/node/operation/graph modeling types, QnnAOTDataTypeSize helper, const-correct QnnAOTEnv constructors, dynamic alloc/free for AOT tensors, and a Qcom target-machine JSON parser; minor include/format tweaks.
QNN AOT Backend: passes & visitors
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp, mllm/backends/qnn/aot/passes/AOTCompileContext.cpp, mllm/backends/qnn/aot/passes/AOTPipeline.hpp, mllm/backends/qnn/aot/passes/AOTPipeline.cpp, mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp, mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp, mllm/backends/qnn/aot/passes/OpNamingPass.hpp, mllm/backends/qnn/aot/passes/OpNamingPass.cpp, mllm/backends/qnn/aot/visitor/Base.hpp, mllm/backends/qnn/aot/visitor/Elewise.hpp, mllm/backends/qnn/aot/visitor/Elewise.cpp
Introduces AOTCompileContext singleton for config/env, pipeline factory for QNN AOT lowering, passes to mark graphs using QNN and to assign deterministic op names, base pattern class and an Add pattern visitor (skeleton).
CPU backend: ops & registration
mllm/backends/cpu/CPUBackend.cpp, mllm/backends/cpu/ops/CmpOp.hpp, mllm/backends/cpu/ops/CmpOp.cpp, mllm/backends/cpu/ops/WhereOp.hpp, mllm/backends/cpu/ops/WhereOp.cpp
Adds CPUEqualOp and CPUWhereOp classes and factories, registers them in CPUBackend, implements CPUEqualOp forward and broadcasting for several dtypes, and adds a WhereOp stub (unimplemented forward).
Core IR & op kinds
mllm/compile/ir/GeneratedRTTIKind.hpp, mllm/compile/ir/NodeRTTIClassOfImpl.hpp, mllm/compile/ir/linalg/Op.hpp, mllm/compile/ir/linalg/Op.cpp, mllm/compile/ir/rtti_kind_gen.py
Adds EqualOp and WhereOp kinds to IR/RTTI generation and registers them in linalg op declarations; updates dump formatting adjustments.
Core aops: Equal/Where & Tensor API
mllm/core/aops/CmpOp.hpp, mllm/core/aops/CmpOp.cpp, mllm/core/aops/WhereOp.hpp, mllm/core/aops/WhereOp.cpp, mllm/core/Tensor.hpp, mllm/core/Tensor.cpp, mllm/nn/Functional.hpp, mllm/nn/Functional.cpp
Adds EqualOp and WhereOp high-level aops with trace/reshape stubs, broadcastShapes helper, implements Tensor::equal(Tensor)/equal(float), and nn::functional::where wrapper.
Data types & Linear impls
mllm/core/DataTypes.hpp, mllm/core/DataTypes.cpp, mllm/core/aops/LinearOp.hpp, mllm/backends/cpu/ops/LinearOp.cpp
Introduces many per-tensor/per-channel quantized typedefs and DataTypes enum entries (int8/uint8/int16/uint16 symmetric/asymmetric), registers them with type-info macros, and adds new LinearImplTypes kQNN_LPBQ_w4a16o16_G32/G64 with output dtype adjustments.
IR/node utilities & minor changes
mllm/compile/ir/Node.hpp, mllm/compile/ir/Node.cpp, mllm/compile/ir/tensor/Op.cpp, mllm/compile/ir/graph/Op.cpp, mllm/compile/ir/NodeRTTIClassOfImpl.hpp
Adds Node::attrNum(), propagates device in RegisterOp::build, adjusts Op dump formatting and adds RTTI macros for Equal/Where.
Build/config changes
examples/CMakeLists.txt, tasks/build_x86_qnn_aot.yaml
Disables two minicpm subdirs in the examples CMake and changes the CI task build type from Release to RelWithDebInfo.
Misc: guards/fixes
mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
Adds include guards around FP16/FP32 conversion macros to avoid redefinition.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI as compile.cpp
    participant QwenModel as Qwen3ForCausalLM
    participant Trace as IR Trace
    participant QnnEnv as QnnAOTEnv
    participant PassMgr as PassManager
    participant Passes as MarkQnnGraphPass / OpNamingPass
    participant Output as MIR Writer

    User->>CLI: run with model/config/AOT-config
    CLI->>CLI: parse args, validate AOT config
    CLI->>QwenModel: load model, params, gen RoPE, setup KV-cache
    CLI->>Trace: call model.trace()
    Trace-->>CLI: IR graph
    CLI->>QnnEnv: init from target machine JSON
    CLI->>PassMgr: create with createQnnAOTLoweringPipeline(env, cfg)
    PassMgr->>Passes: run MarkQnnGraphPass (mark using_qnn)
    PassMgr->>Passes: run OpNamingPass (assign deterministic names)
    PassMgr-->>CLI: transformed IR
    CLI->>Output: write final MIR

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • PR #562: QNN AOT backend infrastructure — overlaps QnnWrappersAPI, AOT passes/pipeline and related AOT types.
  • PR #563: Qcom target-machine and QNN AOT enums — closely related to QnnTargetMachine parsing/usage changes here.
  • PR #553: CPU backend op registrations and factories — similar changes to CPUBackend registration patterns and op factory additions.

Suggested reviewers

  • liang1232018
  • yirongjie
  • oreomaker

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 8.21%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The PR title clearly describes the main feature being added: QNN AOT (Ahead-of-Time) compilation support and Qwen3 AOT mode implementation.
  • Description check ✅ Passed: The PR description provides a bulleted list of key changes, including data types, LinearOp updates, DataTypes utilities, Tensor operations, and Module class enhancements, addressing the main objectives.


Owner

@UbiquitousLearning left a comment


LGTM

Contributor

@coderabbitai (Bot) left a comment


Actionable comments posted: 17

🧹 Nitpick comments (24)
mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp (1)

129-135: LGTM! Include guards improve safety.

The addition of #ifndef guards around the macro definitions prevents potential redefinition errors and is a good defensive programming practice. The #endif at line 135 correctly closes the #else block from line 106.

Optional: Consider consistency across all branches

The ARM_NEON (lines 76-77) and MSC_VER (lines 103-104) branches define the same macros without include guards. If macro redefinition is a concern for the #else branch, consider whether the other branches should also be guarded for consistency:

// ARM_NEON branch
#ifndef MLLM_FP16_TO_FP32
#define MLLM_FP16_TO_FP32(x) ((float)(x))
#endif
#ifndef MLLM_FP32_TO_FP16
#define MLLM_FP32_TO_FP16(x) ((mllm_fp16_t)x)
#endif

// Similar for MSC_VER branch

Alternatively, if guards are only needed in the #else branch due to specific platform requirements, this is fine as-is.

mllm/nn/Module.hpp (1)

155-160: Consider preserving lvalue vector arguments.

The current implementation unconditionally moves from vector elements, which will leave lvalue arguments in a moved-from state. When a user passes a vector by lvalue reference (e.g., std::vector<Tensor> v; module(v, ...)), they might not expect v's elements to be consumed.

Consider using conditional forwarding based on value category to copy from lvalues and move from rvalues, or document that this function consumes vector arguments.

🔎 Proposed refinement using conditional move/copy

One approach is to check if the argument is an rvalue reference and conditionally move:

 } else if constexpr (std::is_same_v<CleanType, std::vector<Tensor>>) {
   tensors.reserve(tensors.size() + args.size());
-  for (auto& tensor : args) { tensors.push_back(std::move(tensor)); }
+  if constexpr (std::is_rvalue_reference_v<decltype(args)>) {
+    for (auto& tensor : args) { tensors.push_back(std::move(tensor)); }
+  } else {
+    for (const auto& tensor : args) { tensors.push_back(tensor); }
+  }
 } else if constexpr (std::is_same_v<CleanType, std::vector<AnyValue>>) {
   others.reserve(others.size() + args.size());
-  for (auto& value : args) { others.push_back(std::move(value)); }
+  if constexpr (std::is_rvalue_reference_v<decltype(args)>) {
+    for (auto& value : args) { others.push_back(std::move(value)); }
+  } else {
+    for (const auto& value : args) { others.push_back(value); }
+  }
 }

Alternatively, document that the function consumes vector arguments and update the function documentation to clarify move semantics.

mllm/backends/qnn/aot/QnnTargetMachineParser.cpp (2)

14-67: Consider using maps for improved scalability.

The current if-chain approach works but could be refactored to use std::unordered_map<std::string, EnumType> for O(1) lookups instead of O(n). This would improve maintainability and scalability, especially for parseChipset with 18 variants.

🔎 Example refactor for parseHTPArch
 QcomHTPArch parseHTPArch(const std::string& arch_str) {
-  if (arch_str == "NONE") return QcomHTPArch::NONE;
-  if (arch_str == "V68") return QcomHTPArch::V68;
-  if (arch_str == "V69") return QcomHTPArch::V69;
-  if (arch_str == "V73") return QcomHTPArch::V73;
-  if (arch_str == "V75") return QcomHTPArch::V75;
-  if (arch_str == "V79") return QcomHTPArch::V79;
-  if (arch_str == "V81") return QcomHTPArch::V81;
-  throw std::invalid_argument("Unknown HTP architecture: " + arch_str);
+  static const std::unordered_map<std::string, QcomHTPArch> arch_map = {
+    {"NONE", QcomHTPArch::NONE}, {"V68", QcomHTPArch::V68}, {"V69", QcomHTPArch::V69},
+    {"V73", QcomHTPArch::V73}, {"V75", QcomHTPArch::V75}, {"V79", QcomHTPArch::V79},
+    {"V81", QcomHTPArch::V81}
+  };
+  auto it = arch_map.find(arch_str);
+  if (it != arch_map.end()) return it->second;
+  throw std::invalid_argument("Unknown HTP architecture: " + arch_str);
 }

69-86: Improve error messages with field existence checks.

While the try-catch handles missing fields, it doesn't provide specific information about which field is missing. Adding explicit checks would improve debuggability.

🔎 Proposed improvement
 QcomTargetMachine parseQcomTargetMachineFromJSON(const std::string& json_str) {
   try {
     auto json = nlohmann::json::parse(json_str);
 
     if (!json.contains("target_machine")) { throw std::invalid_argument("JSON must contain 'target_machine' field"); }
 
     auto tm = json["target_machine"];
+
+    // Validate all required fields are present
+    const std::vector<std::string> required_fields = {
+      "htp_arch", "htp_chipset", "htp_try_best_performance", 
+      "htp_security_pd_session", "htp_vtcm_capability_in_mb"
+    };
+    for (const auto& field : required_fields) {
+      if (!tm.contains(field)) {
+        throw std::invalid_argument("target_machine missing required field: " + field);
+      }
+    }
 
     QcomTargetMachine result;
     result.soc_htp_arch = parseHTPArch(tm["htp_arch"].get<std::string>());
     result.soc_htp_chipset = parseChipset(tm["htp_chipset"].get<std::string>());
     result.soc_htp_performance = parsePerformance(tm["htp_try_best_performance"].get<std::string>());
     result.soc_htp_security_pd_session = parseSecurityPDSession(tm["htp_security_pd_session"].get<std::string>());
     result.soc_htp_vtcm_total_memory_size = tm["htp_vtcm_capability_in_mb"].get<uint32_t>();
 
     return result;
   } catch (const nlohmann::json::exception& e) { throw std::invalid_argument(std::string("JSON parsing error: ") + e.what()); }
 }
mllm/core/Tensor.hpp (1)

420-426: Rename parameter v to a more descriptive name.

Static analysis flags v as too short (minimum 3 characters required). Consider renaming to other or value for consistency with the project's naming conventions.

🔎 Proposed fix
-  Tensor equal(Tensor v);
+  Tensor equal(Tensor other);

   /**
    * @brief equal
    * @return uint8 mask
    */
-  Tensor equal(float v);
+  Tensor equal(float value);
mllm/backends/qnn/aot/passes/AOTPipeline.hpp (1)

6-9: Add missing <string> include.

The function signature uses std::string but the header is not explicitly included. While it may work via transitive includes, explicit includes improve maintainability.

🔎 Proposed fix
 #include <vector>
+#include <string>

 #include "mllm/backends/qnn/aot/QnnWrappersAPI.hpp"
 #include "mllm/compile/passes/Pass.hpp"
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (2)

19-21: Consider completing the Rule of 5.

Static analysis flags that the copy constructor and copy assignment operator are deleted without the destructor or move operations being declared. For a singleton, explicitly deleting the move operations and defaulting the destructor clarifies intent.

🔎 Proposed fix
   // Delete copy constructor and assignment operator
   AOTCompileContext(const AOTCompileContext&) = delete;
   AOTCompileContext& operator=(const AOTCompileContext&) = delete;
+  AOTCompileContext(AOTCompileContext&&) = delete;
+  AOTCompileContext& operator=(AOTCompileContext&&) = delete;
+  ~AOTCompileContext() = default;

28-28: Rename parameter fp to a more descriptive name.

Static analysis flags fp as too short (minimum 3 characters). Consider renaming to file_path or config_path for clarity.

🔎 Proposed fix
-  void setConfig(const std::string& fp);
+  void setConfig(const std::string& config_path);
mllm/backends/cpu/ops/WhereOp.cpp (1)

4-4: Unused include.

The <cstring> header is included but not used in this file.

🔎 Proposed fix
-#include <cstring>
mllm/backends/qnn/aot/visitor/Elewise.cpp (1)

29-31: Incomplete TODO needs description.

The TODO comment lacks a description of what needs to be implemented. Per coding guidelines, TODO comments should follow the format TODO: <description>. Currently addNode validates casts but performs no actual node addition logic.

Would you like me to help outline the expected implementation for adding the QNN node, or should I open an issue to track this task?

🔎 Proposed fix for TODO format
-  // TODO
+  // TODO: Implement QNN graph node addition for AddOp
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (2)

7-11: Consider null-checking the env parameter.

The env pointer is passed directly to setEnv() without validation. If a null pointer is passed, it could cause issues downstream when the environment is accessed.

🔎 Proposed fix
 std::vector<std::shared_ptr<ir::Pass>> createQnnAOTLoweringPipeline(QnnAOTEnv* env, const std::string& config_path) {
   std::vector<ir::Pass::ptr_t> ret;
 
+  if (!env) {
+    throw std::invalid_argument("QnnAOTEnv pointer cannot be null");
+  }
+
   AOTCompileContext::getInstance().setEnv(env);
   AOTCompileContext::getInstance().setConfig(config_path);

1-5: Missing license header.

For consistency with other new files in this PR, consider adding the standard license header.

🔎 Proposed fix
+// Copyright (c) MLLM Team.
+// Licensed under the MIT License.
+
 #include "mllm/backends/qnn/aot/passes/AOTPipeline.hpp"
 #include "mllm/backends/qnn/aot/passes/AOTCompileContext.hpp"
examples/qwen3_qnn_aot/CMakeLists.txt (1)

5-8: Consider validating QAIRT_SDK_ROOT environment variable.

If QAIRT_SDK_ROOT is not set, the include paths will be empty, potentially causing cryptic compilation errors. Consider adding validation:

🔎 Proposed fix
+if(NOT DEFINED ENV{QAIRT_SDK_ROOT})
+  message(FATAL_ERROR "QAIRT_SDK_ROOT environment variable is not set. Please set it to your QNN SDK installation path.")
+endif()
+
 target_include_directories(mllm-qwen3-aot-c PRIVATE
   $ENV{QAIRT_SDK_ROOT}/include # QNN SDK include
   $ENV{QAIRT_SDK_ROOT}/include/QNN # QNN SDK include
 )
mllm/backends/cpu/ops/WhereOp.hpp (1)

18-23: Inconsistent final specifier on factory class.

CPUWhereOpFactory is missing the final specifier, while CPUEqualOpFactory in CmpOp.hpp is marked final. For consistency, both factory classes should follow the same pattern.

🔎 Proposed fix
-class CPUWhereOpFactory : public TypedOpFactory<OpTypes::kWhere, aops::WhereOpOptions> {
+class CPUWhereOpFactory final : public TypedOpFactory<OpTypes::kWhere, aops::WhereOpOptions> {
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (1)

74-80: Unused parameter in lambda.

The writer parameter of the lambda is declared but never used. Consider renaming the placeholder comment or using auto to make the intent clear.

🔎 Proposed fix
   writer.walk<ir::graph::CallGraphOp>(
-      [&](ir::IRWriter& /*writer*/, const ir::graph::CallGraphOp::ptr_t& call_op) -> ir::IRWriter::WalkResult {
+      [&](ir::IRWriter& /*unused*/, const ir::graph::CallGraphOp::ptr_t& call_op) -> ir::IRWriter::WalkResult {

Or if the signature allows, use auto:

   writer.walk<ir::graph::CallGraphOp>(
-      [&](ir::IRWriter& /*writer*/, const ir::graph::CallGraphOp::ptr_t& call_op) -> ir::IRWriter::WalkResult {
+      [&](auto&&, const ir::graph::CallGraphOp::ptr_t& call_op) -> ir::IRWriter::WalkResult {
examples/qwen3_qnn_aot/compile.cpp (1)

23-24: Consider making N and CL configurable.

These constants control sequence length and cache length. For flexibility across different model configurations, consider exposing them via CLI arguments or deriving from model config.

mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (1)

35-35: Unnecessary trailing semicolon after function definition.

Minor style issue - the semicolon after the closing brace of recursiveVisitGraph is not needed.

🔎 Proposed fix
-      });
-};
+      });
+}
mllm/core/aops/WhereOp.hpp (1)

27-27: Inconsistent options() return type compared to EqualOp.

WhereOp::options() returns a mutable reference WhereOpOptions&, while the related EqualOp::options() in CmpOp.hpp returns const EqualOpOptions&. For consistency and to prevent unintended modifications, consider using a const return type unless mutability is explicitly required.

🔎 Proposed fix for consistency
-  inline WhereOpOptions& options() { return options_; }
+  inline const WhereOpOptions& options() const { return options_; }
mllm/core/DataTypes.hpp (1)

294-296: Non-ASCII characters in code comment.

The comment contains Chinese text ("拼接" meaning "concatenate"). While valid UTF-8, using English in code comments improves accessibility for all contributors.

🔎 Proposed fix
 // These are placeholder types for per-tensor and per-channel quantization.
-// They cannot actually run; at runtime, they require three tensors (weight, scale, zp) to be拼接.
+// They cannot actually run; at runtime, they require three tensors (weight, scale, zp) to be concatenated.
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (3)

191-193: Magic number for attention masking.

The value -20 used for masked attention should be a named constant with documentation explaining why this specific value was chosen. This improves readability and maintainability.

🔎 Proposed fix
+  // Large negative value to effectively zero out masked attention scores after softmax
+  static constexpr float kMaskedAttnPenalty = -20.0f;
+
   // Masked Softmax
   auto attn_min = attn.min(-1, true);
-  float minus_value = -20;
-  attn = nn::functional::where(causal_mask.equal(0.f), attn, attn_min + minus_value);
+  attn = nn::functional::where(causal_mask.equal(0.f), attn, attn_min + kMaskedAttnPenalty);

202-203: Public member variable layer_idx_ breaks encapsulation.

layer_idx_ is declared public but is only set externally (line 253). Consider making it private with a setter, or initializing it via the constructor.

🔎 Proposed fix
+ public:
+  void setLayerIdx(int idx) { layer_idx_ = idx; }
+
  private:
-  int layer_idx_;
+  int layer_idx_ = 0;
 };

And update the usage in line 253:

for (auto [idx, b] : enumerate(decode_blocks_.list())) { b.self_attn_.setLayerIdx(idx); }

325-343: Duplicate error handling logic for KV caches.

The same pattern for checking and throwing on missing KV cache is repeated for both keys (lines 321-331) and values (lines 333-344). Consider extracting a helper to reduce duplication.

🔎 Proposed fix
+  auto getKVCache = [&](const std::string& name) -> Tensor {
+    if (input.count(name)) {
+      return input.at(name);
+    }
+    throw std::runtime_error("Missing KV cache: " + name);
+  };
+
   // Append Key
   for (int i = 0; i < cfg.num_hidden_layers; ++i) {
-    auto past_key_name = "past_key_" + std::to_string(i);
-    if (input.count(past_key_name)) {
-      kv_caches.push_back(input.at(past_key_name));
-    } else {
-      throw std::runtime_error("Missing KV cache for layer " + std::to_string(i));
-    }
+    kv_caches.push_back(getKVCache("past_key_" + std::to_string(i)));
   }

   // Append Value
   for (int i = 0; i < cfg.num_hidden_layers; ++i) {
-    auto past_value_name = "past_value_" + std::to_string(i);
-    if (input.count(past_value_name)) {
-      kv_caches.push_back(input.at(past_value_name));
-    } else {
-      throw std::runtime_error("Missing KV cache for layer " + std::to_string(i));
-    }
+    kv_caches.push_back(getKVCache("past_value_" + std::to_string(i)));
   }
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

55-56: Consider using sizeof instead of magic number for int64 size.

For clarity and maintainability, use sizeof(int64_t) instead of the literal 8.

🔎 Proposed fix
     case QNN_DATATYPE_INT_64:
-    case QNN_DATATYPE_UINT_64: return 8;
+    case QNN_DATATYPE_UINT_64: return sizeof(int64_t);
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)

28-28: Reserved identifier uses double underscore prefix.

The function __mllmLoggerCallback4QnnLogger uses a double underscore prefix, which is reserved for compiler/standard library implementations. Consider renaming to avoid potential conflicts.

🔎 Proposed fix
-void __mllmLoggerCallback4QnnLogger(const char* fmt, QnnLog_Level_t level, uint64_t times_tamp, va_list argp);
+void mllmLoggerCallback4QnnLogger(const char* fmt, QnnLog_Level_t level, uint64_t times_tamp, va_list argp);

Note: Also update the implementation in QnnWrappersAPI.cpp and any call sites.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f80e4e7 and 3dbe563.

📒 Files selected for processing (74)
  • examples/CMakeLists.txt
  • examples/qwen3_qnn_aot/CMakeLists.txt
  • examples/qwen3_qnn_aot/compile.cpp
  • examples/qwen3_qnn_aot/config_1.7B.json
  • examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
  • examples/qwen3_qnn_aot/qnn_aot_cfg.json
  • examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/backends/cpu/ops/CmpOp.cpp
  • mllm/backends/cpu/ops/CmpOp.hpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/cpu/ops/WhereOp.cpp
  • mllm/backends/cpu/ops/WhereOp.hpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/Register.cpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.cpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.cpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.hpp
  • mllm/backends/qnn/aot/passes/AutoInsertQuanAndDeQuantPass.cpp
  • mllm/backends/qnn/aot/passes/AutoInsertQuanAndDeQuantPass.hpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.hpp
  • mllm/backends/qnn/aot/passes/TensorRenamingPass.cpp
  • mllm/backends/qnn/aot/passes/TensorRenamingPass.hpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/aot/visitor/Concat.cpp
  • mllm/backends/qnn/aot/visitor/Concat.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/visitor/Equal.cpp
  • mllm/backends/qnn/aot/visitor/Equal.hpp
  • mllm/backends/qnn/aot/visitor/Index.cpp
  • mllm/backends/qnn/aot/visitor/Index.hpp
  • mllm/backends/qnn/aot/visitor/Reduce.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.hpp
  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/qnn/aot/visitor/Repeat.hpp
  • mllm/backends/qnn/aot/visitor/Slice.cpp
  • mllm/backends/qnn/aot/visitor/Slice.hpp
  • mllm/backends/qnn/aot/visitor/Transpose.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.hpp
  • mllm/backends/qnn/aot/visitor/Where.cpp
  • mllm/backends/qnn/aot/visitor/Where.hpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/compile/ir/Node.cpp
  • mllm/compile/ir/Node.hpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/compile/ir/graph/Op.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/compile/ir/tensor/Op.cpp
  • mllm/core/DataTypes.cpp
  • mllm/core/DataTypes.hpp
  • mllm/core/OpTypes.hpp
  • mllm/core/Tensor.cpp
  • mllm/core/Tensor.hpp
  • mllm/core/aops/CmpOp.cpp
  • mllm/core/aops/CmpOp.hpp
  • mllm/core/aops/LinearOp.hpp
  • mllm/core/aops/WhereOp.cpp
  • mllm/core/aops/WhereOp.hpp
  • mllm/nn/Functional.cpp
  • mllm/nn/Functional.hpp
  • mllm/nn/Module.hpp
  • tasks/build_x86_qnn_aot.yaml
💤 Files with no reviewable changes (1)
  • mllm/backends/qnn/Register.cpp
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/core/Tensor.hpp
  • mllm/nn/Functional.hpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.hpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/core/OpTypes.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.cpp
  • mllm/backends/cpu/ops/CmpOp.hpp
  • mllm/core/aops/CmpOp.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.hpp
  • mllm/nn/Module.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/core/aops/WhereOp.hpp
  • mllm/core/DataTypes.cpp
  • mllm/core/aops/LinearOp.hpp
  • mllm/backends/cpu/ops/WhereOp.hpp
  • mllm/backends/cpu/ops/CmpOp.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.hpp
  • mllm/backends/cpu/ops/WhereOp.cpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/core/aops/WhereOp.cpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/compile/ir/Node.hpp
  • mllm/compile/ir/Node.cpp
  • mllm/core/DataTypes.hpp
  • mllm/compile/ir/graph/Op.cpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/core/Tensor.cpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.cpp
  • mllm/compile/ir/tensor/Op.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/aops/CmpOp.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/core/Tensor.hpp
  • mllm/nn/Functional.hpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.hpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/core/OpTypes.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.cpp
  • mllm/backends/cpu/ops/CmpOp.hpp
  • mllm/core/aops/CmpOp.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.hpp
  • mllm/nn/Module.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/core/aops/WhereOp.hpp
  • mllm/core/DataTypes.cpp
  • mllm/core/aops/LinearOp.hpp
  • mllm/backends/cpu/ops/WhereOp.hpp
  • mllm/backends/cpu/ops/CmpOp.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.hpp
  • mllm/backends/cpu/ops/WhereOp.cpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/core/aops/WhereOp.cpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/compile/ir/Node.hpp
  • mllm/compile/ir/Node.cpp
  • mllm/core/DataTypes.hpp
  • mllm/compile/ir/graph/Op.cpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/core/Tensor.cpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.cpp
  • mllm/compile/ir/tensor/Op.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/aops/CmpOp.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/core/Tensor.hpp
  • mllm/nn/Functional.hpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.hpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/core/OpTypes.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.cpp
  • mllm/backends/cpu/ops/CmpOp.hpp
  • mllm/core/aops/CmpOp.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.hpp
  • mllm/nn/Module.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/core/aops/WhereOp.hpp
  • mllm/core/DataTypes.cpp
  • mllm/core/aops/LinearOp.hpp
  • mllm/backends/cpu/ops/WhereOp.hpp
  • mllm/backends/cpu/ops/CmpOp.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.hpp
  • mllm/backends/cpu/ops/WhereOp.cpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/core/aops/WhereOp.cpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/compile/ir/Node.hpp
  • mllm/compile/ir/Node.cpp
  • mllm/core/DataTypes.hpp
  • mllm/compile/ir/graph/Op.cpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/core/Tensor.cpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.cpp
  • mllm/compile/ir/tensor/Op.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/aops/CmpOp.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
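Two of the rules above — RAII cleanup and explicit error signaling — can be illustrated with one small sketch; the class and names are hypothetical, not part of mllm:

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper: the FILE* is released in the destructor even when
// an exception unwinds the stack, and a failed open throws instead of handing
// the caller a silently invalid handle.
class ScopedFile {
 public:
  ScopedFile(const std::string& path, const char* mode)
      : fp_(std::fopen(path.c_str(), mode)) {
    if (fp_ == nullptr) { throw std::runtime_error("cannot open " + path); }
  }
  ~ScopedFile() {
    if (fp_ != nullptr) { std::fclose(fp_); }
  }
  ScopedFile(const ScopedFile&) = delete;  // owning type: copying would double-close
  ScopedFile& operator=(const ScopedFile&) = delete;
  std::FILE* get() const { return fp_; }

 private:
  std::FILE* fp_;
};

// Small probe: reports whether a path opens for reading, without leaking.
bool opensForReading(const std::string& path) {
  try {
    ScopedFile file(path, "r");
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}
```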

Files:

  • mllm/backends/qnn/aot/passes/AOTCompileContext.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/core/DataTypes.cpp
  • mllm/backends/cpu/ops/CmpOp.cpp
  • mllm/backends/cpu/ops/WhereOp.cpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/core/aops/WhereOp.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/compile/ir/Node.cpp
  • mllm/compile/ir/graph/Op.cpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/core/Tensor.cpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.cpp
  • mllm/compile/ir/tensor/Op.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/aops/CmpOp.cpp
{mllm,mllm-cli,pymllm}/**/*.{sh,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

If a file starts with a shebang, it must be the first line and contain only the interpreter path and optional single argument, encoded in UTF-8.

Files:

  • mllm/compile/ir/rtti_kind_gen.py
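The shebang rule above — first line only, interpreter path plus at most one argument — can be sketched as a validator. The function is illustrative; the repository's actual tooling may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical check: a shebang line must be "#!<absolute path> [one arg]".
// Lines without a shebang pass trivially, since the rule only constrains
// files that start with one.
bool shebangLineIsValid(const std::string& firstLine) {
  if (firstLine.rfind("#!", 0) != 0) { return true; }  // no shebang to check
  std::istringstream iss(firstLine.substr(2));
  std::string interp;
  std::string arg;
  std::string extra;
  if (!(iss >> interp)) { return false; }              // "#!" alone is invalid
  if (iss >> arg && iss >> extra) { return false; }    // more than one argument
  if (interp.empty() || interp[0] != '/') { return false; }  // must be a path
  return true;
}
```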
🧬 Code graph analysis (27)
mllm/backends/qnn/aot/passes/AOTPipeline.hpp (1)
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (2)
  • env (26-26)
  • env (26-26)
mllm/backends/qnn/aot/passes/AOTCompileContext.cpp (1)
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (1)
  • fp (28-28)
mllm/backends/cpu/ops/CmpOp.hpp (3)
mllm/backends/cpu/ops/WhereOp.hpp (3)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/core/aops/CmpOp.hpp (3)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
mllm/core/aops/WhereOp.hpp (3)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
mllm/core/aops/CmpOp.hpp (5)
mllm/core/BaseOp.hpp (5)
  • ploader (159-159)
  • trace_context (161-161)
  • inputs (163-163)
  • inputs (163-163)
  • inputs (165-165)
mllm/core/aops/WhereOp.hpp (6)
  • ploader (17-17)
  • trace_context (19-19)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
  • options_ (27-27)
mllm/core/aops/PadOp.hpp (3)
  • ploader (28-28)
  • trace_context (30-30)
  • options_ (38-38)
mllm/core/aops/MatMulOp.hpp (3)
  • ploader (54-54)
  • trace_context (56-56)
  • options_ (64-64)
mllm/core/aops/IndexOp.hpp (3)
  • ploader (20-20)
  • trace_context (22-22)
  • options_ (30-30)
mllm/backends/qnn/aot/visitor/Elewise.cpp (4)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/visitor/Base.hpp (3)
  • op (17-17)
  • op (17-17)
  • g_name (21-23)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
  • op (15-15)
  • g_name (17-18)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
examples/qwen3_qnn_aot/compile.cpp (6)
mllm/utils/Argparse.hpp (1)
  • Argparse (135-135)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (6)
  • Qwen3ForCausalLM (297-309)
  • Qwen3ForCausalLM (297-297)
  • makeRoPEInvFreq (17-22)
  • makeRoPEInvFreq (17-17)
  • makeRotaryPosEmbedding (24-71)
  • makeRotaryPosEmbedding (24-25)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (2)
  • QnnAOTEnv (232-232)
  • QnnAOTEnv (234-234)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)
  • QnnAOTEnv (250-250)
  • QnnAOTEnv (252-254)
mllm/backends/qnn/aot/QnnTargetMachineParser.cpp (2)
  • parseQcomTargetMachineFromJSONFile (88-96)
  • parseQcomTargetMachineFromJSONFile (88-88)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (2)
  • createQnnAOTLoweringPipeline (7-17)
  • createQnnAOTLoweringPipeline (7-7)
mllm/nn/Module.hpp (1)
mllm/utils/AnyValue.hpp (2)
  • value (90-95)
  • value (90-91)
mllm/core/aops/WhereOp.hpp (2)
mllm/backends/cpu/ops/WhereOp.hpp (3)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/core/aops/CmpOp.hpp (6)
  • ploader (17-17)
  • trace_context (19-19)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
  • options_ (27-27)
mllm/backends/cpu/ops/CmpOp.cpp (2)
mllm/core/aops/CmpOp.hpp (3)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
mllm/core/Tensor.cpp (2)
  • dtype (402-402)
  • dtype (402-402)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (2)
mllm/models/qwen_npu/modeling_qwen_npu_cpu.hpp (2)
  • self_attn_ (214-214)
  • embedding_ (277-277)
mllm/models/qwen_npu/modeling_qwen_npu.hpp (1)
  • block (442-444)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
mllm/backends/qnn/aot/visitor/Base.hpp (3)
  • op (17-17)
  • op (17-17)
  • g_name (21-23)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/QnnTargetMachineParser.hpp (1)
mllm/engine/ConfigFile.hpp (1)
  • json_str (39-39)
mllm/backends/cpu/ops/WhereOp.cpp (3)
mllm/backends/cpu/ops/WhereOp.hpp (4)
  • CPUWhereOp (13-13)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/core/aops/WhereOp.cpp (2)
  • forward (23-25)
  • forward (23-23)
mllm/core/aops/WhereOp.hpp (3)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
mllm/nn/Functional.cpp (3)
mllm/core/Tensor.hpp (4)
  • v (26-29)
  • v (26-26)
  • v (32-35)
  • v (32-32)
mllm/compile/ir/linalg/Op.hpp (1)
  • ctx (155-155)
mllm/core/Tensor.cpp (1)
  • instance (324-324)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (1)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/visitor/Base.hpp (1)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
  • op (15-15)
  • g_name (17-18)
mllm/core/aops/WhereOp.cpp (2)
mllm/core/aops/WhereOp.hpp (6)
  • WhereOp (15-15)
  • ploader (17-17)
  • trace_context (19-19)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
mllm/backends/cpu/ops/WhereOp.hpp (3)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (2)
mllm/backends/qnn/aot/visitor/Elewise.hpp (1)
  • op (15-15)
mllm/compile/ir/Node.hpp (2)
  • op (468-468)
  • op (470-470)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (5)
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (2)
  • env (26-26)
  • env (26-26)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (2)
  • createMarkQnnGraphPass (67-67)
  • createMarkQnnGraphPass (67-67)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • createMarkQnnGraphPass (20-20)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (2)
  • createOpNamingPass (88-88)
  • createOpNamingPass (88-88)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • createOpNamingPass (20-20)
mllm/compile/ir/graph/Op.cpp (2)
mllm/compile/ir/Node.cpp (2)
  • dumpAttributes (17-30)
  • dumpAttributes (17-17)
mllm/compile/ir/Node.hpp (7)
  • p (56-56)
  • p (56-56)
  • p (58-58)
  • p (131-131)
  • p (163-163)
  • p (187-187)
  • p (198-198)
mllm/compile/ir/linalg/Op.hpp (4)
mllm/core/aops/CmpOp.cpp (1)
  • EqualOp (36-36)
mllm/core/aops/CmpOp.hpp (1)
  • EqualOp (15-15)
mllm/core/aops/WhereOp.cpp (1)
  • WhereOp (12-12)
mllm/core/aops/WhereOp.hpp (1)
  • WhereOp (15-15)
mllm/backends/qnn/aot/QnnTargetMachineParser.cpp (3)
pymllm/ffi/__init__.py (25)
  • NONE (341-342)
  • V68 (345-346)
  • V69 (349-350)
  • V73 (353-354)
  • V75 (357-358)
  • V79 (361-362)
  • V81 (365-366)
  • UNKNOWN_SM (374-375)
  • SA8295 (378-379)
  • SM8350 (382-383)
  • SM8450 (386-387)
  • SM8475 (390-391)
  • SM8550 (394-395)
  • SM8650 (398-399)
  • SM8750 (402-403)
  • SM8850 (406-407)
  • SSG2115P (410-411)
  • SSG2125P (414-415)
  • SXR1230P (418-419)
  • SXR2230P (422-423)
  • SXR2330P (426-427)
  • QCS9100 (430-431)
  • SAR2230P (434-435)
  • SA8255 (438-439)
  • SW6100 (442-443)
mllm/engine/ConfigFile.hpp (1)
  • json_str (39-39)
mllm/backends/qnn/aot/QnnTargetMachineParser.hpp (1)
  • parseQcomTargetMachineFromJSON (23-23)
mllm/compile/ir/linalg/Op.cpp (4)
mllm/core/aops/CmpOp.cpp (1)
  • EqualOp (36-36)
mllm/core/aops/CmpOp.hpp (1)
  • EqualOp (15-15)
mllm/core/aops/WhereOp.cpp (1)
  • WhereOp (12-12)
mllm/core/aops/WhereOp.hpp (1)
  • WhereOp (15-15)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (4)
mllm/compile/ir/linalg/Op.hpp (1)
  • ctx (155-155)
mllm/compile/ir/Node.cpp (3)
  • IRWriter (290-291)
  • name (202-202)
  • name (202-202)
mllm/compile/ir/Node.hpp (6)
  • IRWriter (354-354)
  • IRWriter (355-355)
  • op (468-468)
  • op (470-470)
  • name (302-302)
  • name (306-306)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (12)
  • QnnAOTParamScalar (51-51)
  • QnnAOTParamScalar (53-53)
  • QnnAOTParamScalar (55-55)
  • name (236-236)
  • name (238-238)
  • name (240-240)
  • v (102-102)
  • v (102-102)
  • param (133-133)
  • param (137-137)
  • QnnAOTEnv (232-232)
  • QnnAOTEnv (234-234)
mllm/core/aops/CmpOp.cpp (2)
mllm/core/aops/CmpOp.hpp (7)
  • EqualOp (15-15)
  • ploader (17-17)
  • trace_context (19-19)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
  • options_ (27-27)
mllm/backends/cpu/ops/CmpOp.hpp (3)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (2)
mllm/backends/qnn/QNNUtils.hpp (2)
  • name (191-192)
  • name (195-196)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)
  • QnnAOTParamTensor (91-112)
  • QnnAOTParamTensor (114-117)
🪛 Clang (14.0.6)
mllm/core/Tensor.hpp

[error] 420-420: parameter name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 426-426: parameter name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/core/aops/CmpOp.hpp

[error] 27-27: function 'options' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 30-30: member variable 'options_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

mllm/backends/qnn/aot/passes/OpNamingPass.hpp

[error] 11-11: class 'OpNamingPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/qnn/aot/passes/AOTCompileContext.hpp

[error] 11-11: constructor does not initialize these fields: config_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 11-11: class 'AOTCompileContext' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 28-28: parameter name 'fp' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 34-34: constructor does not initialize these fields: config_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

examples/qwen3_qnn_aot/compile.cpp

[error] 15-15: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 15-15: invalid case style for variable 'MLLM_MAIN'

(readability-identifier-naming,-warnings-as-errors)

mllm/core/aops/WhereOp.hpp

[error] 30-30: member variable 'options_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp

[error] 17-17: 2 adjacent parameters of 'makeRoPEInvFreq' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 24-24: 2 adjacent parameters of 'makeRotaryPosEmbedding' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 73-73: constructor does not initialize these fields: gate_proj_, up_proj_, down_proj_, silu_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 81-81: constructor does not initialize these fields: gate_proj_, up_proj_, down_proj_, silu_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 81-81: 2 adjacent parameters of 'Qwen3MLP' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 88-88: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 89-89: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 91-91: variable name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 98-98: constructor does not initialize these fields: q_proj_, k_proj_, v_proj_, o_proj_, rms_norm_q_, rms_norm_k_, q_rope_, k_rope_, mask_, softmax_, hidden_size_, head_dim_, num_attention_heads_, num_key_value_heads_, num_key_value_groups_, scale_, layer_idx_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 120-120: constructor does not initialize these fields: q_proj_, k_proj_, v_proj_, o_proj_, rms_norm_q_, rms_norm_k_, q_rope_, k_rope_, mask_, softmax_, hidden_size_, head_dim_, num_attention_heads_, num_key_value_heads_, layer_idx_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 120-120: 2 adjacent parameters of 'Qwen3Attention' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 125-125: 'num_key_value_groups_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 126-126: 'scale_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 126-126: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 145-145: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 146-146: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 179-179: variable name 'kh' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 180-180: variable name 'vh' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 192-192: 20 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 195-195: variable name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 202-202: member variable 'layer_idx_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 205-205: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 214-214: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 214-214: 2 adjacent parameters of 'Qwen3Decoder' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 221-221: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 228-228: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 229-229: variable name '_' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 239-239: constructor does not initialize these fields: decode_blocks_, norm_, embedding_, rope_sin_, rope_cos_, num_hidden_layers_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 250-250: constructor does not initialize these fields: decode_blocks_, norm_, embedding_, rope_sin_, rope_cos_, num_hidden_layers_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 250-250: 2 adjacent parameters of 'Qwen3Text' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 260-260: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 264-264: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 297-297: constructor does not initialize these fields: lm_head_, tie_word_embeddings_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 311-311: 2 adjacent parameters of 'trace' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 313-313: variable 'llm_ir' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 350-350: variable 'position_ids' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 389-389: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Elewise.hpp

[error] 15-15: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 17-17: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/nn/Functional.cpp

[error] 203-203: 3 adjacent parameters of 'where' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

mllm/backends/qnn/aot/passes/OpNamingPass.cpp

[error] 21-21: 2 adjacent parameters of 'visitSubGraph' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 49-49: 2 adjacent parameters of 'visitCallGraph' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Base.hpp

[error] 17-17: method 'isMatch' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 17-17: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 19-19: method 'rewrite' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 21-21: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/compile/ir/NodeRTTIClassOfImpl.hpp

[error] 223-223: function-like macro 'RTTI_RK_OP_LINALGIROP_EQUALOP_IMPL' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 226-226: function-like macro 'RTTI_RK_OP_LINALGIROP_WHEREOP_IMPL' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp

[error] 11-11: class 'MarkQnnGraphPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp

[error] 130-130: function-like macro 'MLLM_FP16_TO_FP32' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 133-133: function-like macro 'MLLM_FP32_TO_FP16' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

mllm/compile/ir/linalg/Op.hpp

[error] 247-247: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 247-247: class 'EqualOp' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 248-248: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 248-248: class 'WhereOp' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/qnn/aot/QnnTargetMachineParser.cpp

[error] 75-75: variable name 'tm' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 92-92: variable 'buffer' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp

[error] 24-24: 2 adjacent parameters of 'recursiveVisitGraph' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

mllm/backends/qnn/aot/QnnWrappersAPI.cpp

[error] 56-56: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/core/aops/CmpOp.cpp

[error] 10-10: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)

mllm/backends/qnn/aot/QnnWrappersAPI.hpp

[error] 28-28: declaration uses identifier '__mllmLoggerCallback4QnnLogger', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)


[error] 64-64: class 'QnnAOTParamTensor' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 68-68: 4 adjacent parameters of 'create' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 75-75: 4 adjacent parameters of 'create' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 104-104: parameter name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 119-119: constructor does not initialize these fields: op_name_, param_scalar, param_tensor, inputs, outputs

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 152-152: constructor does not initialize these fields: op_node_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-macos
  • GitHub Check: build-x86
  • GitHub Check: build-android

Comment threads:

  • examples/qwen3_qnn_aot/compile.cpp (2)
  • mllm/backends/cpu/ops/CmpOp.cpp
  • mllm/backends/cpu/ops/WhereOp.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/core/aops/CmpOp.hpp
  • mllm/core/aops/WhereOp.cpp
  • mllm/core/Tensor.cpp (2)
  • mllm/nn/Functional.cpp
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)

114-118: Previous issue not fully resolved: destructor still crashes if alloc() never called.

The assertion on line 116 will still fail if data is nullptr (when alloc() was never called), which was the issue flagged in the previous review. The if (data) check on line 117 is redundant because it comes after the assertion.

🔎 Proposed fix
 QnnAOTParamTensor::~QnnAOTParamTensor() {
   auto data = QNN_TENSOR_GET_CLIENT_BUF(qnn_param_.tensorParam).data;
-  MLLM_RT_ASSERT(data != nullptr);
-  if (data) { free(data); }
+  if (data != nullptr) {
+    free(data);
+  }
 }

Based on past review comments, the destructor should gracefully handle unallocated tensors.
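The contract being asked for can be shown in isolation (an illustrative stand-in, not the actual QnnAOTParamTensor): the destructor must tolerate the "alloc() never called" state, so it checks for nullptr instead of asserting on it. Note also that std::free(nullptr) is already a well-defined no-op per the C standard.

```cpp
#include <cassert>
#include <cstdlib>

// Illustrative holder whose destructor tolerates never having been allocated:
// a plain null check, with no preceding assertion, covers both states.
class OwnedBuffer {
 public:
  OwnedBuffer() = default;
  ~OwnedBuffer() {
    if (data_ != nullptr) { std::free(data_); }
  }
  OwnedBuffer(const OwnedBuffer&) = delete;  // owning type: no copies
  OwnedBuffer& operator=(const OwnedBuffer&) = delete;
  void alloc(std::size_t bytes) { data_ = std::malloc(bytes); }
  bool allocated() const { return data_ != nullptr; }

 private:
  void* data_ = nullptr;
};
```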


140-154: Previous issue not resolved: TODO stubs still return without values for non-void functions.

These methods have non-void return types but contain no implementation, which causes undefined behavior if they are called. The issue flagged in the previous review remains unaddressed.

🔎 Proposed fix - add explicit error handling
 Qnn_TensorType_t QnnAOTNodeTensor::parseQnnTensorTypeFromIR() {
-  // TODO
+  // TODO: Implement tensor type parsing from IR
+  MLLM_ERROR("parseQnnTensorTypeFromIR not implemented");
+  MLLM_RT_ASSERT(false && "parseQnnTensorTypeFromIR not implemented");
+  return QNN_TENSOR_TYPE_UNDEFINED;
 }

 Qnn_DataType_t QnnAOTNodeTensor::parseQnnDataTypeFromIR() {
-  // TODO
+  // TODO: Implement data type parsing from IR
+  MLLM_ERROR("parseQnnDataTypeFromIR not implemented");
+  MLLM_RT_ASSERT(false && "parseQnnDataTypeFromIR not implemented");
+  return QNN_DATATYPE_UNDEFINED;
 }

 std::string QnnAOTNodeTensor::parseQnnTensorNameFromIR() {
-  // TODO
+  // TODO: Implement tensor name parsing from IR
+  MLLM_ERROR("parseQnnTensorNameFromIR not implemented");
+  MLLM_RT_ASSERT(false && "parseQnnTensorNameFromIR not implemented");
+  return "";
 }

 Qnn_QuantizeParams_t QnnAOTNodeTensor::parseQnnQuantizeParamFromIR() {
-  // TODO
+  // TODO: Implement quantize param parsing from IR
+  MLLM_ERROR("parseQnnQuantizeParamFromIR not implemented");
+  MLLM_RT_ASSERT(false && "parseQnnQuantizeParamFromIR not implemented");
+  return Qnn_QuantizeParams_t{};
 }

Based on past review comments, these stubs should have explicit error handling to avoid undefined behavior.

🧹 Nitpick comments (2)
mllm/compile/ir/linalg/Op.cpp (1)

119-119: Verify no trailing whitespace on empty line.

Empty lines should not contain trailing whitespace per the coding guidelines.

Run the following script to check for trailing whitespace:

#!/bin/bash
# Check for trailing whitespace on line 119
sed -n '119p' mllm/compile/ir/linalg/Op.cpp | grep -E '\s+$' && echo "Trailing whitespace found on line 119" || echo "No trailing whitespace on line 119"

As per coding guidelines, rule 4 states: "No line may end with trailing whitespace."

mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (1)

43-43: Consider passing strings by const reference.

The range-based for loops copy each string by value. For better efficiency, especially if the configuration lists are large, consider using const auto& or const std::string&.

🔎 Suggested optimization

Line 43:

-  for (std::string item : config["graph_on_qnn"]) {
+  for (const auto& item : config["graph_on_qnn"]) {

Line 50:

-  for (std::string item : config["op_on_qnn"]) { op_visited.insert({item, false}); }
+  for (const auto& item : config["op_on_qnn"]) { op_visited.insert({item, false}); }

Also applies to: 50-50

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3dbe563 and 47b3bd2.

📒 Files selected for processing (4)
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/compile/ir/linalg/Op.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
🧬 Code graph analysis (4)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (4)
mllm/compile/ir/linalg/Op.hpp (1)
  • ctx (155-155)
mllm/compile/ir/Node.hpp (6)
  • IRWriter (354-354)
  • IRWriter (355-355)
  • op (468-468)
  • op (470-470)
  • name (302-302)
  • name (306-306)
mllm/compile/ir/Node.cpp (3)
  • IRWriter (290-291)
  • name (202-202)
  • name (202-202)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/compile/ir/linalg/Op.cpp (4)
mllm/core/aops/CmpOp.cpp (1)
  • EqualOp (36-36)
mllm/core/aops/CmpOp.hpp (1)
  • EqualOp (15-15)
mllm/core/aops/WhereOp.hpp (1)
  • WhereOp (15-15)
mllm/core/aops/WhereOp.cpp (1)
  • WhereOp (12-12)
mllm/backends/qnn/aot/visitor/Elewise.cpp (4)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/visitor/Base.hpp (3)
  • op (17-17)
  • op (17-17)
  • g_name (21-23)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
  • op (15-15)
  • g_name (17-18)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (28)
  • QnnAOTParamScalar (51-51)
  • QnnAOTParamScalar (53-53)
  • QnnAOTParamScalar (55-55)
  • name (236-236)
  • name (238-238)
  • name (240-240)
  • QnnAOTParamTensor (80-81)
  • QnnAOTParamTensor (83-83)
  • param_name (68-73)
  • param_name (68-69)
  • param_name (75-78)
  • param_name (75-76)
  • QnnAOTDataTypeSize (30-30)
  • QnnAOTNodeTensor (104-104)
  • v (102-102)
  • v (102-102)
  • ins (123-123)
  • ous (125-125)
  • input (127-127)
  • output (129-129)
  • params (131-131)
  • params (135-135)
  • param (133-133)
  • param (137-137)
  • op_name (139-139)
  • package_name (141-141)
  • QnnAOTEnv (232-232)
  • QnnAOTEnv (234-234)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp

[error] 24-24: 2 adjacent parameters of 'recursiveVisitGraph' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

mllm/backends/qnn/aot/QnnWrappersAPI.cpp

[error] 56-56: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-macos
  • GitHub Check: build-android
🔇 Additional comments (11)
mllm/compile/ir/linalg/Op.cpp (1)

117-118: LGTM! Operator registrations are correct.

The mappings correctly associate OpTypes::kEqual with EqualOp and OpTypes::kWhere with WhereOp, matching their respective constructor implementations. The previous issue where WhereOp was incorrectly mapped to OpTypes::kArgsort has been resolved.

mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (2)

24-24: Static analysis warning is a false positive.

The Clang warning about "easily swappable parameters" is incorrect. The two parameters have distinct types: const ir::IRContext::ptr_t& (shared pointer to IRContext) and const ir::graph::GraphIROp::ptr_t& (shared pointer to GraphIROp), so they cannot be accidentally swapped.


58-60: Past issue has been correctly addressed.

The op_visited map is now properly updated at line 60 with op_visited[name] = true;, ensuring that each matching op name is marked only on its first occurrence. This resolves the concern raised in the previous review.
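The first-occurrence scheme described above can be sketched as a small helper (names here are illustrative, not the actual MarkQnnGraphPass API): each configured op name starts unvisited, and only the first matching op in traversal order flips the flag.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical sketch of first-occurrence marking: returns true only the
// first time a configured name is seen; unknown names are never marked.
bool markFirstOccurrence(std::unordered_map<std::string, bool>& visited,
                         const std::string& name) {
  auto it = visited.find(name);
  if (it == visited.end() || it->second) { return false; }
  it->second = true;  // mark so later ops with the same name are skipped
  return true;
}
```

This mirrors the behavior the review verified: a second op with the same configured name no longer re-triggers the marking.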

mllm/backends/qnn/aot/visitor/Elewise.cpp (3)

10-12: LGTM!

The isMatch implementation correctly identifies AddOp nodes with the "using_qnn" attribute for QNN AOT pattern matching.


17-21: LGTM!

The cast to linalg::AddOp includes appropriate error handling and logging.


23-27: LGTM!

The error message has been corrected to accurately reference aops::AddOp instead of the previous incorrect message. The cast logic and error handling are appropriate.

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (5)

65-89: LGTM!

The scalar parameter constructors correctly initialize the QNN parameter structure for each type (bool, uint32_t, float), and the name lifetime is properly managed via the member variable.


91-112: LGTM!

The tensor parameter constructor properly initializes all QNN tensor fields and correctly manages string/vector member lifetimes.


157-205: LGTM!

The builder pattern methods correctly implement the fluent interface, allowing chainable operation configuration. The distinction between add* (for vectors) and emplace* (for single items) is clear and consistent.


251-255: LGTM!

The constructor signature update to accept const QcomTargetMachine& is appropriate and correctly applied to both overloads.


378-390: LGTM!

The conditional allocation of the context config array with null-termination is correctly implemented. The size check prevents unnecessary allocation, and memory is properly tracked via unreachable_handle_.

Comment on lines +58 to +62
default:
MLLM_ERROR("QnnAOTDataTypeSize: unsupported Qnn_DataType_t {}", static_cast<int>(dtype));
MLLM_RT_ASSERT(false);
return 0;
}

⚠️ Potential issue | 🟠 Major

Guard against assertion-disabled builds in error path.

If assertions are compiled out, the function returns 0 for unsupported types, which would cause incorrect memory calculations (potentially zero-sized allocations followed by out-of-bounds writes). Consider using a fatal error mechanism that cannot be disabled, or throw an exception.

🔎 Proposed fix
   default:
     MLLM_ERROR("QnnAOTDataTypeSize: unsupported Qnn_DataType_t {}", static_cast<int>(dtype));
-    MLLM_RT_ASSERT(false);
-    return 0;
+    std::abort(); // or throw std::runtime_error("Unsupported Qnn_DataType_t");
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 58-62, the
default/error path currently relies on MLLM_RT_ASSERT(false) followed by
returning 0 which can silently return 0 when assertions are disabled; replace
that with an unconditional fatal behavior that cannot be compiled out — e.g.,
throw a std::runtime_error with a descriptive message that includes the
unsupported dtype (static_cast<int>(dtype)), or if exceptions are not allowed in
this module call a guaranteed-terminate function (abort/std::terminate or your
project’s MLLM_FATAL equivalent) with the same descriptive message; ensure the
function does not return a value after that call so callers cannot receive 0 for
an unsupported type.
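For illustration, the "cannot be compiled out" error path suggested above could look like the following standalone sketch (the enum values are placeholders, not the real Qnn_DataType_t): an unsupported dtype throws rather than returning 0, so callers can never size an allocation from a silently invalid result.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Placeholder dtype enum standing in for Qnn_DataType_t.
enum class DType { kFloat32, kInt16, kUnknown };

// Throws on unsupported types instead of returning 0, so the failure
// survives builds that compile assertions out.
uint32_t dtypeSize(DType t) {
  switch (t) {
    case DType::kFloat32: return 4;
    case DType::kInt16: return 2;
    default: throw std::runtime_error("dtypeSize: unsupported dtype");
  }
}
```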

Comment on lines +120 to +129
void* QnnAOTParamTensor::alloc() {
uint32_t data_size = QnnAOTDataTypeSize(QNN_TENSOR_GET_DATA_TYPE(qnn_param_.tensorParam));
for (int i = 0; i < QNN_TENSOR_GET_RANK(qnn_param_.tensorParam); i++) {
data_size *= qnn_param_.tensorParam.v2.dimensions[i];
}
Qnn_ClientBuffer_t clientBuffer = {.data = malloc(data_size), .dataSize = data_size};
QNN_TENSOR_SET_CLIENT_BUF(qnn_param_.tensorParam, clientBuffer);
MLLM_RT_ASSERT(QNN_TENSOR_GET_CLIENT_BUF(qnn_param_.tensorParam).data != nullptr);
return QNN_TENSOR_GET_CLIENT_BUF(qnn_param_.tensorParam).data;
}
⚠️ Potential issue | 🟠 Major

Add overflow protection for size calculation.

The data_size calculation multiplies dimensions without overflow checking. With sufficiently large dimension values, this could overflow and result in a small allocation, leading to buffer overflows when the tensor is populated.

🔎 Proposed fix using checked multiplication
 void* QnnAOTParamTensor::alloc() {
-  uint32_t data_size = QnnAOTDataTypeSize(QNN_TENSOR_GET_DATA_TYPE(qnn_param_.tensorParam));
+  size_t data_size = QnnAOTDataTypeSize(QNN_TENSOR_GET_DATA_TYPE(qnn_param_.tensorParam));
   for (int i = 0; i < QNN_TENSOR_GET_RANK(qnn_param_.tensorParam); i++) {
-    data_size *= qnn_param_.tensorParam.v2.dimensions[i];
+    size_t dimension = qnn_param_.tensorParam.v2.dimensions[i];
+    if (data_size > 0 && dimension > SIZE_MAX / data_size) {
+      MLLM_ERROR("QnnAOTParamTensor::alloc: size overflow detected");
+      MLLM_RT_ASSERT(false);
+      return nullptr;
+    }
+    data_size *= dimension;
   }
   Qnn_ClientBuffer_t clientBuffer = {.data = malloc(data_size), .dataSize = data_size};
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 120 to 129, the
data_size calculation multiplies element size by each dimension without overflow
checks; change data_size to size_t, compute iteratively and before each
multiplication verify that data_size <= SIZE_MAX / dimension (and that dimension
> 0), and if the check fails handle it by logging/erroring/returning nullptr or
asserting instead of proceeding to malloc; also verify QnnAOTDataTypeSize result
fits into size_t before starting and ensure malloc is checked for null before
using the buffer.
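The checked multiplication proposed above can be isolated into a helper like this (a sketch with illustrative names, not the project's actual API): before each multiply, it verifies the product cannot exceed SIZE_MAX, and reports failure instead of wrapping around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Computes elem_size * dims[0] * ... * dims[rank-1] with overflow checks.
// Returns false (leaving *out untouched) if the product would overflow.
bool checkedTensorBytes(size_t elem_size, const uint32_t* dims, int rank,
                        size_t* out) {
  size_t total = elem_size;
  for (int i = 0; i < rank; ++i) {
    size_t d = dims[i];
    if (d != 0 && total > SIZE_MAX / d) { return false; }  // would overflow
    total *= d;
  }
  *out = total;
  return true;
}
```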

Comment on lines +29 to +31
// TODO

return true;
⚠️ Potential issue | 🟡 Minor

TODO comment format and incomplete implementation.

The TODO comment on line 29 doesn't follow the coding guidelines format. Per the guidelines, TODO comments must be written as TODO: followed by a description.

Additionally, the function currently performs validation but doesn't implement the actual node addition logic. The parameters g_name, inputs, and outputs remain unused, and the function returns true despite not adding any QNN node to the graph.

🔎 Suggested TODO format
-  // TODO
+  // TODO: Implement QNN graph node addition for AddOp

Would you like me to help implement the QNN node addition logic for this pattern, or open a new issue to track this task?

Based on coding guidelines, TODO comments must follow the format 'TODO:' followed by UTF-8 text.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-  // TODO
+  // TODO: Implement QNN graph node addition for AddOp

   return true;
