
feat(Qnn AOT): AOT and AOT Runtime. Qwen3 AOT Mode. #567

Merged
chenghuaWang merged 11 commits into UbiquitousLearning:main from chenghuaWang:wch-main on Dec 23, 2025

Conversation

Collaborator

@chenghuaWang commented Dec 23, 2025

  • Introduced new data types for per-tensor and per-channel quantization, including int8 and int16 types.
  • Updated the LinearOp class to handle new linear implementation types: kQNN_LPBQ_w4a16o16_G32 and kQNN_LPBQ_w4a16o16_G64.
  • Enhanced the DataTypes utility functions to accommodate the new quantization types.
  • Modified Tensor operations to support addition and multiplication with int16 per-tensor symmetric types.
  • Updated the Module class to handle vector arguments for tensors and AnyValue types.

Summary by CodeRabbit

  • New Features

    • QNN AOT pipeline and tooling added for Qwen3 AOT compilation, config files, and example CLI
    • New Equal and Where tensor ops with public APIs and nn.functional.where
  • Improvements

    • Expanded quantization types (int8/int16 per-tensor/per-channel) and related dtype support
    • CPU backend: added comparison/where implementations and op registration
    • IR/pass utilities to mark QNN graphs and name ops for deterministic compilation


chenghuaWang and others added 10 commits December 19, 2025 10:47
- Implemented CPUEqualOp and CPUWhereOp classes for element-wise comparison and conditional selection.
- Added corresponding factory classes for both operations.
- Integrated Equal and Where operations into the CPUBackend.
- Updated Tensor class to include methods for equal comparisons with both Tensor and float inputs.
- Enhanced the aops layer with EqualOp and WhereOp definitions, including forward and reshape methods.
- Registered new operations in the IR and updated RTTI generation scripts.
- Modified Functional API to expose where operation for user convenience.
- Adjusted build configuration to use RelWithDebInfo for better debugging support.
…ementations

- Introduced new data types for per-tensor and per-channel quantization, including int8 and int16 types.
- Updated the LinearOp class to handle new linear implementation types: kQNN_LPBQ_w4a16o16_G32 and kQNN_LPBQ_w4a16o16_G64.
- Enhanced the DataTypes utility functions to accommodate the new quantization types.
- Modified Tensor operations to support addition and multiplication with int16 per-tensor symmetric types.
- Updated the Module class to handle vector arguments for tensors and AnyValue types.
Contributor

coderabbitai Bot commented Dec 23, 2025

Walkthrough

Adds Qwen3 QNN AOT example and model implementation, introduces QNN AOT backend infrastructure (parsing, env/wrappers, AOT passes, pipeline, visitors), registers CPU Equal/Where ops with CPU factories and Tensor::equal API, extends quantization data types and LinearImpl variants, and updates build/config files.

Changes

Cohort / File(s) Summary
Qwen3 QNN AOT Example
examples/CMakeLists.txt, examples/qwen3_qnn_aot/CMakeLists.txt, examples/qwen3_qnn_aot/compile.cpp, examples/qwen3_qnn_aot/config_1.7B.json, examples/qwen3_qnn_aot/qnn_aot_cfg.json, examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
Adds a new qwen3 QNN AOT example target and executable, model config and AOT config JSONs, and a Qwen3 model implementation with RoPE, attention, MLP, decoder, KV-cache and trace flow; compile.cpp orchestrates model load, trace, QNN AOT env, and lowering pipeline. Also comments out two example subdirs in examples/CMakeLists.txt.
QNN AOT Backend: core infra & API
mllm/backends/qnn/aot/QnnWrappersAPI.hpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp, mllm/backends/qnn/aot/QnnTargetMachineParser.hpp, mllm/backends/qnn/aot/QnnTargetMachineParser.cpp, mllm/backends/qnn/QNNUtils.hpp, mllm/backends/qnn/Register.cpp
Adds QnnAOT param/tensor/node/operation/graph modeling types, QnnAOTDataTypeSize helper, const-correct QnnAOTEnv constructors, dynamic alloc/free for AOT tensors, and a Qcom target-machine JSON parser; minor include/format tweaks.
QNN AOT Backend: passes & visitors
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp, mllm/backends/qnn/aot/passes/AOTCompileContext.cpp, mllm/backends/qnn/aot/passes/AOTPipeline.hpp, mllm/backends/qnn/aot/passes/AOTPipeline.cpp, mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp, mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp, mllm/backends/qnn/aot/passes/OpNamingPass.hpp, mllm/backends/qnn/aot/passes/OpNamingPass.cpp, mllm/backends/qnn/aot/visitor/Base.hpp, mllm/backends/qnn/aot/visitor/Elewise.hpp, mllm/backends/qnn/aot/visitor/Elewise.cpp
Introduces AOTCompileContext singleton for config/env, pipeline factory for QNN AOT lowering, passes to mark graphs using QNN and to assign deterministic op names, base pattern class and an Add pattern visitor (skeleton).
CPU backend: ops & registration
mllm/backends/cpu/CPUBackend.cpp, mllm/backends/cpu/ops/CmpOp.hpp, mllm/backends/cpu/ops/CmpOp.cpp, mllm/backends/cpu/ops/WhereOp.hpp, mllm/backends/cpu/ops/WhereOp.cpp
Adds CPUEqualOp and CPUWhereOp classes and factories, registers them in CPUBackend, implements CPUEqualOp forward and broadcasting for several dtypes, and adds a WhereOp stub (unimplemented forward).
Core IR & op kinds
mllm/compile/ir/GeneratedRTTIKind.hpp, mllm/compile/ir/NodeRTTIClassOfImpl.hpp, mllm/compile/ir/linalg/Op.hpp, mllm/compile/ir/linalg/Op.cpp, mllm/compile/ir/rtti_kind_gen.py
Adds EqualOp and WhereOp kinds to IR/RTTI generation and registers them in linalg op declarations; updates dump formatting adjustments.
Core aops: Equal/Where & Tensor API
mllm/core/aops/CmpOp.hpp, mllm/core/aops/CmpOp.cpp, mllm/core/aops/WhereOp.hpp, mllm/core/aops/WhereOp.cpp, mllm/core/Tensor.hpp, mllm/core/Tensor.cpp, mllm/nn/Functional.hpp, mllm/nn/Functional.cpp
Adds EqualOp and WhereOp high-level aops with trace/reshape stubs, broadcastShapes helper, implements Tensor::equal(Tensor)/equal(float), and nn::functional::where wrapper.
Data types & Linear impls
mllm/core/DataTypes.hpp, mllm/core/DataTypes.cpp, mllm/core/aops/LinearOp.hpp, mllm/backends/cpu/ops/LinearOp.cpp
Introduces many per-tensor/per-channel quantized typedefs and DataTypes enum entries (int8/uint8/int16/uint16 symmetric/asymmetric), registers them with type-info macros, and adds new LinearImplTypes kQNN_LPBQ_w4a16o16_G32/G64 with output dtype adjustments.
IR/node utilities & minor changes
mllm/compile/ir/Node.hpp, mllm/compile/ir/Node.cpp, mllm/compile/ir/tensor/Op.cpp, mllm/compile/ir/graph/Op.cpp, mllm/compile/ir/NodeRTTIClassOfImpl.hpp
Adds Node::attrNum(), propagates device in RegisterOp::build, adjusts Op dump formatting and adds RTTI macros for Equal/Where.
Build/config changes
examples/CMakeLists.txt, tasks/build_x86_qnn_aot.yaml
Disables two minicpm subdirs in the examples CMake and changes the CI task build type from Release to RelWithDebInfo.
Misc: guards/fixes
mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
Adds include guards around FP16/FP32 conversion macros to avoid redefinition.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI as compile.cpp
    participant QwenModel as Qwen3ForCausalLM
    participant Trace as IR Trace
    participant QnnEnv as QnnAOTEnv
    participant PassMgr as PassManager
    participant Passes as MarkQnnGraphPass / OpNamingPass
    participant Output as MIR Writer

    User->>CLI: run with model/config/AOT-config
    CLI->>CLI: parse args, validate AOT config
    CLI->>QwenModel: load model, params, gen RoPE, setup KV-cache
    CLI->>Trace: call model.trace()
    Trace-->>CLI: IR graph
    CLI->>QnnEnv: init from target machine JSON
    CLI->>PassMgr: create with createQnnAOTLoweringPipeline(env, cfg)
    PassMgr->>Passes: run MarkQnnGraphPass (mark using_qnn)
    PassMgr->>Passes: run OpNamingPass (assign deterministic names)
    PassMgr-->>CLI: transformed IR
    CLI->>Output: write final MIR

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • PR #562: QNN AOT backend infrastructure — overlaps QnnWrappersAPI, AOT passes/pipeline and related AOT types.
  • PR #563: Qcom target-machine and QNN AOT enums — closely related to QnnTargetMachine parsing/usage changes here.
  • PR #553: CPU backend op registrations and factories — similar changes to CPUBackend registration patterns and op factory additions.

Suggested reviewers

  • liang1232018
  • yirongjie
  • oreomaker

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 8.21%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The PR title clearly describes the main feature being added: QNN AOT (Ahead-of-Time) compilation support and Qwen3 AOT mode implementation.
  • Description check ✅ Passed: The PR description provides a bulleted list of key changes, including data types, LinearOp updates, DataTypes utilities, Tensor operations, and Module class enhancements, addressing the main objectives.


Owner

@UbiquitousLearning left a comment


LGTM

Contributor

@coderabbitai (Bot) left a comment


Actionable comments posted: 17

🧹 Nitpick comments (24)
mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp (1)

129-135: LGTM! Include guards improve safety.

The addition of #ifndef guards around the macro definitions prevents potential redefinition errors and is a good defensive programming practice. The #endif at line 135 correctly closes the #else block from line 106.

Optional: Consider consistency across all branches

The ARM_NEON (lines 76-77) and MSC_VER (lines 103-104) branches define the same macros without include guards. If macro redefinition is a concern for the #else branch, consider whether the other branches should also be guarded for consistency:

// ARM_NEON branch
#ifndef MLLM_FP16_TO_FP32
#define MLLM_FP16_TO_FP32(x) ((float)(x))
#endif
#ifndef MLLM_FP32_TO_FP16
#define MLLM_FP32_TO_FP16(x) ((mllm_fp16_t)x)
#endif

// Similar for MSC_VER branch

Alternatively, if guards are only needed in the #else branch due to specific platform requirements, this is fine as-is.

mllm/nn/Module.hpp (1)

155-160: Consider preserving lvalue vector arguments.

The current implementation unconditionally moves from vector elements, which will leave lvalue arguments in a moved-from state. When a user passes a vector by lvalue reference (e.g., std::vector<Tensor> v; module(v, ...)), they might not expect v's elements to be consumed.

Consider using conditional forwarding based on value category to copy from lvalues and move from rvalues, or document that this function consumes vector arguments.

🔎 Proposed refinement using conditional move/copy

One approach is to check if the argument is an rvalue reference and conditionally move:

 } else if constexpr (std::is_same_v<CleanType, std::vector<Tensor>>) {
   tensors.reserve(tensors.size() + args.size());
-  for (auto& tensor : args) { tensors.push_back(std::move(tensor)); }
+  if constexpr (std::is_rvalue_reference_v<decltype(args)>) {
+    for (auto& tensor : args) { tensors.push_back(std::move(tensor)); }
+  } else {
+    for (const auto& tensor : args) { tensors.push_back(tensor); }
+  }
 } else if constexpr (std::is_same_v<CleanType, std::vector<AnyValue>>) {
   others.reserve(others.size() + args.size());
-  for (auto& value : args) { others.push_back(std::move(value)); }
+  if constexpr (std::is_rvalue_reference_v<decltype(args)>) {
+    for (auto& value : args) { others.push_back(std::move(value)); }
+  } else {
+    for (const auto& value : args) { others.push_back(value); }
+  }
 }

Alternatively, document that the function consumes vector arguments and update the function documentation to clarify move semantics.

mllm/backends/qnn/aot/QnnTargetMachineParser.cpp (2)

14-67: Consider using maps for improved scalability.

The current if-chain approach works but could be refactored to use std::unordered_map<std::string, EnumType> for O(1) lookups instead of O(n). This would improve maintainability and scalability, especially for parseChipset with 18 variants.

🔎 Example refactor for parseHTPArch
 QcomHTPArch parseHTPArch(const std::string& arch_str) {
-  if (arch_str == "NONE") return QcomHTPArch::NONE;
-  if (arch_str == "V68") return QcomHTPArch::V68;
-  if (arch_str == "V69") return QcomHTPArch::V69;
-  if (arch_str == "V73") return QcomHTPArch::V73;
-  if (arch_str == "V75") return QcomHTPArch::V75;
-  if (arch_str == "V79") return QcomHTPArch::V79;
-  if (arch_str == "V81") return QcomHTPArch::V81;
-  throw std::invalid_argument("Unknown HTP architecture: " + arch_str);
+  static const std::unordered_map<std::string, QcomHTPArch> arch_map = {
+    {"NONE", QcomHTPArch::NONE}, {"V68", QcomHTPArch::V68}, {"V69", QcomHTPArch::V69},
+    {"V73", QcomHTPArch::V73}, {"V75", QcomHTPArch::V75}, {"V79", QcomHTPArch::V79},
+    {"V81", QcomHTPArch::V81}
+  };
+  auto it = arch_map.find(arch_str);
+  if (it != arch_map.end()) return it->second;
+  throw std::invalid_argument("Unknown HTP architecture: " + arch_str);
 }

69-86: Improve error messages with field existence checks.

While the try-catch handles missing fields, it doesn't provide specific information about which field is missing. Adding explicit checks would improve debuggability.

🔎 Proposed improvement
 QcomTargetMachine parseQcomTargetMachineFromJSON(const std::string& json_str) {
   try {
     auto json = nlohmann::json::parse(json_str);
 
     if (!json.contains("target_machine")) { throw std::invalid_argument("JSON must contain 'target_machine' field"); }
 
     auto tm = json["target_machine"];
+
+    // Validate all required fields are present
+    const std::vector<std::string> required_fields = {
+      "htp_arch", "htp_chipset", "htp_try_best_performance", 
+      "htp_security_pd_session", "htp_vtcm_capability_in_mb"
+    };
+    for (const auto& field : required_fields) {
+      if (!tm.contains(field)) {
+        throw std::invalid_argument("target_machine missing required field: " + field);
+      }
+    }
 
     QcomTargetMachine result;
     result.soc_htp_arch = parseHTPArch(tm["htp_arch"].get<std::string>());
     result.soc_htp_chipset = parseChipset(tm["htp_chipset"].get<std::string>());
     result.soc_htp_performance = parsePerformance(tm["htp_try_best_performance"].get<std::string>());
     result.soc_htp_security_pd_session = parseSecurityPDSession(tm["htp_security_pd_session"].get<std::string>());
     result.soc_htp_vtcm_total_memory_size = tm["htp_vtcm_capability_in_mb"].get<uint32_t>();
 
     return result;
   } catch (const nlohmann::json::exception& e) { throw std::invalid_argument(std::string("JSON parsing error: ") + e.what()); }
 }
mllm/core/Tensor.hpp (1)

420-426: Rename parameter v to a more descriptive name.

Static analysis flags v as too short (minimum 3 characters required). Consider renaming to other or value for consistency with the project's naming conventions.

🔎 Proposed fix
-  Tensor equal(Tensor v);
+  Tensor equal(Tensor other);

   /**
    * @brief equal
    * @return uint8 mask
    */
-  Tensor equal(float v);
+  Tensor equal(float value);
mllm/backends/qnn/aot/passes/AOTPipeline.hpp (1)

6-9: Add missing <string> include.

The function signature uses std::string but the header is not explicitly included. While it may work via transitive includes, explicit includes improve maintainability.

🔎 Proposed fix
 #include <vector>
+#include <string>

 #include "mllm/backends/qnn/aot/QnnWrappersAPI.hpp"
 #include "mllm/compile/passes/Pass.hpp"
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (2)

19-21: Consider completing the Rule of 5.

Static analysis flags that the copy constructor and copy assignment operator are deleted without the destructor or move operations being declared. For a singleton, explicitly deleting the move operations and defaulting the destructor clarifies intent.

🔎 Proposed fix
   // Delete copy constructor and assignment operator
   AOTCompileContext(const AOTCompileContext&) = delete;
   AOTCompileContext& operator=(const AOTCompileContext&) = delete;
+  AOTCompileContext(AOTCompileContext&&) = delete;
+  AOTCompileContext& operator=(AOTCompileContext&&) = delete;
+  ~AOTCompileContext() = default;

28-28: Rename parameter fp to a more descriptive name.

Static analysis flags fp as too short (minimum 3 characters). Consider renaming to file_path or config_path for clarity.

🔎 Proposed fix
-  void setConfig(const std::string& fp);
+  void setConfig(const std::string& config_path);
mllm/backends/cpu/ops/WhereOp.cpp (1)

4-4: Unused include.

The <cstring> header is included but not used in this file.

🔎 Proposed fix
-#include <cstring>
mllm/backends/qnn/aot/visitor/Elewise.cpp (1)

29-31: Incomplete TODO needs description.

The TODO comment lacks a description of what needs to be implemented. Per coding guidelines, TODO comments should follow the format TODO: <description>. Currently addNode validates casts but performs no actual node addition logic.

Would you like me to help outline the expected implementation for adding the QNN node, or should I open an issue to track this task?

🔎 Proposed fix for TODO format
-  // TODO
+  // TODO: Implement QNN graph node addition for AddOp
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (2)

7-11: Consider null-checking the env parameter.

The env pointer is passed directly to setEnv() without validation. If a null pointer is passed, it could cause issues downstream when the environment is accessed.

🔎 Proposed fix
 std::vector<std::shared_ptr<ir::Pass>> createQnnAOTLoweringPipeline(QnnAOTEnv* env, const std::string& config_path) {
   std::vector<ir::Pass::ptr_t> ret;
 
+  if (!env) {
+    throw std::invalid_argument("QnnAOTEnv pointer cannot be null");
+  }
+
   AOTCompileContext::getInstance().setEnv(env);
   AOTCompileContext::getInstance().setConfig(config_path);

1-5: Missing license header.

For consistency with other new files in this PR, consider adding the standard license header.

🔎 Proposed fix
+// Copyright (c) MLLM Team.
+// Licensed under the MIT License.
+
 #include "mllm/backends/qnn/aot/passes/AOTPipeline.hpp"
 #include "mllm/backends/qnn/aot/passes/AOTCompileContext.hpp"
examples/qwen3_qnn_aot/CMakeLists.txt (1)

5-8: Consider validating QAIRT_SDK_ROOT environment variable.

If QAIRT_SDK_ROOT is not set, the include paths will be empty, potentially causing cryptic compilation errors. Consider adding validation:

🔎 Proposed fix
+if(NOT DEFINED ENV{QAIRT_SDK_ROOT})
+  message(FATAL_ERROR "QAIRT_SDK_ROOT environment variable is not set. Please set it to your QNN SDK installation path.")
+endif()
+
 target_include_directories(mllm-qwen3-aot-c PRIVATE
   $ENV{QAIRT_SDK_ROOT}/include # QNN SDK include
   $ENV{QAIRT_SDK_ROOT}/include/QNN # QNN SDK include
 )
mllm/backends/cpu/ops/WhereOp.hpp (1)

18-23: Inconsistent final specifier on factory class.

CPUWhereOpFactory is missing the final specifier, while CPUEqualOpFactory in CmpOp.hpp is marked final. For consistency, both factory classes should follow the same pattern.

🔎 Proposed fix
-class CPUWhereOpFactory : public TypedOpFactory<OpTypes::kWhere, aops::WhereOpOptions> {
+class CPUWhereOpFactory final : public TypedOpFactory<OpTypes::kWhere, aops::WhereOpOptions> {
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (1)

74-80: Unused parameter in lambda.

The writer parameter of the lambda is declared but never used. Consider renaming the placeholder comment or using auto to make the intent clear.

🔎 Proposed fix
   writer.walk<ir::graph::CallGraphOp>(
-      [&](ir::IRWriter& /*writer*/, const ir::graph::CallGraphOp::ptr_t& call_op) -> ir::IRWriter::WalkResult {
+      [&](ir::IRWriter& /*unused*/, const ir::graph::CallGraphOp::ptr_t& call_op) -> ir::IRWriter::WalkResult {

Or if the signature allows, use auto:

   writer.walk<ir::graph::CallGraphOp>(
-      [&](ir::IRWriter& /*writer*/, const ir::graph::CallGraphOp::ptr_t& call_op) -> ir::IRWriter::WalkResult {
+      [&](auto&&, const ir::graph::CallGraphOp::ptr_t& call_op) -> ir::IRWriter::WalkResult {
examples/qwen3_qnn_aot/compile.cpp (1)

23-24: Consider making N and CL configurable.

These constants control sequence length and cache length. For flexibility across different model configurations, consider exposing them via CLI arguments or deriving from model config.

mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (1)

35-35: Unnecessary trailing semicolon after function definition.

Minor style issue - the semicolon after the closing brace of recursiveVisitGraph is not needed.

🔎 Proposed fix
-      });
-};
+      });
+}
mllm/core/aops/WhereOp.hpp (1)

27-27: Inconsistent options() return type compared to EqualOp.

WhereOp::options() returns a mutable reference WhereOpOptions&, while the related EqualOp::options() in CmpOp.hpp returns const EqualOpOptions&. For consistency and to prevent unintended modifications, consider using a const return type unless mutability is explicitly required.

🔎 Proposed fix for consistency
-  inline WhereOpOptions& options() { return options_; }
+  inline const WhereOpOptions& options() const { return options_; }
mllm/core/DataTypes.hpp (1)

294-296: Non-ASCII characters in code comment.

The comment contains Chinese text ("拼接" meaning "concatenate"). While valid UTF-8, using English in code comments improves accessibility for all contributors.

🔎 Proposed fix
 // These are placeholder types for per-tensor and per-channel quantization.
-// They cannot actually run; at runtime, they require three tensors (weight, scale, zp) to be拼接.
+// They cannot actually run; at runtime, they require three tensors (weight, scale, zp) to be concatenated.
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (3)

191-193: Magic number for attention masking.

The value -20 used for masked attention should be a named constant with documentation explaining why this specific value was chosen. This improves readability and maintainability.

🔎 Proposed fix
+  // Large negative value to effectively zero out masked attention scores after softmax
+  static constexpr float kMaskedAttnPenalty = -20.0f;
+
   // Masked Softmax
   auto attn_min = attn.min(-1, true);
-  float minus_value = -20;
-  attn = nn::functional::where(causal_mask.equal(0.f), attn, attn_min + minus_value);
+  attn = nn::functional::where(causal_mask.equal(0.f), attn, attn_min + kMaskedAttnPenalty);

202-203: Public member variable layer_idx_ breaks encapsulation.

layer_idx_ is declared public but is only set externally (line 253). Consider making it private with a setter, or initializing it via the constructor.

🔎 Proposed fix
+ public:
+  void setLayerIdx(int idx) { layer_idx_ = idx; }
+
  private:
-  int layer_idx_;
+  int layer_idx_ = 0;
 };

And update the usage in line 253:

for (auto [idx, b] : enumerate(decode_blocks_.list())) { b.self_attn_.setLayerIdx(idx); }

325-343: Duplicate error handling logic for KV caches.

The same pattern for checking and throwing on missing KV cache is repeated for both keys (lines 321-331) and values (lines 333-344). Consider extracting a helper to reduce duplication.

🔎 Proposed fix
+  auto getKVCache = [&](const std::string& name) -> Tensor {
+    if (input.count(name)) {
+      return input.at(name);
+    }
+    throw std::runtime_error("Missing KV cache: " + name);
+  };
+
   // Append Key
   for (int i = 0; i < cfg.num_hidden_layers; ++i) {
-    auto past_key_name = "past_key_" + std::to_string(i);
-    if (input.count(past_key_name)) {
-      kv_caches.push_back(input.at(past_key_name));
-    } else {
-      throw std::runtime_error("Missing KV cache for layer " + std::to_string(i));
-    }
+    kv_caches.push_back(getKVCache("past_key_" + std::to_string(i)));
   }

   // Append Value
   for (int i = 0; i < cfg.num_hidden_layers; ++i) {
-    auto past_value_name = "past_value_" + std::to_string(i);
-    if (input.count(past_value_name)) {
-      kv_caches.push_back(input.at(past_value_name));
-    } else {
-      throw std::runtime_error("Missing KV cache for layer " + std::to_string(i));
-    }
+    kv_caches.push_back(getKVCache("past_value_" + std::to_string(i)));
   }
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

55-56: Consider using sizeof instead of magic number for int64 size.

For clarity and maintainability, use sizeof(int64_t) instead of the literal 8.

🔎 Proposed fix
     case QNN_DATATYPE_INT_64:
-    case QNN_DATATYPE_UINT_64: return 8;
+    case QNN_DATATYPE_UINT_64: return sizeof(int64_t);
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)

28-28: Reserved identifier uses double underscore prefix.

The function __mllmLoggerCallback4QnnLogger uses a double underscore prefix, which is reserved for compiler/standard library implementations. Consider renaming to avoid potential conflicts.

🔎 Proposed fix
-void __mllmLoggerCallback4QnnLogger(const char* fmt, QnnLog_Level_t level, uint64_t times_tamp, va_list argp);
+void mllmLoggerCallback4QnnLogger(const char* fmt, QnnLog_Level_t level, uint64_t times_tamp, va_list argp);

Note: Also update the implementation in QnnWrappersAPI.cpp and any call sites.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f80e4e7 and 3dbe563.

📒 Files selected for processing (74)
  • examples/CMakeLists.txt
  • examples/qwen3_qnn_aot/CMakeLists.txt
  • examples/qwen3_qnn_aot/compile.cpp
  • examples/qwen3_qnn_aot/config_1.7B.json
  • examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
  • examples/qwen3_qnn_aot/qnn_aot_cfg.json
  • examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/backends/cpu/ops/CmpOp.cpp
  • mllm/backends/cpu/ops/CmpOp.hpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/cpu/ops/WhereOp.cpp
  • mllm/backends/cpu/ops/WhereOp.hpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/Register.cpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.cpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.cpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.hpp
  • mllm/backends/qnn/aot/passes/AutoInsertQuanAndDeQuantPass.cpp
  • mllm/backends/qnn/aot/passes/AutoInsertQuanAndDeQuantPass.hpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.hpp
  • mllm/backends/qnn/aot/passes/TensorRenamingPass.cpp
  • mllm/backends/qnn/aot/passes/TensorRenamingPass.hpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/aot/visitor/Concat.cpp
  • mllm/backends/qnn/aot/visitor/Concat.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/visitor/Equal.cpp
  • mllm/backends/qnn/aot/visitor/Equal.hpp
  • mllm/backends/qnn/aot/visitor/Index.cpp
  • mllm/backends/qnn/aot/visitor/Index.hpp
  • mllm/backends/qnn/aot/visitor/Reduce.cpp
  • mllm/backends/qnn/aot/visitor/Reduce.hpp
  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/qnn/aot/visitor/Repeat.hpp
  • mllm/backends/qnn/aot/visitor/Slice.cpp
  • mllm/backends/qnn/aot/visitor/Slice.hpp
  • mllm/backends/qnn/aot/visitor/Transpose.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.hpp
  • mllm/backends/qnn/aot/visitor/Where.cpp
  • mllm/backends/qnn/aot/visitor/Where.hpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/compile/ir/Node.cpp
  • mllm/compile/ir/Node.hpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/compile/ir/graph/Op.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/compile/ir/tensor/Op.cpp
  • mllm/core/DataTypes.cpp
  • mllm/core/DataTypes.hpp
  • mllm/core/OpTypes.hpp
  • mllm/core/Tensor.cpp
  • mllm/core/Tensor.hpp
  • mllm/core/aops/CmpOp.cpp
  • mllm/core/aops/CmpOp.hpp
  • mllm/core/aops/LinearOp.hpp
  • mllm/core/aops/WhereOp.cpp
  • mllm/core/aops/WhereOp.hpp
  • mllm/nn/Functional.cpp
  • mllm/nn/Functional.hpp
  • mllm/nn/Module.hpp
  • tasks/build_x86_qnn_aot.yaml
💤 Files with no reviewable changes (1)
  • mllm/backends/qnn/Register.cpp
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/core/Tensor.hpp
  • mllm/nn/Functional.hpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.hpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/core/OpTypes.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.cpp
  • mllm/backends/cpu/ops/CmpOp.hpp
  • mllm/core/aops/CmpOp.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.hpp
  • mllm/nn/Module.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/core/aops/WhereOp.hpp
  • mllm/core/DataTypes.cpp
  • mllm/core/aops/LinearOp.hpp
  • mllm/backends/cpu/ops/WhereOp.hpp
  • mllm/backends/cpu/ops/CmpOp.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.hpp
  • mllm/backends/cpu/ops/WhereOp.cpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/core/aops/WhereOp.cpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/compile/ir/Node.hpp
  • mllm/compile/ir/Node.cpp
  • mllm/core/DataTypes.hpp
  • mllm/compile/ir/graph/Op.cpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/core/Tensor.cpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.cpp
  • mllm/compile/ir/tensor/Op.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/aops/CmpOp.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/core/Tensor.hpp
  • mllm/nn/Functional.hpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.hpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/core/OpTypes.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.cpp
  • mllm/backends/cpu/ops/CmpOp.hpp
  • mllm/core/aops/CmpOp.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.hpp
  • mllm/nn/Module.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/core/aops/WhereOp.hpp
  • mllm/core/DataTypes.cpp
  • mllm/core/aops/LinearOp.hpp
  • mllm/backends/cpu/ops/WhereOp.hpp
  • mllm/backends/cpu/ops/CmpOp.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.hpp
  • mllm/backends/cpu/ops/WhereOp.cpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/core/aops/WhereOp.cpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/compile/ir/Node.hpp
  • mllm/compile/ir/Node.cpp
  • mllm/core/DataTypes.hpp
  • mllm/compile/ir/graph/Op.cpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/core/Tensor.cpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.cpp
  • mllm/compile/ir/tensor/Op.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/aops/CmpOp.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/core/Tensor.hpp
  • mllm/nn/Functional.hpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.hpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/core/OpTypes.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.cpp
  • mllm/backends/cpu/ops/CmpOp.hpp
  • mllm/core/aops/CmpOp.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.hpp
  • mllm/backends/qnn/aot/passes/AOTCompileContext.hpp
  • mllm/nn/Module.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/core/aops/WhereOp.hpp
  • mllm/core/DataTypes.cpp
  • mllm/core/aops/LinearOp.hpp
  • mllm/backends/cpu/ops/WhereOp.hpp
  • mllm/backends/cpu/ops/CmpOp.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.hpp
  • mllm/backends/cpu/ops/WhereOp.cpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/core/aops/WhereOp.cpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/compile/ir/Node.hpp
  • mllm/compile/ir/Node.cpp
  • mllm/core/DataTypes.hpp
  • mllm/compile/ir/graph/Op.cpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/core/Tensor.cpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.cpp
  • mllm/compile/ir/tensor/Op.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/aops/CmpOp.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
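Two of the rules above — RAII cleanup and explicit error signaling — can be illustrated with one small sketch; the class and names are hypothetical, not part of mllm:

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper: the FILE* is released in the destructor even when
// an exception unwinds the stack, and a failed open throws instead of handing
// the caller a silently invalid handle.
class ScopedFile {
 public:
  ScopedFile(const std::string& path, const char* mode)
      : fp_(std::fopen(path.c_str(), mode)) {
    if (fp_ == nullptr) { throw std::runtime_error("cannot open " + path); }
  }
  ~ScopedFile() {
    if (fp_ != nullptr) { std::fclose(fp_); }
  }
  ScopedFile(const ScopedFile&) = delete;  // owning type: copying would double-close
  ScopedFile& operator=(const ScopedFile&) = delete;
  std::FILE* get() const { return fp_; }

 private:
  std::FILE* fp_;
};

// Small probe: reports whether a path opens for reading, without leaking.
bool opensForReading(const std::string& path) {
  try {
    ScopedFile file(path, "r");
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}
```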

Files:

  • mllm/backends/qnn/aot/passes/AOTCompileContext.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/core/DataTypes.cpp
  • mllm/backends/cpu/ops/CmpOp.cpp
  • mllm/backends/cpu/ops/WhereOp.cpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/aot/passes/OpNamingPass.cpp
  • mllm/core/aops/WhereOp.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/compile/ir/Node.cpp
  • mllm/compile/ir/graph/Op.cpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/core/Tensor.cpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/qnn/aot/QnnTargetMachineParser.cpp
  • mllm/compile/ir/tensor/Op.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/aops/CmpOp.cpp
{mllm,mllm-cli,pymllm}/**/*.{sh,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

If a file starts with a shebang, it must be the first line and contain only the interpreter path and optional single argument, encoded in UTF-8.

Files:

  • mllm/compile/ir/rtti_kind_gen.py
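The shebang rule above — first line only, interpreter path plus at most one argument — can be sketched as a validator. The function is illustrative; the repository's actual tooling may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical check: a shebang line must be "#!<absolute path> [one arg]".
// Lines without a shebang pass trivially, since the rule only constrains
// files that start with one.
bool shebangLineIsValid(const std::string& firstLine) {
  if (firstLine.rfind("#!", 0) != 0) { return true; }  // no shebang to check
  std::istringstream iss(firstLine.substr(2));
  std::string interp;
  std::string arg;
  std::string extra;
  if (!(iss >> interp)) { return false; }              // "#!" alone is invalid
  if (iss >> arg && iss >> extra) { return false; }    // more than one argument
  if (interp.empty() || interp[0] != '/') { return false; }  // must be a path
  return true;
}
```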
🧬 Code graph analysis (27)
mllm/backends/qnn/aot/passes/AOTPipeline.hpp (1)
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (2)
  • env (26-26)
  • env (26-26)
mllm/backends/qnn/aot/passes/AOTCompileContext.cpp (1)
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (1)
  • fp (28-28)
mllm/backends/cpu/ops/CmpOp.hpp (3)
mllm/backends/cpu/ops/WhereOp.hpp (3)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/core/aops/CmpOp.hpp (3)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
mllm/core/aops/WhereOp.hpp (3)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
mllm/core/aops/CmpOp.hpp (5)
mllm/core/BaseOp.hpp (5)
  • ploader (159-159)
  • trace_context (161-161)
  • inputs (163-163)
  • inputs (163-163)
  • inputs (165-165)
mllm/core/aops/WhereOp.hpp (6)
  • ploader (17-17)
  • trace_context (19-19)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
  • options_ (27-27)
mllm/core/aops/PadOp.hpp (3)
  • ploader (28-28)
  • trace_context (30-30)
  • options_ (38-38)
mllm/core/aops/MatMulOp.hpp (3)
  • ploader (54-54)
  • trace_context (56-56)
  • options_ (64-64)
mllm/core/aops/IndexOp.hpp (3)
  • ploader (20-20)
  • trace_context (22-22)
  • options_ (30-30)
mllm/backends/qnn/aot/visitor/Elewise.cpp (4)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/visitor/Base.hpp (3)
  • op (17-17)
  • op (17-17)
  • g_name (21-23)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
  • op (15-15)
  • g_name (17-18)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
examples/qwen3_qnn_aot/compile.cpp (6)
mllm/utils/Argparse.hpp (1)
  • Argparse (135-135)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (6)
  • Qwen3ForCausalLM (297-309)
  • Qwen3ForCausalLM (297-297)
  • makeRoPEInvFreq (17-22)
  • makeRoPEInvFreq (17-17)
  • makeRotaryPosEmbedding (24-71)
  • makeRotaryPosEmbedding (24-25)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (2)
  • QnnAOTEnv (232-232)
  • QnnAOTEnv (234-234)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)
  • QnnAOTEnv (250-250)
  • QnnAOTEnv (252-254)
mllm/backends/qnn/aot/QnnTargetMachineParser.cpp (2)
  • parseQcomTargetMachineFromJSONFile (88-96)
  • parseQcomTargetMachineFromJSONFile (88-88)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (2)
  • createQnnAOTLoweringPipeline (7-17)
  • createQnnAOTLoweringPipeline (7-7)
mllm/nn/Module.hpp (1)
mllm/utils/AnyValue.hpp (2)
  • value (90-95)
  • value (90-91)
mllm/core/aops/WhereOp.hpp (2)
mllm/backends/cpu/ops/WhereOp.hpp (3)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/core/aops/CmpOp.hpp (6)
  • ploader (17-17)
  • trace_context (19-19)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
  • options_ (27-27)
mllm/backends/cpu/ops/CmpOp.cpp (2)
mllm/core/aops/CmpOp.hpp (3)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
mllm/core/Tensor.cpp (2)
  • dtype (402-402)
  • dtype (402-402)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (2)
mllm/models/qwen_npu/modeling_qwen_npu_cpu.hpp (2)
  • self_attn_ (214-214)
  • embedding_ (277-277)
mllm/models/qwen_npu/modeling_qwen_npu.hpp (1)
  • block (442-444)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
mllm/backends/qnn/aot/visitor/Base.hpp (3)
  • op (17-17)
  • op (17-17)
  • g_name (21-23)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/QnnTargetMachineParser.hpp (1)
mllm/engine/ConfigFile.hpp (1)
  • json_str (39-39)
mllm/backends/cpu/ops/WhereOp.cpp (3)
mllm/backends/cpu/ops/WhereOp.hpp (4)
  • CPUWhereOp (13-13)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/core/aops/WhereOp.cpp (2)
  • forward (23-25)
  • forward (23-23)
mllm/core/aops/WhereOp.hpp (3)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
mllm/nn/Functional.cpp (3)
mllm/core/Tensor.hpp (4)
  • v (26-29)
  • v (26-26)
  • v (32-35)
  • v (32-32)
mllm/compile/ir/linalg/Op.hpp (1)
  • ctx (155-155)
mllm/core/Tensor.cpp (1)
  • instance (324-324)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (1)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/visitor/Base.hpp (1)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
  • op (15-15)
  • g_name (17-18)
mllm/core/aops/WhereOp.cpp (2)
mllm/core/aops/WhereOp.hpp (6)
  • WhereOp (15-15)
  • ploader (17-17)
  • trace_context (19-19)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
mllm/backends/cpu/ops/WhereOp.hpp (3)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (2)
mllm/backends/qnn/aot/visitor/Elewise.hpp (1)
  • op (15-15)
mllm/compile/ir/Node.hpp (2)
  • op (468-468)
  • op (470-470)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (5)
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (2)
  • env (26-26)
  • env (26-26)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (2)
  • createMarkQnnGraphPass (67-67)
  • createMarkQnnGraphPass (67-67)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • createMarkQnnGraphPass (20-20)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (2)
  • createOpNamingPass (88-88)
  • createOpNamingPass (88-88)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • createOpNamingPass (20-20)
mllm/compile/ir/graph/Op.cpp (2)
mllm/compile/ir/Node.cpp (2)
  • dumpAttributes (17-30)
  • dumpAttributes (17-17)
mllm/compile/ir/Node.hpp (7)
  • p (56-56)
  • p (56-56)
  • p (58-58)
  • p (131-131)
  • p (163-163)
  • p (187-187)
  • p (198-198)
mllm/compile/ir/linalg/Op.hpp (4)
mllm/core/aops/CmpOp.cpp (1)
  • EqualOp (36-36)
mllm/core/aops/CmpOp.hpp (1)
  • EqualOp (15-15)
mllm/core/aops/WhereOp.cpp (1)
  • WhereOp (12-12)
mllm/core/aops/WhereOp.hpp (1)
  • WhereOp (15-15)
mllm/backends/qnn/aot/QnnTargetMachineParser.cpp (3)
pymllm/ffi/__init__.py (25)
  • NONE (341-342)
  • V68 (345-346)
  • V69 (349-350)
  • V73 (353-354)
  • V75 (357-358)
  • V79 (361-362)
  • V81 (365-366)
  • UNKNOWN_SM (374-375)
  • SA8295 (378-379)
  • SM8350 (382-383)
  • SM8450 (386-387)
  • SM8475 (390-391)
  • SM8550 (394-395)
  • SM8650 (398-399)
  • SM8750 (402-403)
  • SM8850 (406-407)
  • SSG2115P (410-411)
  • SSG2125P (414-415)
  • SXR1230P (418-419)
  • SXR2230P (422-423)
  • SXR2330P (426-427)
  • QCS9100 (430-431)
  • SAR2230P (434-435)
  • SA8255 (438-439)
  • SW6100 (442-443)
mllm/engine/ConfigFile.hpp (1)
  • json_str (39-39)
mllm/backends/qnn/aot/QnnTargetMachineParser.hpp (1)
  • parseQcomTargetMachineFromJSON (23-23)
mllm/compile/ir/linalg/Op.cpp (4)
mllm/core/aops/CmpOp.cpp (1)
  • EqualOp (36-36)
mllm/core/aops/CmpOp.hpp (1)
  • EqualOp (15-15)
mllm/core/aops/WhereOp.cpp (1)
  • WhereOp (12-12)
mllm/core/aops/WhereOp.hpp (1)
  • WhereOp (15-15)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (4)
mllm/compile/ir/linalg/Op.hpp (1)
  • ctx (155-155)
mllm/compile/ir/Node.cpp (3)
  • IRWriter (290-291)
  • name (202-202)
  • name (202-202)
mllm/compile/ir/Node.hpp (6)
  • IRWriter (354-354)
  • IRWriter (355-355)
  • op (468-468)
  • op (470-470)
  • name (302-302)
  • name (306-306)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (12)
  • QnnAOTParamScalar (51-51)
  • QnnAOTParamScalar (53-53)
  • QnnAOTParamScalar (55-55)
  • name (236-236)
  • name (238-238)
  • name (240-240)
  • v (102-102)
  • v (102-102)
  • param (133-133)
  • param (137-137)
  • QnnAOTEnv (232-232)
  • QnnAOTEnv (234-234)
mllm/core/aops/CmpOp.cpp (2)
mllm/core/aops/CmpOp.hpp (7)
  • EqualOp (15-15)
  • ploader (17-17)
  • trace_context (19-19)
  • inputs (21-21)
  • inputs (23-23)
  • inputs (25-25)
  • options_ (27-27)
mllm/backends/cpu/ops/CmpOp.hpp (3)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (2)
mllm/backends/qnn/QNNUtils.hpp (2)
  • name (191-192)
  • name (195-196)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)
  • QnnAOTParamTensor (91-112)
  • QnnAOTParamTensor (114-117)
🪛 Clang (14.0.6)
mllm/core/Tensor.hpp

[error] 420-420: parameter name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 426-426: parameter name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/core/aops/CmpOp.hpp

[error] 27-27: function 'options' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 30-30: member variable 'options_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

mllm/backends/qnn/aot/passes/OpNamingPass.hpp

[error] 11-11: class 'OpNamingPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/qnn/aot/passes/AOTCompileContext.hpp

[error] 11-11: constructor does not initialize these fields: config_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 11-11: class 'AOTCompileContext' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 28-28: parameter name 'fp' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 34-34: constructor does not initialize these fields: config_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

examples/qwen3_qnn_aot/compile.cpp

[error] 15-15: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 15-15: invalid case style for variable 'MLLM_MAIN'

(readability-identifier-naming,-warnings-as-errors)

mllm/core/aops/WhereOp.hpp

[error] 30-30: member variable 'options_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp

[error] 17-17: 2 adjacent parameters of 'makeRoPEInvFreq' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 24-24: 2 adjacent parameters of 'makeRotaryPosEmbedding' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 73-73: constructor does not initialize these fields: gate_proj_, up_proj_, down_proj_, silu_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 81-81: constructor does not initialize these fields: gate_proj_, up_proj_, down_proj_, silu_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 81-81: 2 adjacent parameters of 'Qwen3MLP' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 88-88: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 89-89: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 91-91: variable name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 98-98: constructor does not initialize these fields: q_proj_, k_proj_, v_proj_, o_proj_, rms_norm_q_, rms_norm_k_, q_rope_, k_rope_, mask_, softmax_, hidden_size_, head_dim_, num_attention_heads_, num_key_value_heads_, num_key_value_groups_, scale_, layer_idx_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 120-120: constructor does not initialize these fields: q_proj_, k_proj_, v_proj_, o_proj_, rms_norm_q_, rms_norm_k_, q_rope_, k_rope_, mask_, softmax_, hidden_size_, head_dim_, num_attention_heads_, num_key_value_heads_, layer_idx_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 120-120: 2 adjacent parameters of 'Qwen3Attention' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 125-125: 'num_key_value_groups_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 126-126: 'scale_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 126-126: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 145-145: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 146-146: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 179-179: variable name 'kh' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 180-180: variable name 'vh' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 192-192: 20 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 195-195: variable name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 202-202: member variable 'layer_idx_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 205-205: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 214-214: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 214-214: 2 adjacent parameters of 'Qwen3Decoder' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 221-221: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 228-228: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 229-229: variable name '_' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 239-239: constructor does not initialize these fields: decode_blocks_, norm_, embedding_, rope_sin_, rope_cos_, num_hidden_layers_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 250-250: constructor does not initialize these fields: decode_blocks_, norm_, embedding_, rope_sin_, rope_cos_, num_hidden_layers_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 250-250: 2 adjacent parameters of 'Qwen3Text' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 260-260: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 264-264: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 297-297: constructor does not initialize these fields: lm_head_, tie_word_embeddings_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 311-311: 2 adjacent parameters of 'trace' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 313-313: variable 'llm_ir' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 350-350: variable 'position_ids' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 389-389: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Elewise.hpp

[error] 15-15: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 17-17: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/nn/Functional.cpp

[error] 203-203: 3 adjacent parameters of 'where' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

mllm/backends/qnn/aot/passes/OpNamingPass.cpp

[error] 21-21: 2 adjacent parameters of 'visitSubGraph' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 49-49: 2 adjacent parameters of 'visitCallGraph' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Base.hpp

[error] 17-17: method 'isMatch' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 17-17: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 19-19: method 'rewrite' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 21-21: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/compile/ir/NodeRTTIClassOfImpl.hpp

[error] 223-223: function-like macro 'RTTI_RK_OP_LINALGIROP_EQUALOP_IMPL' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 226-226: function-like macro 'RTTI_RK_OP_LINALGIROP_WHEREOP_IMPL' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp

[error] 11-11: class 'MarkQnnGraphPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp

[error] 130-130: function-like macro 'MLLM_FP16_TO_FP32' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 133-133: function-like macro 'MLLM_FP32_TO_FP16' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

mllm/compile/ir/linalg/Op.hpp

[error] 247-247: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 247-247: class 'EqualOp' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 248-248: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 248-248: class 'WhereOp' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/qnn/aot/QnnTargetMachineParser.cpp

[error] 75-75: variable name 'tm' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 92-92: variable 'buffer' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp

[error] 24-24: 2 adjacent parameters of 'recursiveVisitGraph' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

mllm/backends/qnn/aot/QnnWrappersAPI.cpp

[error] 56-56: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/core/aops/CmpOp.cpp

[error] 10-10: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)

mllm/backends/qnn/aot/QnnWrappersAPI.hpp

[error] 28-28: declaration uses identifier '__mllmLoggerCallback4QnnLogger', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)


[error] 64-64: class 'QnnAOTParamTensor' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 68-68: 4 adjacent parameters of 'create' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 75-75: 4 adjacent parameters of 'create' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 104-104: parameter name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 119-119: constructor does not initialize these fields: op_name_, param_scalar, param_tensor, inputs, outputs

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 152-152: constructor does not initialize these fields: op_node_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-macos
  • GitHub Check: build-x86
  • GitHub Check: build-android

Comment threads:

  • examples/qwen3_qnn_aot/compile.cpp (2)
  • mllm/backends/cpu/ops/CmpOp.cpp
  • mllm/backends/cpu/ops/WhereOp.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/core/aops/CmpOp.hpp
  • mllm/core/aops/WhereOp.cpp
  • mllm/core/Tensor.cpp (2)
  • mllm/nn/Functional.cpp
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)

114-118: Previous issue not fully resolved: destructor still crashes if alloc() never called.

The assertion on line 116 will still fail if data is nullptr (when alloc() was never called), which was the issue flagged in the previous review. The if (data) check on line 117 is redundant because it comes after the assertion.

🔎 Proposed fix
 QnnAOTParamTensor::~QnnAOTParamTensor() {
   auto data = QNN_TENSOR_GET_CLIENT_BUF(qnn_param_.tensorParam).data;
-  MLLM_RT_ASSERT(data != nullptr);
-  if (data) { free(data); }
+  if (data != nullptr) {
+    free(data);
+  }
 }

Based on past review comments, the destructor should gracefully handle unallocated tensors.
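The contract being asked for can be shown in isolation (an illustrative stand-in, not the actual QnnAOTParamTensor): the destructor must tolerate the "alloc() never called" state, so it checks for nullptr instead of asserting on it. Note also that std::free(nullptr) is already a well-defined no-op per the C standard.

```cpp
#include <cassert>
#include <cstdlib>

// Illustrative holder whose destructor tolerates never having been allocated:
// a plain null check, with no preceding assertion, covers both states.
class OwnedBuffer {
 public:
  OwnedBuffer() = default;
  ~OwnedBuffer() {
    if (data_ != nullptr) { std::free(data_); }
  }
  OwnedBuffer(const OwnedBuffer&) = delete;  // owning type: no copies
  OwnedBuffer& operator=(const OwnedBuffer&) = delete;
  void alloc(std::size_t bytes) { data_ = std::malloc(bytes); }
  bool allocated() const { return data_ != nullptr; }

 private:
  void* data_ = nullptr;
};
```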


140-154: Previous issue not resolved: TODO stubs still return without values for non-void functions.

These methods have non-void return types but contain no implementation, which causes undefined behavior if they are called. The issue flagged in the previous review remains unaddressed.

🔎 Proposed fix - add explicit error handling
 Qnn_TensorType_t QnnAOTNodeTensor::parseQnnTensorTypeFromIR() {
-  // TODO
+  // TODO: Implement tensor type parsing from IR
+  MLLM_ERROR("parseQnnTensorTypeFromIR not implemented");
+  MLLM_RT_ASSERT(false && "parseQnnTensorTypeFromIR not implemented");
+  return QNN_TENSOR_TYPE_UNDEFINED;
 }

 Qnn_DataType_t QnnAOTNodeTensor::parseQnnDataTypeFromIR() {
-  // TODO
+  // TODO: Implement data type parsing from IR
+  MLLM_ERROR("parseQnnDataTypeFromIR not implemented");
+  MLLM_RT_ASSERT(false && "parseQnnDataTypeFromIR not implemented");
+  return QNN_DATATYPE_UNDEFINED;
 }

 std::string QnnAOTNodeTensor::parseQnnTensorNameFromIR() {
-  // TODO
+  // TODO: Implement tensor name parsing from IR
+  MLLM_ERROR("parseQnnTensorNameFromIR not implemented");
+  MLLM_RT_ASSERT(false && "parseQnnTensorNameFromIR not implemented");
+  return "";
 }

 Qnn_QuantizeParams_t QnnAOTNodeTensor::parseQnnQuantizeParamFromIR() {
-  // TODO
+  // TODO: Implement quantize param parsing from IR
+  MLLM_ERROR("parseQnnQuantizeParamFromIR not implemented");
+  MLLM_RT_ASSERT(false && "parseQnnQuantizeParamFromIR not implemented");
+  return Qnn_QuantizeParams_t{};
 }

Based on past review comments, these stubs should have explicit error handling to avoid undefined behavior.

🧹 Nitpick comments (2)
mllm/compile/ir/linalg/Op.cpp (1)

119-119: Verify no trailing whitespace on empty line.

Empty lines should not contain trailing whitespace per the coding guidelines.

Run the following script to check for trailing whitespace:

#!/bin/bash
# Check for trailing whitespace on line 119
sed -n '119p' mllm/compile/ir/linalg/Op.cpp | grep -E '\s+$' && echo "Trailing whitespace found on line 119" || echo "No trailing whitespace on line 119"

As per coding guidelines, rule 4 states: "No line may end with trailing whitespace."

mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (1)

43-43: Consider passing strings by const reference.

The range-based for loops copy each string by value. For better efficiency, especially if the configuration lists are large, consider using const auto& or const std::string&.

🔎 Suggested optimization

Line 43:

-  for (std::string item : config["graph_on_qnn"]) {
+  for (const auto& item : config["graph_on_qnn"]) {

Line 50:

-  for (std::string item : config["op_on_qnn"]) { op_visited.insert({item, false}); }
+  for (const auto& item : config["op_on_qnn"]) { op_visited.insert({item, false}); }

Also applies to: 50-50

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3dbe563 and 47b3bd2.

📒 Files selected for processing (4)
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/compile/ir/linalg/Op.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp
  • mllm/compile/ir/linalg/Op.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
🧬 Code graph analysis (4)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (4)
mllm/compile/ir/linalg/Op.hpp (1)
  • ctx (155-155)
mllm/compile/ir/Node.hpp (6)
  • IRWriter (354-354)
  • IRWriter (355-355)
  • op (468-468)
  • op (470-470)
  • name (302-302)
  • name (306-306)
mllm/compile/ir/Node.cpp (3)
  • IRWriter (290-291)
  • name (202-202)
  • name (202-202)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/compile/ir/linalg/Op.cpp (4)
mllm/core/aops/CmpOp.cpp (1)
  • EqualOp (36-36)
mllm/core/aops/CmpOp.hpp (1)
  • EqualOp (15-15)
mllm/core/aops/WhereOp.hpp (1)
  • WhereOp (15-15)
mllm/core/aops/WhereOp.cpp (1)
  • WhereOp (12-12)
mllm/backends/qnn/aot/visitor/Elewise.cpp (4)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/visitor/Base.hpp (3)
  • op (17-17)
  • op (17-17)
  • g_name (21-23)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
  • op (15-15)
  • g_name (17-18)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (28)
  • QnnAOTParamScalar (51-51)
  • QnnAOTParamScalar (53-53)
  • QnnAOTParamScalar (55-55)
  • name (236-236)
  • name (238-238)
  • name (240-240)
  • QnnAOTParamTensor (80-81)
  • QnnAOTParamTensor (83-83)
  • param_name (68-73)
  • param_name (68-69)
  • param_name (75-78)
  • param_name (75-76)
  • QnnAOTDataTypeSize (30-30)
  • QnnAOTNodeTensor (104-104)
  • v (102-102)
  • v (102-102)
  • ins (123-123)
  • ous (125-125)
  • input (127-127)
  • output (129-129)
  • params (131-131)
  • params (135-135)
  • param (133-133)
  • param (137-137)
  • op_name (139-139)
  • package_name (141-141)
  • QnnAOTEnv (232-232)
  • QnnAOTEnv (234-234)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp

[error] 24-24: 2 adjacent parameters of 'recursiveVisitGraph' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

mllm/backends/qnn/aot/QnnWrappersAPI.cpp

[error] 56-56: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-macos
  • GitHub Check: build-android
🔇 Additional comments (11)
mllm/compile/ir/linalg/Op.cpp (1)

117-118: LGTM! Operator registrations are correct.

The mappings correctly associate OpTypes::kEqual with EqualOp and OpTypes::kWhere with WhereOp, matching their respective constructor implementations. The previous issue where WhereOp was incorrectly mapped to OpTypes::kArgsort has been resolved.

mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (2)

24-24: Static analysis warning is a false positive.

The Clang warning about "easily swappable parameters" is incorrect. The two parameters have distinct types: const ir::IRContext::ptr_t& (shared pointer to IRContext) and const ir::graph::GraphIROp::ptr_t& (shared pointer to GraphIROp), so they cannot be accidentally swapped.


58-60: Past issue has been correctly addressed.

The op_visited map is now properly updated at line 60 with op_visited[name] = true;, ensuring that each matching op name is marked only on its first occurrence. This resolves the concern raised in the previous review.
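The first-occurrence scheme described above can be sketched as a small helper (names here are illustrative, not the actual MarkQnnGraphPass API): each configured op name starts unvisited, and only the first matching op in traversal order flips the flag.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical sketch of first-occurrence marking: returns true only the
// first time a configured name is seen; unknown names are never marked.
bool markFirstOccurrence(std::unordered_map<std::string, bool>& visited,
                         const std::string& name) {
  auto it = visited.find(name);
  if (it == visited.end() || it->second) { return false; }
  it->second = true;  // mark so later ops with the same name are skipped
  return true;
}
```

This mirrors the behavior the review verified: a second op with the same configured name no longer re-triggers the marking.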

mllm/backends/qnn/aot/visitor/Elewise.cpp (3)

10-12: LGTM!

The isMatch implementation correctly identifies AddOp nodes with the "using_qnn" attribute for QNN AOT pattern matching.


17-21: LGTM!

The cast to linalg::AddOp includes appropriate error handling and logging.


23-27: LGTM!

The error message has been corrected to accurately reference aops::AddOp instead of the previous incorrect message. The cast logic and error handling are appropriate.

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (5)

65-89: LGTM!

The scalar parameter constructors correctly initialize the QNN parameter structure for each type (bool, uint32_t, float), and the name lifetime is properly managed via the member variable.


91-112: LGTM!

The tensor parameter constructor properly initializes all QNN tensor fields and correctly manages string/vector member lifetimes.


157-205: LGTM!

The builder pattern methods correctly implement the fluent interface, allowing chainable operation configuration. The distinction between add* (for vectors) and emplace* (for single items) is clear and consistent.


251-255: LGTM!

The constructor signature update to accept const QcomTargetMachine& is appropriate and correctly applied to both overloads.


378-390: LGTM!

The conditional allocation of the context config array with null-termination is correctly implemented. The size check prevents unnecessary allocation, and memory is properly tracked via unreachable_handle_.

Comment on lines +58 to +62
default:
MLLM_ERROR("QnnAOTDataTypeSize: unsupported Qnn_DataType_t {}", static_cast<int>(dtype));
MLLM_RT_ASSERT(false);
return 0;
}

⚠️ Potential issue | 🟠 Major

Guard against assertion-disabled builds in error path.

If assertions are compiled out, the function returns 0 for unsupported types, which would cause incorrect memory calculations (potentially zero-sized allocations followed by out-of-bounds writes). Consider using a fatal error mechanism that cannot be disabled, or throw an exception.

🔎 Proposed fix
   default:
     MLLM_ERROR("QnnAOTDataTypeSize: unsupported Qnn_DataType_t {}", static_cast<int>(dtype));
-    MLLM_RT_ASSERT(false);
-    return 0;
+    std::abort(); // or throw std::runtime_error("Unsupported Qnn_DataType_t");
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 58-62, the
default/error path currently relies on MLLM_RT_ASSERT(false) followed by
returning 0 which can silently return 0 when assertions are disabled; replace
that with an unconditional fatal behavior that cannot be compiled out — e.g.,
throw a std::runtime_error with a descriptive message that includes the
unsupported dtype (static_cast<int>(dtype)), or if exceptions are not allowed in
this module call a guaranteed-terminate function (abort/std::terminate or your
project’s MLLM_FATAL equivalent) with the same descriptive message; ensure the
function does not return a value after that call so callers cannot receive 0 for
an unsupported type.
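For illustration, the "cannot be compiled out" error path suggested above could look like the following standalone sketch (the enum values are placeholders, not the real Qnn_DataType_t): an unsupported dtype throws rather than returning 0, so callers can never size an allocation from a silently invalid result.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Placeholder dtype enum standing in for Qnn_DataType_t.
enum class DType { kFloat32, kInt16, kUnknown };

// Throws on unsupported types instead of returning 0, so the failure
// survives builds that compile assertions out.
uint32_t dtypeSize(DType t) {
  switch (t) {
    case DType::kFloat32: return 4;
    case DType::kInt16: return 2;
    default: throw std::runtime_error("dtypeSize: unsupported dtype");
  }
}
```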

Comment on lines +120 to +129
void* QnnAOTParamTensor::alloc() {
uint32_t data_size = QnnAOTDataTypeSize(QNN_TENSOR_GET_DATA_TYPE(qnn_param_.tensorParam));
for (int i = 0; i < QNN_TENSOR_GET_RANK(qnn_param_.tensorParam); i++) {
data_size *= qnn_param_.tensorParam.v2.dimensions[i];
}
Qnn_ClientBuffer_t clientBuffer = {.data = malloc(data_size), .dataSize = data_size};
QNN_TENSOR_SET_CLIENT_BUF(qnn_param_.tensorParam, clientBuffer);
MLLM_RT_ASSERT(QNN_TENSOR_GET_CLIENT_BUF(qnn_param_.tensorParam).data != nullptr);
return QNN_TENSOR_GET_CLIENT_BUF(qnn_param_.tensorParam).data;
}
⚠️ Potential issue | 🟠 Major

Add overflow protection for size calculation.

The data_size calculation multiplies dimensions without overflow checking. With sufficiently large dimension values, this could overflow and result in a small allocation, leading to buffer overflows when the tensor is populated.

🔎 Proposed fix using checked multiplication
 void* QnnAOTParamTensor::alloc() {
-  uint32_t data_size = QnnAOTDataTypeSize(QNN_TENSOR_GET_DATA_TYPE(qnn_param_.tensorParam));
+  size_t data_size = QnnAOTDataTypeSize(QNN_TENSOR_GET_DATA_TYPE(qnn_param_.tensorParam));
   for (int i = 0; i < QNN_TENSOR_GET_RANK(qnn_param_.tensorParam); i++) {
-    data_size *= qnn_param_.tensorParam.v2.dimensions[i];
+    size_t dimension = qnn_param_.tensorParam.v2.dimensions[i];
+    if (data_size > 0 && dimension > SIZE_MAX / data_size) {
+      MLLM_ERROR("QnnAOTParamTensor::alloc: size overflow detected");
+      MLLM_RT_ASSERT(false);
+      return nullptr;
+    }
+    data_size *= dimension;
   }
   Qnn_ClientBuffer_t clientBuffer = {.data = malloc(data_size), .dataSize = data_size};
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 120 to 129, the
data_size calculation multiplies element size by each dimension without overflow
checks; change data_size to size_t, compute iteratively and before each
multiplication verify that data_size <= SIZE_MAX / dimension (and that dimension
> 0), and if the check fails handle it by logging/erroring/returning nullptr or
asserting instead of proceeding to malloc; also verify QnnAOTDataTypeSize result
fits into size_t before starting and ensure malloc is checked for null before
using the buffer.
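The checked multiplication proposed above can be isolated into a helper like this (a sketch with illustrative names, not the project's actual API): before each multiply, it verifies the product cannot exceed SIZE_MAX, and reports failure instead of wrapping around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Computes elem_size * dims[0] * ... * dims[rank-1] with overflow checks.
// Returns false (leaving *out untouched) if the product would overflow.
bool checkedTensorBytes(size_t elem_size, const uint32_t* dims, int rank,
                        size_t* out) {
  size_t total = elem_size;
  for (int i = 0; i < rank; ++i) {
    size_t d = dims[i];
    if (d != 0 && total > SIZE_MAX / d) { return false; }  // would overflow
    total *= d;
  }
  *out = total;
  return true;
}
```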

Comment on lines +29 to +31
// TODO

return true;
⚠️ Potential issue | 🟡 Minor

TODO comment format and incomplete implementation.

The TODO comment on line 29 doesn't follow the coding guidelines format. Per the guidelines, TODO comments must be written as TODO: followed by a description.

Additionally, the function currently performs validation but doesn't implement the actual node addition logic. The parameters g_name, inputs, and outputs remain unused, and the function returns true despite not adding any QNN node to the graph.

🔎 Suggested TODO format
-  // TODO
+  // TODO: Implement QNN graph node addition for AddOp

Would you like me to help implement the QNN node addition logic for this pattern, or open a new issue to track this task?

Based on coding guidelines, TODO comments must follow the format 'TODO:' followed by UTF-8 text.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-  // TODO
+  // TODO: Implement QNN graph node addition for AddOp

   return true;
