feat(Qnn AOT): Add MarkTensorIO pass and related changes for QNN AOT pipeline #569
Conversation
- Introduced MarkTensorIO pass to tag tensor inputs and outputs in the CallGraphOp.
- Updated AOTPipeline to include the new MarkTensorIO pass.
- Created MarkTensorIO implementation and header files.
- Modified OpNamingPass to ensure proper namespace usage.
- Added base pattern class for quantization recipes and implemented matching logic for Add operations.
- Enhanced LinalgIRQuantizationAnnotationAttr to include a dump method for better debugging.
- Introduced UUID management for quantization specifications to ensure unique identification.
- Cleaned up TensorValue dump method by removing unnecessary constant attribute printing.
Walkthrough

Adds a MarkTensorIOPass to the QNN AOT lowering pipeline, introduces a quant-recipe pattern framework and an Add quant-recipe pattern, adds UUID tracking and textual dump for quantization specs, adjusts example KV-cache quantization for value_states, and renames a few namespaces for AOT consistency.

Changes
Sequence Diagram(s)

sequenceDiagram
autonumber
actor CLI as Compiler frontend
participant AOT as AOTPipeline
participant Pass as MarkTensorIOPass
participant IR as IR (Module/CallGraphOp)
CLI->>AOT: invoke lowering
AOT->>Pass: run createMarkTensorIOPass()
Pass->>IR: locate ModuleOp → CallGraphOp
alt llm_recipe enabled
Pass->>IR: mark inputs as qnn_graph_inputs
Pass->>IR: mark outputs as qnn_graph_outputs
else llm_recipe disabled
Pass-->>AOT: emit config error
end
Pass-->>AOT: pass completed
AOT-->>CLI: continue lowering
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
mllm/compile/ir/linalg/Attribute.hpp (2)
243-247: Missing UUID assignment in `QuantizationSpecAsymPerBlock::create()`.

All other `QuantizationSpec` subclass `create()` methods assign a UUID, but this one does not. This inconsistency will result in uninitialized or zero UUIDs, breaking uniqueness guarantees.

🔎 Proposed fix

  static inline ptr_t create() {
    auto spec = std::make_shared<QuantizationSpecAsymPerBlock>();
    spec->type = QuantizationSpecType::kAsymPerBlock;
+   spec->uuid = QuantizationSpecUUIDGiver::getInstance().getUUID();
    return spec;
  }
278-282: Missing UUID assignment in `QuantizationSpecLPBQ::create()`.

Similar to `QuantizationSpecAsymPerBlock`, this `create()` method does not assign a UUID, creating an inconsistency with other subclasses.

🔎 Proposed fix

  static inline ptr_t create() {
    auto spec = std::make_shared<QuantizationSpecLPBQ>();
    spec->type = QuantizationSpecType::kLPBQ;
+   spec->uuid = QuantizationSpecUUIDGiver::getInstance().getUUID();
    return spec;
  }
🧹 Nitpick comments (4)
mllm/compile/ir/linalg/Attribute.hpp (1)
58-62: Consider initializing struct fields for safety.

The `type` and `uuid` fields are uninitialized in a default-constructed `QuantizationSpec`. While subclasses use factory methods that set these fields, adding default initialization improves safety and eliminates the static analysis warning.

🔎 Proposed initialization

  struct QuantizationSpec {
    using ptr_t = std::shared_ptr<QuantizationSpec>;
-   QuantizationSpecType type;
-   uint64_t uuid;
+   QuantizationSpecType type = QuantizationSpecType::kNone;
+   uint64_t uuid = 0;
  };

mllm/compile/ir/linalg/Attribute.cpp (1)
44-64: Improve comma logic to handle empty collections.

The current implementation adds unconditional commas at lines 50 and 57, which can produce malformed output (leading commas or double commas) when `inputs`, `outputs`, or `weights` are empty. Consider tracking whether any content was printed and adding commas conditionally.

🔎 Example approach with conditional comma logic

+ bool first = true;
  for (int i = 0; i < annotation_.inputs.size(); ++i) {
+   if (!first) { p.comma(); }
    p.print("inputs_" + std::to_string(i));
    p.colon();
    p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
-   if (i < annotation_.inputs.size() - 1) { p.comma(); }
+   first = false;
  }
- p.comma();
  for (int i = 0; i < annotation_.outputs.size(); ++i) {
+   if (!first) { p.comma(); }
    p.print("outputs_" + std::to_string(i));
    p.colon();
    p.print("{}", gen_quant_spec_str(annotation_.outputs[i]));
-   if (i < annotation_.outputs.size() - 1) { p.comma(); }
+   first = false;
  }
- p.comma();
  // Similar pattern for weights...

mllm/backends/qnn/aot/visitor/Elewise.cpp (1)
38-41: TODO: Implement the rewrite logic.

The `rewrite` method is a placeholder that returns `true` without performing any transformation. If this pattern is invoked during the AOT lowering pipeline, it will incorrectly signal success despite doing nothing.

Do you want me to help generate the implementation skeleton for this method, or would you prefer to open an issue to track this task?
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (1)
38-65: Consider extracting the configuration example.

The 28-line inline JSON configuration example reduces readability. Consider moving it to:
- A dedicated configuration documentation file
- The file header comment
- Referenced external documentation
This would improve code maintainability while preserving the valuable example.
🔎 Proposed refactor
- // Visit all graphs and assign names to unnamed operations
- // {
- //   "target_machine": {
- //     "htp_arch": "V81",
- //     "htp_chipset": "SM8850",
- //     "htp_try_best_performance": "HtpBurst",
- //     "htp_security_pd_session": "HtpSignedPd",
- //     "htp_vtcm_capability_in_mb": 8
- //   },
- //   "graph_on_qnn": [
- //     "model"
- //   ],
- //   "op_on_qnn": [
- //     "lm_head"
- //   ],
- //   "quant_recipe": {
- //     "llm_recipe": true,
- //     "builtin_qwen3_recipe": {
- //       "linear": "w4a16-lpbq",
- //       "kv_cache": {
- //         "key": "int8-per-tensor",
- //         "value": "int8-per-tensor"
- //       }
- //     }
- //   }
- // }
- //
- // "llm_recipe": true must be setted!
+ // Validate llm_recipe configuration
+ // See examples/qwen3_qnn_aot/qnn_aot_cfg.json for configuration reference
+ // NOTE: "llm_recipe": true must be set!
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
- examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
- examples/qwen3_qnn_aot/qnn_aot_cfg.json
- examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
- mllm/backends/qnn/aot/passes/OpNamingPass.cpp
- mllm/backends/qnn/aot/passes/OpNamingPass.hpp
- mllm/backends/qnn/aot/visitor/Base.hpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/backends/qnn/aot/visitor/Elewise.hpp
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/compile/ir/linalg/Attribute.hpp
- mllm/compile/ir/tensor/Value.cpp
💤 Files with no reviewable changes (1)
- mllm/compile/ir/tensor/Value.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
- mllm/backends/qnn/aot/passes/OpNamingPass.hpp
- mllm/backends/qnn/aot/visitor/Elewise.hpp
- mllm/backends/qnn/aot/passes/OpNamingPass.cpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/aot/visitor/Base.hpp
- mllm/compile/ir/linalg/Attribute.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
- mllm/backends/qnn/aot/passes/OpNamingPass.hpp
- mllm/backends/qnn/aot/visitor/Elewise.hpp
- mllm/backends/qnn/aot/passes/OpNamingPass.cpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/aot/visitor/Base.hpp
- mllm/compile/ir/linalg/Attribute.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
- mllm/backends/qnn/aot/passes/OpNamingPass.hpp
- mllm/backends/qnn/aot/visitor/Elewise.hpp
- mllm/backends/qnn/aot/passes/OpNamingPass.cpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/aot/visitor/Base.hpp
- mllm/compile/ir/linalg/Attribute.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
- mllm/backends/qnn/aot/passes/OpNamingPass.cpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
🧬 Code graph analysis (4)
mllm/compile/ir/linalg/Attribute.cpp (2)
mllm/compile/ir/tensor/Value.cpp (2)
- dump (39-75)
- dump (39-39)

mllm/compile/ir/linalg/Attribute.hpp (1)

- p (303-303)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
- MarkQnnGraphPass (9-22)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (2)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (2)
- run (65-86)
- run (65-65)

mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)

- op (17-17)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (2)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (2)
- createMarkTensorIOPass (114-114)
- createMarkTensorIOPass (114-114)

mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)

- createMarkTensorIOPass (20-20)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/visitor/Elewise.hpp
[error] 27-27: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp
[error] 11-11: class 'MarkTensorIOPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
mllm/backends/qnn/aot/visitor/Base.hpp
[error] 30-30: method 'isMatch' can be made static
(readability-convert-member-functions-to-static,-warnings-as-errors)
[error] 30-30: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 32-32: method 'rewrite' can be made static
(readability-convert-member-functions-to-static,-warnings-as-errors)
mllm/compile/ir/linalg/Attribute.hpp
[error] 41-41: class 'QuantizationSpecUUIDGiver' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
[error] 58-58: constructor does not initialize these fields: type, uuid
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 303-303: parameter name 'p' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-x86
- GitHub Check: build-macos
- GitHub Check: build-android
🔇 Additional comments (17)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (1)
15-15: LGTM! Namespace change is consistent with the header.

The namespace relocation from `mllm::qnn` to `mllm::qnn::aot` is correctly applied in the implementation file, with the closing comment properly updated. The functional logic remains unchanged, and the change is consistent with the corresponding header file.

Also applies to: 90-90
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
9-9: Namespace relocation is correct and all references are properly updated.

The namespace change from `mllm::qnn` to `mllm::qnn::aot` is syntactically correct, the closing comment is properly updated, and all usages of `OpNamingPass` are already referencing the new namespace. No orphaned references to the old namespace exist.

mllm/compile/ir/linalg/Attribute.cpp (2)
3-3: LGTM!

The `<sstream>` include is necessary for the `std::stringstream` used in the `dump()` implementation.
23-40: LGTM!

The lambda correctly generates a string representation of `QuantizationSpec` with type name and UUID.

examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)
179-179: LGTM! Symmetric quantization for value_states.

The quantization of `value_states` to `kInt8PerTensorSym` correctly mirrors the `key_states` quantization pattern and aligns with the KV cache quantization configuration ("value": "int8-per-tensor").

mllm/backends/qnn/aot/visitor/Base.hpp (1)
26-33: LGTM! Base pattern class follows established design.

The `QnnAOTQuantRecipeBasePattern` correctly provides default no-op implementations for `isMatch` and `rewrite`, following the same structure as `QnnAOTBasePattern`. The static analysis hints suggesting to make methods static or rename the parameter are false positives; the methods must remain virtual for polymorphic pattern matching.

mllm/backends/qnn/aot/visitor/Elewise.hpp (1)
25-34: LGTM! Pattern class declaration follows established conventions.

The `QnnAOTAddQuantRecipePattern` class correctly derives from `QnnAOTQuantRecipeBasePattern` and provides a factory method for pattern registration. The structure is consistent with the existing `QnnAOTAddPattern` design.

mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)
4-4: LGTM! Clean integration of MarkTensorIO pass.

The `MarkTensorIOPass` is appropriately added to the pipeline after graph marking and operation naming passes, ensuring that prerequisite transformations are complete before IO tagging.

Also applies to: 17-17
examples/qwen3_qnn_aot/qnn_aot_cfg.json (1)
16-16: LGTM! Configuration enables MarkTensorIO pass.

The `llm_recipe` flag is correctly added and will satisfy the validation requirement in `MarkTensorIOPass::run` (MarkTensorIO.cpp lines 68-73).

mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
11-18: LGTM! Pass declaration follows established pattern.

The `MarkTensorIOPass` class structure correctly mirrors `MarkQnnGraphPass` (from relevant code snippets). The static analysis hint about special member functions is a false positive; pass classes are managed via `shared_ptr` and should not be copied or moved directly.

mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (6)
18-26: LGTM! Standard pass initialization.

The function correctly retrieves the AOT configuration, validates the top-level operation type, and creates an IRWriter for graph traversal, following the established pattern from other passes.
28-36: LGTM! CallGraphOp discovery follows established pattern.

The walker correctly locates the main `CallGraphOp` and asserts uniqueness, matching the pattern used in `OpNamingPass::run` (from relevant snippets).
66-73: LGTM! Comprehensive configuration validation.

The validation logic thoroughly checks for the required `llm_recipe` flag with clear error messaging. The multi-stage validation (exists, is boolean, is true) handles edge cases gracefully.
75-86: LGTM! Input tagging logic correctly enforces graph boundaries.

The code properly identifies and tags graph inputs, with the `prevOp()` check ensuring that only true graph boundary inputs (not internal derived values) are tagged.
114-114: LGTM! Factory function follows standard pattern.

The factory function correctly instantiates and returns the pass as a `shared_ptr`.
88-108: Verify LinearOp identity validation in the lm_head special case.

The output tagging includes special handling for the `lm_head` scenario (lines 93-100), where a graph output feeds both a `ReturnOp` and a `LinearOp`. The code correctly assumes the output order based on IR construction (in Module.cpp, the `LinearOp` outputs are connected before `ReturnOp` is created), which the comments confirm.

However, the code lacks validation that the `LinearOp` is specifically the "lm_head" operation. Currently, it tags any `LinearOp` output as `qnn_graph_outputs` if the pattern matches, without verifying the operation's name or symbol. Consider:
- Adding a check for the LinearOp's name/symbol to ensure it is indeed "lm_head"
- Documenting why matching any LinearOp is sufficient, or
- Refining the pattern to be more specific if needed
The fallback case (line 102) handles other graph topologies appropriately.
mllm/backends/qnn/aot/visitor/Elewise.cpp (1)
34-36: The identical `isMatch` implementations appear intentional but lack discriminating logic if both patterns are used together.

Both `QnnAOTAddPattern::isMatch` (lines 10-12) and `QnnAOTAddQuantRecipePattern::isMatch` (lines 34-36) are identical, matching `AddOp` operations with the `"using_qnn"` attribute. While the patterns inherit from separate base classes (`QnnAOTBasePattern` vs `QnnAOTQuantRecipeBasePattern`) with different virtual methods (`addNode()` vs `rewrite()`), the identical matching logic means both would match the same operations if registered in the same pattern pass. Currently, these patterns are not instantiated or used in any pass; however, when they are integrated, clarify whether:
- Only one pattern should match these operations (add discriminating logic)
- Both patterns intentionally process the same operations in sequence (document the execution order)
- Different pattern passes will use these separately (document which pass uses which pattern)
for (int i = 0; i < annotation_.outputs.size(); ++i) {
  p.print("outputs_" + std::to_string(i));
  p.colon();
  p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
  if (i < annotation_.inputs.size() - 1) { p.comma(); }
}
Critical bug: outputs section prints inputs data.
Lines 54-55 reference annotation_.inputs[i] and annotation_.inputs.size() instead of the corresponding outputs members. This causes the outputs section to print input data incorrectly.
🔎 Proposed fix
for (int i = 0; i < annotation_.outputs.size(); ++i) {
p.print("outputs_" + std::to_string(i));
p.colon();
- p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
- if (i < annotation_.inputs.size() - 1) { p.comma(); }
+ p.print("{}", gen_quant_spec_str(annotation_.outputs[i]));
+ if (i < annotation_.outputs.size() - 1) { p.comma(); }
}

🤖 Prompt for AI Agents
In mllm/compile/ir/linalg/Attribute.cpp around lines 51 to 56, the outputs loop
incorrectly references annotation_.inputs when generating output strings and
when checking the loop-end; change the gen_quant_spec_str call to use
annotation_.outputs[i] and update the size check to annotation_.outputs.size()
(and any related index/size usages in that block) so the outputs section prints
the correct output data and comma logic is based on outputs.
for (int i = 0; i < annotation_.weights.size(); ++i) {
  p.print("weight_" + std::to_string(i));
  p.colon();
  p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
  if (i < annotation_.inputs.size() - 1) { p.comma(); }
}
Critical bug: incorrect iteration over unordered_map.
The annotation_.weights member is an unordered_map<string, QuantizationSpec::ptr_t>, not a vector. You cannot iterate it with integer indices. Additionally, line 61 incorrectly references annotation_.inputs[i] instead of the weight value, and line 62 checks the wrong size.
🔎 Proposed fix using range-based for loop
- for (int i = 0; i < annotation_.weights.size(); ++i) {
- p.print("weight_" + std::to_string(i));
+ int weight_idx = 0;
+ for (const auto& [name, spec] : annotation_.weights) {
+ p.print("weight_" + name);
p.colon();
- p.print("{}", gen_quant_spec_str(annotation_.inputs[i]));
- if (i < annotation_.inputs.size() - 1) { p.comma(); }
+ p.print("{}", gen_quant_spec_str(spec));
+ if (weight_idx < annotation_.weights.size() - 1) { p.comma(); }
+ weight_idx++;
}

🤖 Prompt for AI Agents
In mllm/compile/ir/linalg/Attribute.cpp around lines 58 to 63, the code
incorrectly treats annotation_.weights (an unordered_map<string,
QuantizationSpec::ptr_t>) like a vector: it indexes by integer, references
annotation_.inputs[i], and compares to annotation_.inputs.size(); change to
iterate the unordered_map with a range-based loop (for (const auto& kv :
annotation_.weights) or using iterator), use kv.first for the weight name and
kv.second as the QuantizationSpec passed to gen_quant_spec_str, and determine
comma placement by either tracking an index/counter versus
annotation_.weights.size() or by using an iterator and checking std::next(it) !=
end to only append commas between elements.
class QuantizationSpecUUIDGiver {
 public:
  static QuantizationSpecUUIDGiver& getInstance() {
    static QuantizationSpecUUIDGiver instance;
    return instance;
  }

  uint64_t getUUID() { return next_uuid_++; }

 private:
  QuantizationSpecUUIDGiver() = default;
  QuantizationSpecUUIDGiver(const QuantizationSpecUUIDGiver&) = delete;  // NOLINT
  QuantizationSpecUUIDGiver& operator=(const QuantizationSpecUUIDGiver&) = delete;  // NOLINT

  uint64_t next_uuid_ = 0;
};
Thread-safety issue: getUUID() lacks synchronization.
The next_uuid_++ operation in getUUID() is not atomic and lacks synchronization. If multiple threads call this method concurrently, it can result in race conditions, duplicate UUIDs, or lost increments.
🔎 Proposed fix using std::atomic
+#include <atomic>
+
class QuantizationSpecUUIDGiver {
public:
static QuantizationSpecUUIDGiver& getInstance() {
static QuantizationSpecUUIDGiver instance;
return instance;
}
uint64_t getUUID() { return next_uuid_++; }
private:
QuantizationSpecUUIDGiver() = default;
QuantizationSpecUUIDGiver(const QuantizationSpecUUIDGiver&) = delete; // NOLINT
QuantizationSpecUUIDGiver& operator=(const QuantizationSpecUUIDGiver&) = delete; // NOLINT
- uint64_t next_uuid_ = 0;
+ std::atomic<uint64_t> next_uuid_{0};
};

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#include <atomic>

class QuantizationSpecUUIDGiver {
 public:
  static QuantizationSpecUUIDGiver& getInstance() {
    static QuantizationSpecUUIDGiver instance;
    return instance;
  }

  uint64_t getUUID() { return next_uuid_++; }

 private:
  QuantizationSpecUUIDGiver() = default;
  QuantizationSpecUUIDGiver(const QuantizationSpecUUIDGiver&) = delete;  // NOLINT
  QuantizationSpecUUIDGiver& operator=(const QuantizationSpecUUIDGiver&) = delete;  // NOLINT

  std::atomic<uint64_t> next_uuid_{0};
};
🧰 Tools
🪛 Clang (14.0.6)
[error] 41-41: class 'QuantizationSpecUUIDGiver' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm/compile/ir/linalg/Attribute.hpp around lines 41-56, the getUUID() method
increments next_uuid_ without synchronization leading to race conditions; change
next_uuid_ to std::atomic<uint64_t> and implement getUUID() to return
next_uuid_.fetch_add(1, std::memory_order_relaxed) (or stronger ordering if
required), and add the required #include <atomic> at the top of the file so
concurrent calls are atomic and thread-safe.
Actionable comments posted: 4
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (1)
45-71: Consider moving the large JSON example to documentation.

While the inline JSON example is helpful for understanding the expected config format, it's quite verbose (27 lines). Consider moving it to the header file documentation, a separate markdown file, or condensing it to show only the relevant parts for this pass.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- examples/qwen3_qnn_aot/compile.cpp
- examples/qwen3_qnn_aot/qnn_aot_cfg.json
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/core/Tensor.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/qwen3_qnn_aot/qnn_aot_cfg.json
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/core/Tensor.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/core/Tensor.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/core/Tensor.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp
- mllm/core/Tensor.cpp
🧬 Code graph analysis (3)
examples/qwen3_qnn_aot/compile.cpp (1)
mllm/core/Tensor.cpp (2)
- zeros (97-101)
- zeros (97-97)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (3)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (2)
- run (65-86)
- run (65-65)

mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)

- op (17-17)

mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)

- op (17-17)
mllm/core/Tensor.cpp (2)
mllm/core/DataTypes.cpp (2)
- nameOfType (107-154)
- nameOfType (107-107)

mllm/core/DataTypes.hpp (1)

- nameOfType (713-713)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-android
- GitHub Check: build-x86
- GitHub Check: build-macos
🔇 Additional comments (6)

mllm/core/Tensor.cpp (1)

370-370: LGTM! Improved error message.
Including the actual type name in the error message via nameOfType(dtype()) significantly improves the debugging experience when unsupported types are encountered.

mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (5)

1-15: LGTM!
The includes are well organized and appropriate for this pass implementation.

22-26: LGTM!
The validation ensures the pass operates under supported conditions, and the error message clearly explains the constraint.

72-79: LGTM!
The llm_recipe validation is thorough, with clear, actionable error messages.

81-92: LGTM!
The input tagging logic correctly validates tensor inputs and applies the appropriate attribute.

117-120: LGTM!
The return statement and factory function follow standard patterns for pass implementations.
examples/qwen3_qnn_aot/compile.cpp (lines 59-60):

    auto sequence = mllm::Tensor::zeros({1, N}, mllm::kInt64);
    auto causal_mask = mllm::Tensor::zeros({1, 1, N, CL}, mllm::kUInt16);
Change causal_mask dtype from kUInt16 to kFloat32 (line 60).
The causal_mask dtype is inconsistent with the rest of the codebase: every other causal_mask initialization, across LLaMA, Qwen, and custom ops, uses kFloat32. The kUInt16 choice appears to be an error and should be corrected to match the established pattern.
🤖 Prompt for AI Agents
In examples/qwen3_qnn_aot/compile.cpp around lines 59-60, the causal_mask is
created with dtype mllm::kUInt16 which is inconsistent; change the tensor
initialization to use mllm::kFloat32 instead so the causal_mask matches other
models (LLaMA, Qwen, custom ops) and maintains dtype consistency across the
codebase.
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp:

      return ir::IRWriter::WalkResult::WALK_CONTINUE;
    });

    // Visit all graphs and assign names to unnamed operations
Incorrect comment copied from OpNamingPass.
The comment "Visit all graphs and assign names to unnamed operations" is incorrect for this pass, which marks tensor inputs and outputs. This appears to be copy-pasted from OpNamingPass.cpp without updating.
🔎 Proposed fix

- // Visit all graphs and assign names to unnamed operations
+ // Mark tensor inputs and outputs for the main CallGraphOp

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/MarkTensorIO.cpp around line 44, the comment
"Visit all graphs and assign names to unnamed operations" is incorrect and was
copy-pasted from OpNamingPass; update the comment to accurately describe this
pass's purpose (marking tensor inputs and outputs), e.g., replace the line with
a concise description such as "Visit graphs and mark tensor inputs and outputs"
or similar wording that reflects the pass behavior.
    // Tag the outputs to call_graph to "qnn_graph_outputs"
    {
      auto outputs = call_main_graph_op->outputs();
      for (auto o : outputs) {
        if (o->isa_<ir::tensor::TensorValue>()) {
          // That means after this graph, we have a lm_head!
          // o 's outputs is cf.Return and lm_head(linear op)
          if (!o->outputs().empty() && o->outputs().size() == 2 && o->outputs().front()->isa_<ir::cf::ReturnOp>()
              && (*std::next(o->outputs().begin()))->isa_<ir::linalg::LinearOp>()) {
            auto lm_head_ir = (*std::next(o->outputs().begin()))->cast_<ir::linalg::LinearOp>();
            if (lm_head_ir->outputs().front()->outputs().empty()) {
              lm_head_ir->outputs().front()->setAttr("qnn_graph_outputs", getCtx()->create<ir::BoolAttr>(true));
            }
          } else {
            o->setAttr("qnn_graph_outputs", getCtx()->create<ir::BoolAttr>(true));
          }
        } else {
          MLLM_ERROR_EXIT(ExitCode::kCoreError, "Expect pure tensor value outputs to graph.CallGraphOp");
        }
      }
    }
🛠️ Refactor suggestion | 🟠 Major
Clarify the lm_head special case logic.
The output tagging logic contains a complex conditional (lines 101-106) that handles a special lm_head scenario. The logic is difficult to follow and lacks explanatory comments:

- The condition at lines 101-102 uses both front() and std::next() to access elements, which is not immediately clear.
- The check at line 104 (lm_head_ir->outputs().front()->outputs().empty()) lacks an explanation of its purpose.
- The overall lm_head special case would benefit from a comment explaining why this handling is needed.
🔎 Proposed improvements
Add explanatory comments and consider using intermediate variables for clarity:
// Tag the outputs to call_graph to "qnn_graph_outputs"
{
auto outputs = call_main_graph_op->outputs();
for (auto o : outputs) {
if (o->isa_<ir::tensor::TensorValue>()) {
- // That means after this graph, we have a lm_head!
- // o 's outputs is cf.Return and lm_head(linear op)
+ // Special case: if this graph output feeds both a ReturnOp and a lm_head LinearOp,
+ // mark the lm_head's output as the graph output instead (for ops on QNN)
if (!o->outputs().empty() && o->outputs().size() == 2 && o->outputs().front()->isa_<ir::cf::ReturnOp>()
&& (*std::next(o->outputs().begin()))->isa_<ir::linalg::LinearOp>()) {
auto lm_head_ir = (*std::next(o->outputs().begin()))->cast_<ir::linalg::LinearOp>();
+ // Only tag lm_head output if it's the final output (no further consumers)
if (lm_head_ir->outputs().front()->outputs().empty()) {
lm_head_ir->outputs().front()->setAttr("qnn_graph_outputs", getCtx()->create<ir::BoolAttr>(true));
}
} else {
o->setAttr("qnn_graph_outputs", getCtx()->create<ir::BoolAttr>(true));
          }

Or use intermediate variables for better readability:
if (!o->outputs().empty() && o->outputs().size() == 2) {
+ auto output_consumers = o->outputs();
+ auto first_consumer = output_consumers.front();
+ auto second_consumer = *std::next(output_consumers.begin());
+
- if (o->outputs().front()->isa_<ir::cf::ReturnOp>()
- && (*std::next(o->outputs().begin()))->isa_<ir::linalg::LinearOp>()) {
+ if (first_consumer->isa_<ir::cf::ReturnOp>() && second_consumer->isa_<ir::linalg::LinearOp>()) {
- auto lm_head_ir = (*std::next(o->outputs().begin()))->cast_<ir::linalg::LinearOp>();
+ auto lm_head_ir = second_consumer->cast_<ir::linalg::LinearOp>();

mllm/core/Tensor.cpp (equal(float v)):

      case kInt16: *(rhs_tensor.ptr<int16_t>()) = v; break;
      case kInt8: *(rhs_tensor.ptr<int8_t>()) = v; break;
      default: NYI("Type is not supported"); break;
      case kUInt16: *(rhs_tensor.ptr<uint16_t>()) = v; break;
Add kUInt16 support to scalar operators (+, -, *, /) to match equal() method support.
The equal(float v) method now supports kUInt16, but the scalar arithmetic operators (operator+, operator-, operator*, operator/) do not. Since kUInt16 is a supported type used throughout the codebase, users will encounter NYI("Type is not supported") runtime failures if they attempt scalar arithmetic operations on uint16 tensors. Either add kUInt16 handling to all scalar operators to match equal(), or remove it from equal() if uint16 scalar operations are intentionally unsupported.
Additionally, improve error messages in scalar operators to include the unsupported type name like equal() does: use NYI("Type is not supported {}", nameOfType(dtype())) instead of NYI("Type is not supported").
🤖 Prompt for AI Agents
In mllm/core/Tensor.cpp around line 369, the scalar arithmetic operators (+, -,
*, /) lack kUInt16 handling and their NYI errors lack the dtype name; update
each scalar operator's switch to add a kUInt16 case mirroring equal(float v)
(cast/use uint16_t appropriately when assigning results) and replace NYI("Type
is not supported") with NYI("Type is not supported {}", nameOfType(dtype())) so
unsupported-type errors include the type name.
Summary by CodeRabbit
New Features
Refactor
Chores
Bug Fixes