feat(qualcomm): PTQPass add constant ptq impl.#593
chenghuaWang merged 2 commits into UbiquitousLearning:main from
Conversation
📝 Walkthrough

Runtime PTQ quantization handling added for constant tensors in the PTQPass. The enhancement processes constants through AsymPerTensor and SymPerTensor quantization paths, computing PTQ-quantized values using scale and zero_point parameters, clamping to quant_min/quant_max bounds, and persisting results to both attribute data and tensor storage.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @mllm/backends/qnn/aot/passes/PTQPass.cpp:
- Around line 155-168: The local variable constant_dtype in PTQPass.cpp is never
used after being set; remove the declaration and all assignments to
constant_dtype inside the block that handles tv->getAttr("constant") (the
branches where constant_ir is VectorFP32Attr and VectorInt16Attr) to eliminate
dead code, and apply the same removal for the SymPerTensor branch (the analogous
constant_dtype declaration/assignments in the SymPerTensor handling section) so
no unused constant_dtype remains.
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/PTQPass.cpp (1)
205-241: Extract duplicated constant quantization logic into a helper function.

Lines 205-241 duplicate nearly all logic from lines 154-192 (the AsymPerTensor branch), with the only meaningful difference being the absence of `zero_point` in the quantization formula. This violates the DRY principle and increases maintenance burden. Consider extracting a helper function like:

```cpp
void quantizeConstantTensor(
    const ir::tensor::TensorValue::ptr_t& tv,
    const Tensor& scale,
    const Tensor* zero_point,  // nullptr for symmetric
    int32_t quant_min,
    int32_t quant_max) {
  // Unified logic for reading, quantizing, and writing constant values
}
```

Then call it from both branches:

```cpp
case kAsymPerTensor:
  quantizeConstantTensor(tv, scale, &zero_point, min_v, max_v);
  break;
case kSymPerTensor:
  quantizeConstantTensor(tv, scale, nullptr, min_v, max_v);
  break;
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
mllm/backends/qnn/aot/passes/PTQPass.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
mllm/backends/qnn/aot/passes/PTQPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
mllm/backends/qnn/aot/passes/PTQPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
mllm/backends/qnn/aot/passes/PTQPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
mllm/backends/qnn/aot/passes/PTQPass.cpp
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/PTQPass.cpp
[error] 156-156: variable 'constant_dtype' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 160-160: variable name 'ci' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 165-165: variable name 'ci' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 184-184: variable name 'ci' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 188-188: variable name 'ci' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 207-207: variable 'constant_dtype' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 211-211: variable name 'ci' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 216-216: variable name 'ci' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 233-233: variable name 'ci' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 237-237: variable name 'ci' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-macos
- GitHub Check: build-x86
- GitHub Check: build-android
🔇 Additional comments (2)
mllm/backends/qnn/aot/passes/PTQPass.cpp (2)
183-186: Verify type safety when storing quantized values in float32 containers.

The code writes the quantized integer value (`ptq_constant_v`, type `mllm_int32_t`) into a `VectorFP32Attr` and accesses the tensor as `mllm_fp32_t`. While implicit conversion from int32 to float32 will occur, this may not preserve the intended semantics of storing a quantized integer. After quantization, values are typically stored in integer-typed containers. Please confirm this is the intended design and that downstream code correctly interprets these float-typed containers as holding quantized integer values.
188-190: Ensure `quant_min` and `quant_max` are within int16 range.

When writing `ptq_constant_v` (int32) to `VectorInt16Attr` and `tensor_.at<mllm_int16_t>()`, a narrowing conversion occurs. While clamping to `[quant_min, quant_max]` should prevent truncation, verify that these bounds are correctly configured to fit within the int16 range [-32768, 32767] for 16-bit quantization specs.
```cpp
double constant_v = 0;
DataTypes constant_dtype = kFloat32;
if (tv->getAttr("constant")) {
  auto constant_ir = tv->getAttr("constant");
  if (constant_ir->isa_<ir::VectorFP32Attr>()) {
    auto ci = constant_ir->cast_<ir::VectorFP32Attr>();
    MLLM_RT_ASSERT_EQ(ci->data().size(), 1);
    constant_v = ci->data()[0];
    constant_dtype = kFloat32;
  } else if (constant_ir->isa_<ir::VectorInt16Attr>()) {
    auto ci = constant_ir->cast_<ir::VectorInt16Attr>();
    MLLM_RT_ASSERT_EQ(ci->data().size(), 1);
    constant_v = ci->data()[0];
    constant_dtype = kInt16;
```
Remove unused variable constant_dtype.
The variable constant_dtype is assigned on lines 163 and 168 but never used afterward. This is dead code that should be removed unless there's a planned future use.
🧹 Proposed cleanup

```diff
 // Check if this tensor is constant tensor. Then we need to quantize it.
 double constant_v = 0;
-DataTypes constant_dtype = kFloat32;
 if (tv->getAttr("constant")) {
   auto constant_ir = tv->getAttr("constant");
   if (constant_ir->isa_<ir::VectorFP32Attr>()) {
     auto ci = constant_ir->cast_<ir::VectorFP32Attr>();
     MLLM_RT_ASSERT_EQ(ci->data().size(), 1);
     constant_v = ci->data()[0];
-    constant_dtype = kFloat32;
   } else if (constant_ir->isa_<ir::VectorInt16Attr>()) {
     auto ci = constant_ir->cast_<ir::VectorInt16Attr>();
     MLLM_RT_ASSERT_EQ(ci->data().size(), 1);
     constant_v = ci->data()[0];
-    constant_dtype = kInt16;
   } else {
```

Apply the same cleanup to lines 206-219 in the SymPerTensor branch.
Summary by CodeRabbit
Release Notes