feat(qualcomm): PTQPass add constant ptq impl. #593

Merged
chenghuaWang merged 2 commits into UbiquitousLearning:main from chenghuaWang:wch-main
Jan 9, 2026

Conversation

@chenghuaWang (Collaborator) commented Jan 8, 2026

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Improved quantization handling for constant tensors across asymmetric and symmetric quantization modes, ensuring proper value computation and bounds checking during the quantization process.


@coderabbitai Bot (Contributor) commented Jan 8, 2026

📝 Walkthrough

Runtime PTQ quantization handling is added for constant tensors in the PTQPass. The enhancement processes constants through the AsymPerTensor and SymPerTensor quantization paths, computing PTQ-quantized values from the scale and zero_point parameters, clamping them to the quant_min/quant_max bounds, and persisting results to both attribute data and tensor storage.

Changes

Cohort: PTQ Constant Tensor Quantization
File(s): mllm/backends/qnn/aot/passes/PTQPass.cpp
Summary: Adds runtime PTQ handling for constant tensors in _recursiveSolveNormalImpl. Implements separate branches for the AsymPerTensor (using scale and zero_point) and SymPerTensor (using scale only) quantization paths. Computes PTQ-quantized values, clamps them to the valid range, and writes results to both the constant attribute data (VectorFP32Attr/VectorInt16Attr) and the corresponding TV tensor storage. (+84 lines)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Whiskers twitching with delight,
Constants quantized, nice and tight,
Scales and zeros, clamped just right,
PTQ magic shines so bright,
Tensors stored both day and night!

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
  • Description check ⚠️ Warning — No pull request description was provided by the author, leaving required context and explanation of the changes missing. Resolution: add a comprehensive pull request description explaining the changes, their purpose, and impact; reference the repository's contribution guidelines for the expected format.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
  • Title check ✅ Passed — The title clearly summarizes the main change: adding PTQ (Post-Training Quantization) constant implementation to the PTQPass in the Qualcomm backend.



@coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @mllm/backends/qnn/aot/passes/PTQPass.cpp:
- Around line 155-168: The local variable constant_dtype in PTQPass.cpp is never
used after being set; remove the declaration and all assignments to
constant_dtype inside the block that handles tv->getAttr("constant") (the
branches where constant_ir is VectorFP32Attr and VectorInt16Attr) to eliminate
dead code, and apply the same removal for the SymPerTensor branch (the analogous
constant_dtype declaration/assignments in the SymPerTensor handling section) so
no unused constant_dtype remains.
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/PTQPass.cpp (1)

205-241: Extract duplicated constant quantization logic into a helper function.

Lines 205-241 duplicate nearly all logic from lines 154-192 (the AsymPerTensor branch), with the only meaningful difference being the absence of zero_point in the quantization formula. This violates the DRY principle and increases maintenance burden.

Consider extracting a helper function like:

void quantizeConstantTensor(
    const ir::tensor::TensorValue::ptr_t& tv,
    const Tensor& scale,
    const Tensor* zero_point,  // nullptr for symmetric
    int32_t quant_min,
    int32_t quant_max) {
  // Unified logic for reading, quantizing, and writing constant values
}

Then call it from both branches:

case kAsymPerTensor:
  quantizeConstantTensor(tv, scale, &zero_point, min_v, max_v);
  break;
case kSymPerTensor:
  quantizeConstantTensor(tv, scale, nullptr, min_v, max_v);
  break;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between dc11090 and e26e61b.

📒 Files selected for processing (1)
  • mllm/backends/qnn/aot/passes/PTQPass.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/passes/PTQPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/passes/PTQPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/passes/PTQPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/passes/PTQPass.cpp
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/PTQPass.cpp

[error] 156-156: variable 'constant_dtype' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 160-160: variable name 'ci' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 165-165: variable name 'ci' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 184-184: variable name 'ci' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 188-188: variable name 'ci' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 207-207: variable 'constant_dtype' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 211-211: variable name 'ci' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 216-216: variable name 'ci' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 233-233: variable name 'ci' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 237-237: variable name 'ci' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-macos
  • GitHub Check: build-x86
  • GitHub Check: build-android
🔇 Additional comments (2)
mllm/backends/qnn/aot/passes/PTQPass.cpp (2)

183-186: Verify type safety when storing quantized values in float32 containers.

The code writes the quantized integer value (ptq_constant_v, type mllm_int32_t) into a VectorFP32Attr and accesses the tensor as mllm_fp32_t. While implicit conversion from int32 to float32 will occur, this may not preserve the intended semantics of storing a quantized integer. After quantization, values are typically stored in integer-typed containers.

Please confirm this is the intended design and that downstream code correctly interprets these float-typed containers as holding quantized integer values.


188-190: Ensure quant_min and quant_max are within int16 range.

When writing ptq_constant_v (int32) to VectorInt16Attr and tensor_.at<mllm_int16_t>(), narrowing conversion occurs. While clamping to [quant_min, quant_max] should prevent truncation, verify that these bounds are correctly configured to fit within int16 range [-32768, 32767] for 16-bit quantization specs.

Comment on lines +155 to +168
double constant_v = 0;
DataTypes constant_dtype = kFloat32;
if (tv->getAttr("constant")) {
  auto constant_ir = tv->getAttr("constant");
  if (constant_ir->isa_<ir::VectorFP32Attr>()) {
    auto ci = constant_ir->cast_<ir::VectorFP32Attr>();
    MLLM_RT_ASSERT_EQ(ci->data().size(), 1);
    constant_v = ci->data()[0];
    constant_dtype = kFloat32;
  } else if (constant_ir->isa_<ir::VectorInt16Attr>()) {
    auto ci = constant_ir->cast_<ir::VectorInt16Attr>();
    MLLM_RT_ASSERT_EQ(ci->data().size(), 1);
    constant_v = ci->data()[0];
    constant_dtype = kInt16;

⚠️ Potential issue | 🟡 Minor

Remove unused variable constant_dtype.

The variable constant_dtype is assigned on lines 163 and 168 but never used afterward. This is dead code that should be removed unless there's a planned future use.

🧹 Proposed cleanup
 // Check if this tensor is constant tensor. Then we need to quantize it.
 double constant_v = 0;
-DataTypes constant_dtype = kFloat32;
 if (tv->getAttr("constant")) {
   auto constant_ir = tv->getAttr("constant");
   if (constant_ir->isa_<ir::VectorFP32Attr>()) {
     auto ci = constant_ir->cast_<ir::VectorFP32Attr>();
     MLLM_RT_ASSERT_EQ(ci->data().size(), 1);
     constant_v = ci->data()[0];
-    constant_dtype = kFloat32;
   } else if (constant_ir->isa_<ir::VectorInt16Attr>()) {
     auto ci = constant_ir->cast_<ir::VectorInt16Attr>();
     MLLM_RT_ASSERT_EQ(ci->data().size(), 1);
     constant_v = ci->data()[0];
-    constant_dtype = kInt16;
   } else {

Apply the same cleanup to lines 206-219 in the SymPerTensor branch.



@UbiquitousLearning (Owner) left a comment

LGTM
