[OpenVINO] NNCF Data-Aware Compression Algorithms Support for OVQuantizer#16002
mergennachin merged 25 commits into pytorch:main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16002
Note: Links to docs will display an error until the docs builds have been completed.
❌ 6 New Failures
As of commit 0c82495 with merge base 1550f0c, 6 jobs were reported as new failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR adds support for NNCF data-aware compression algorithms (AWQ and Scale Estimation) to the OpenVINO quantizer. It refactors the quantizer configuration to enable more flexible compression options and introduces a new compression module for LLM calibration.
Key changes:
- Replaces the boolean `nncf_compression` field with two specific algorithm flags: `openvino_awq` and `openvino_scale_estimation`
- Refactors `WEIGHTS_ONLY_COMPRESSION_MODES` from a tuple to a dictionary for cleaner mode mapping
- Adds new methods to expose weight compression configuration and parameters
- Introduces the `apply_nncf_data_aware_compression` function for data-aware LLM compression
- Updates the NNCF dependency to use the latest version instead of a specific commit
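The tuple-to-dictionary refactor listed above can be pictured with a minimal sketch. The enum members and the per-mode settings below are illustrative assumptions, not the contents of `quantizer.py`; the point is only that a dictionary answers both "is this a weights-only mode?" and "which settings go with it?" in a single lookup.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical quantization modes, for illustration only."""

    INT8_SYM = "int8_sym"  # full quantization, not weights-only
    INT8WO_SYM = "int8wo_sym"
    INT8WO_ASYM = "int8wo_asym"
    INT4WO_SYM = "int4wo_sym"
    INT4WO_ASYM = "int4wo_asym"


# Tuple form: only supports a membership check.
WEIGHTS_ONLY_MODES_TUPLE = (
    Mode.INT8WO_SYM,
    Mode.INT8WO_ASYM,
    Mode.INT4WO_SYM,
    Mode.INT4WO_ASYM,
)

# Dictionary form: membership check plus the settings for each mode.
# The "bits" and "symmetric" keys are assumed names for this sketch.
WEIGHTS_ONLY_MODES = {
    Mode.INT8WO_SYM: {"bits": 8, "symmetric": True},
    Mode.INT8WO_ASYM: {"bits": 8, "symmetric": False},
    Mode.INT4WO_SYM: {"bits": 4, "symmetric": True},
    Mode.INT4WO_ASYM: {"bits": 4, "symmetric": False},
}


def weights_only_settings(mode: Mode) -> dict:
    """Return the settings for a weights-only mode, or raise if it is not one."""
    if mode not in WEIGHTS_ONLY_MODES:
        raise ValueError(f"{mode.name} is not a weights-only compression mode")
    return WEIGHTS_ONLY_MODES[mode]
```

With the tuple, the settings for each mode would have to be derived elsewhere (for example with a chain of `if` branches); the dictionary keeps the mode and its settings in one place.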
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| extension/llm/export/config/llm_config.py | Replaces nncf_compression with openvino_awq and openvino_scale_estimation configuration fields |
| examples/models/llama/export_llama_lib.py | Adds CLI arguments for AWQ and scale estimation, integrates new compression function |
| backends/openvino/requirements.txt | Updates NNCF dependency to use latest version from main branch |
| backends/openvino/quantizer/quantizer.py | Refactors compression modes mapping, adds new methods for weight compression config exposure, adds check for null compression configs |
| backends/openvino/quantizer/llm_compression.py | New file implementing data-aware compression with calibration data generation |
| `backends/openvino/quantizer/__init__.py` | Exports new apply_nncf_data_aware_compression function |
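
As a rough illustration of how two algorithm flags might travel from the command line into a config object, here is a minimal sketch. The flag spellings `--openvino_awq` and `--openvino_scale_estimation` are assumed from the config field names, and `OpenvinoOptions` is a hypothetical stand-in for the real config class, not code from this PR.

```python
import argparse
from dataclasses import dataclass
from typing import List


@dataclass
class OpenvinoOptions:
    """Hypothetical stand-in for the OpenVINO section of the LLM config."""

    openvino_awq: bool = False
    openvino_scale_estimation: bool = False

    @property
    def is_data_aware(self) -> bool:
        # Either algorithm needs calibration data, so either flag
        # switches the export onto the data-aware path.
        return self.openvino_awq or self.openvino_scale_estimation


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of the export flags")
    # Assumed flag names; each is a plain on/off switch.
    parser.add_argument("--openvino_awq", action="store_true")
    parser.add_argument("--openvino_scale_estimation", action="store_true")
    return parser


def options_from_args(argv: List[str]) -> OpenvinoOptions:
    """Parse argv and copy the two flags into the options object."""
    args = build_parser().parse_args(argv)
    return OpenvinoOptions(
        openvino_awq=args.openvino_awq,
        openvino_scale_estimation=args.openvino_scale_estimation,
    )
```

Two independent flags let a user enable AWQ, Scale Estimation, both, or neither, which a single boolean could not express.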
Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
extension/llm/export/config/llm_config.py:459
- The docstring incorrectly states 'Configures the QNN backend' when this is the `OpenvinoConfig` class. It should say 'Configures the OpenVINO backend.'

    """
    Configures the QNN backend.
    """
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
@mergennachin the new version of NNCF has been released, and I have updated requirements.txt accordingly.
Summary
This PR introduces the use of the `nncf.compress_pt2e()` API, which lets users pass a Torch FX model together with a quantizer object that is compatible with the Torch AO Quantizers API. It returns the model with weights-only compression applied, optionally combined with additional NNCF algorithms such as AWQ and Scale Estimation.
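
The flow described in the summary can be sketched without depending on NNCF itself. In the sketch below the compression entry point is injected as `compress_fn`, so nothing here claims to be the real signature: the keyword names `awq`, `scale_estimation`, and `dataset` are assumptions for illustration, and a real call would likely need the calibration items wrapped in whatever dataset type NNCF expects.

```python
from typing import Any, Callable, Iterable, Optional


def compress_weights(
    model: Any,
    quantizer: Any,
    compress_fn: Callable[..., Any],
    calibration_items: Optional[Iterable[Any]] = None,
    awq: bool = False,
    scale_estimation: bool = False,
) -> Any:
    """Apply weights-only compression, adding data-aware algorithms on request.

    compress_fn is a hypothetical stand-in for the library entry point.
    """
    data_aware = awq or scale_estimation
    if data_aware and calibration_items is None:
        # Data-aware algorithms inspect activations, so they need samples.
        raise ValueError("AWQ and Scale Estimation need calibration data")

    kwargs = {"awq": awq, "scale_estimation": scale_estimation}
    if data_aware:
        # Assumed keyword name; materialize the samples once up front.
        kwargs["dataset"] = list(calibration_items)
    return compress_fn(model, quantizer, **kwargs)
```

Passing the entry point as a parameter keeps the decision logic (when calibration data is required, which switches are forwarded) testable with a stub, independent of the installed NNCF version.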