[Step 1] New architecture for auto_round #1542
Merged
Contributor
Pull request overview
Refactors AutoRound toward a new “context + compressor + algorithm” architecture, introducing new compressors_new/ and context/ modules and updating scheme parsing/export helpers to support the new flow.
Changes:
- Added new context singletons (ModelContext, CompressContext) and a new compressors_new implementation path.
- Expanded scheme parsing to reconcile bits/data_type and to support user overrides and AutoScheme integration.
- Added new calibration utilities and algorithm scaffolding for quantization backends (AutoRound/RTN).
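The context singletons mentioned above (ModelContext, CompressContext) suggest a shared-state pattern. A minimal sketch of how such a singleton context base might look, assuming the simple per-class-instance pattern described for auto_round/context/base.py; the attribute names on CompressContext are illustrative guesses based on the file descriptions below, not the actual implementation:

```python
class SingletonContext:
    """Hypothetical singleton base: one shared instance per concrete class."""
    _instances = {}

    def __new__(cls, *args, **kwargs):
        # Create the instance only on first construction; reuse it afterwards.
        if cls not in cls._instances:
            cls._instances[cls] = super().__new__(cls)
        return cls._instances[cls]


class CompressContext(SingletonContext):
    """Hypothetical shared knobs: device map and memory-usage settings."""

    def __init__(self):
        # Guard so repeated construction does not reset shared state.
        if getattr(self, "_initialized", False):
            return
        self.device_map = "auto"          # illustrative default
        self.low_gpu_mem_usage = False    # illustrative default
        self._initialized = True
```

Any module can then call `CompressContext()` and receive the same shared object, which is what makes device/memory knobs usable as "shared context" across the compressor and algorithm layers.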
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| auto_round/utils/model.py | Avoids runtime import cycles via TYPE_CHECKING for QuantizationScheme. |
| auto_round/schemes.py | Adds scheme override + parsing helpers and bits/dtype reconciliation. |
| auto_round/formats.py | Switches divisibility checks to global supported-layer constants. |
| auto_round/context/model_context.py | Introduces model lifecycle/loading + AMP setup and forward-hook management. |
| auto_round/context/compress_context.py | Introduces device/device_map and memory-usage knobs as shared context. |
| auto_round/context/base.py | Adds simple singleton context base. |
| auto_round/context/__init__.py | Package init for new context module. |
| auto_round/compressors_new/utils.py | New utility module (layer config, gguf mapping, caching helpers, forward helpers). |
| auto_round/compressors_new/shard_writer.py | New shard-based saver with optional safetensors support. |
| auto_round/compressors_new/config.py | Introduces extra/legacy config dataclasses for the new compressor path. |
| auto_round/compressors_new/base.py | New “BaseCompressor” implementation wiring contexts, formats, caching, quant loop. |
| auto_round/compressors_new/__init__.py | Package init for compressors_new. |
| auto_round/compressors/utils.py | Extends legacy layer-config resolution to include safetensors-only tensors and skip missing modules. |
| auto_round/calibration/utils.py | Adds helpers for “early stop” caching and input reshaping for block tuning. |
| auto_round/calibration/__init__.py | Package init for calibration. |
| auto_round/algorithms/quantization/rtn/rtn.py | Adds placeholder RTN quantization module file. |
| auto_round/algorithms/quantization/rtn/config.py | Adds RTN algorithm config stub. |
| auto_round/algorithms/quantization/rtn/__init__.py | Package init for RTN quantization. |
| auto_round/algorithms/quantization/base.py | Adds base quantization class stub. |
| auto_round/algorithms/quantization/auto_round/quantize.py | Adds new AutoRound quantizer implementation (algorithm object). |
| auto_round/algorithms/quantization/auto_round/config.py | Adds new AutoRound algorithm config. |
| auto_round/algorithms/quantization/auto_round/__init__.py | Package init for AutoRound quantization algorithm. |
| auto_round/algorithms/quantization/__init__.py | Package init for quantization algorithms. |
| auto_round/algorithms/base.py | Adds base algorithm stub. |
| auto_round/algorithms/alg_config.py | Adds base algorithm config stub. |
| auto_round/algorithms/__init__.py | Package init for algorithms. |
Contributor
If there is already an algorithm folder, what is the purpose of the compressor folder?
wenhuach21 reviewed on Mar 13, 2026
Signed-off-by: n1ck-guo <heng.guo@intel.com>
for more information, see https://pre-commit.ci
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Contributor
Author
/azp run Unit-Test-CUDA-AutoRound

Azure Pipelines successfully started running 1 pipeline(s).
RotationConfig.block_size defaults to None (meaning 'unset / auto'). When build_hadamard_transform passes block_size=None explicitly, it overrides the __init__ default of 32, causing math.sqrt(None) to raise TypeError. Fix: fall back to 32 when block_size is None.
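The fix described above can be sketched as follows. The names `RotationConfig.block_size` and `build_hadamard_transform` come from the comment; the body of the function (the `1/sqrt(block_size)` scale) is an illustrative assumption, not the actual implementation:

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class RotationConfig:
    # None means "unset / auto". The caller may forward this value explicitly,
    # so the fix cannot rely on the function's default alone.
    block_size: Optional[int] = None


def build_hadamard_transform(block_size: Optional[int] = 32) -> float:
    # Bug: passing block_size=None explicitly overrides the default of 32,
    # and math.sqrt(None) raises TypeError.
    # Fix: fall back to 32 when block_size is None.
    if block_size is None:
        block_size = 32
    return 1.0 / math.sqrt(block_size)  # illustrative scale computation
```

With the guard in place, `build_hadamard_transform(RotationConfig().block_size)` behaves the same as calling it with no argument.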
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Contributor
Author
/azp run Unit-Test-CUDA-AutoRound

Azure Pipelines successfully started running 1 pipeline(s).
…uo/new_ar_arch
# Conflicts:
#   auto_round/utils/model.py
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Contributor
Author
/azp run Unit-Test-CUDA-AutoRound

Azure Pipelines successfully started running 1 pipeline(s).
Signed-off-by: n1ck-guo <heng.guo@intel.com>
- DiffusionMixin.__init__: when iters > 0, force batch_size=1 and fold the batch into gradient_accumulate_steps, patching both kwargs and the AlgConfig (same pattern as MLLMMixin), matching the old DiffusionCompressor.__init__.
- DiffusionMixin.__init__ (post-super): unconditionally call pipe.to(model.dtype) to align the VAE/text encoder with the transformer dtype, mirroring DiffusionCompressor._align_device_and_dtype. An equality check on pipe.dtype is unreliable because it only reflects the primary component.
- BaseQuantizers.DIFFUSION_OUTPUT_CONFIGS: populate Flux/OvisImage block output mappings, matching the old-arch compressors/diffusion/compressor.py.
- Revert new-arch-only logic (_get_pipeline_call_kwargs, height/width, output_type=latent injection) to keep parity with the old arch.
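The batch-folding step in the first bullet can be sketched as below. The function name is illustrative; only the behavior (force batch_size=1 when iters > 0 and compensate via gradient_accumulate_steps) comes from the commit message:

```python
def fold_batch_into_accumulation(batch_size: int,
                                 gradient_accumulate_steps: int,
                                 iters: int) -> tuple[int, int]:
    """Hypothetical sketch of the DiffusionMixin.__init__ patching step.

    When tuning (iters > 0), run with batch_size=1 and multiply the
    gradient accumulation steps instead, so the effective batch size
    seen by the optimizer is unchanged.
    """
    if iters > 0 and batch_size != 1:
        gradient_accumulate_steps *= batch_size
        batch_size = 1
    return batch_size, gradient_accumulate_steps
```

For example, batch_size=4 with gradient_accumulate_steps=2 becomes batch_size=1 with gradient_accumulate_steps=8, preserving an effective batch of 8 samples per optimizer step.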
Contributor
Author
/azp run Unit-Test-CUDA-AutoRound

Azure Pipelines successfully started running 1 pipeline(s).
Description
Main entry point responsible for orchestrating the workflow, invoking different algorithms, and handling model persistence. Supports block-wise or layer-wise quantization strategies. Primary subclasses include TuneCompressor and ZeroShotCompressor.
Usage of the new API:
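A minimal sketch of the "context + compressor + algorithm" flow described above, assuming the orchestration role stated in the description. Except for BaseCompressor and CompressContext, which appear in the changed files, every name, method, and signature here is illustrative, not the confirmed interface:

```python
class CompressContext:
    """Hypothetical stand-in for the shared compression context."""
    device = "cpu"


class RTNAlgorithm:
    """Hypothetical stand-in for an algorithm object (e.g. RTN)."""
    name = "rtn"

    def quantize(self, layer_name: str, context: CompressContext) -> str:
        return f"{layer_name} quantized with {self.name} on {context.device}"


class BaseCompressor:
    """Orchestrates the workflow: wires a context to an algorithm object
    and walks the model's layers (illustrative sketch)."""

    def __init__(self, algorithm, context):
        self.algorithm = algorithm
        self.context = context

    def compress(self, layer_names):
        # Layer-wise strategy; a block-wise variant would group layers
        # into blocks before handing them to the algorithm.
        return [self.algorithm.quantize(n, self.context) for n in layer_names]
```

Subclasses such as TuneCompressor or ZeroShotCompressor would presumably swap the algorithm object or the loop strategy while reusing the same context wiring.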
Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting