[Step 1] New architecture for auto_round #1542

Merged
chensuyue merged 123 commits into main from hengguo/new_ar_arch
Apr 28, 2026

Conversation

@n1ck-guo
Contributor

@n1ck-guo n1ck-guo commented Mar 13, 2026

Description

  • Compressor:
    Main entry point responsible for orchestrating the workflow, invoking different algorithms, and handling model persistence. Supports block-wise or layer-wise quantization strategies. Primary subclasses include TuneCompressor and ZeroShotCompressor.
  • Calibration: Handles the calibration process (Work in Progress)
  • Context: Manages shared configurations and model states throughout the quantization pipeline, providing centralized control to prevent cross-module dependencies
    • ModelContext: Handles model loading and tracks model states and relevant configurations
    • CompressContext: Stores shared compression settings such as low_cpu_mem_usage, enable_torch_compile, etc.
  • Algorithms: Concrete quantization and weight transformation implementations
    • Quantization: Various quantization algorithms, including AutoRound, RTN, OptRTN, etc.
    • Transform: Weight transformation algorithms such as Hadamard transform
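The centralized, shared-state design described above can be sketched as a simple singleton base, in the spirit of the new `auto_round/context/base.py` module. This is an illustrative sketch, not the PR's actual implementation; the field names come from the description, everything else is assumed:

```python
class SingletonContext:
    """One shared instance per context subclass (illustrative sketch)."""

    _instances: dict = {}

    def __new__(cls, *args, **kwargs):
        # Create the instance only once per subclass, then reuse it.
        if cls not in cls._instances:
            cls._instances[cls] = super().__new__(cls)
        return cls._instances[cls]


class CompressContext(SingletonContext):
    """Shared compression settings; fields mirror the description above."""

    def __init__(self):
        if getattr(self, "_initialized", False):
            return  # already configured; keep the shared state
        self.low_cpu_mem_usage = False
        self.enable_torch_compile = False
        self._initialized = True
```

Any module can then call `CompressContext()` and read or mutate the same shared settings, which is what removes the cross-module dependencies mentioned above.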

Usage of the new API:

```python
from auto_round.algorithms.rotation import HadamardConfig
# Import path assumed; AutoRoundConfig is defined in this PR's new
# auto_round/algorithms/quantization/auto_round/config.py module.
from auto_round.algorithms.quantization.auto_round.config import AutoRoundConfig
from auto_round.compressors_new import AutoRound

quant_cfg = AutoRoundConfig(bits=4, group_size=128, iters=200)
had_cfg_1 = HadamardConfig(hadamard_type="hadamard", block_size=32)

compressor = AutoRound(
    alg_configs=[quant_cfg, had_cfg_1],
    model="facebook/opt-125m",
    scheme="MXFP4",
    format="auto_round",
)

model, layer_config = compressor.quantize_and_save(
    output_dir="./output",
)
```

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Contributor

Copilot AI left a comment


Pull request overview

Refactors AutoRound toward a new “context + compressor + algorithm” architecture, introducing new compressors_new/ and context/ modules and updating scheme parsing/export helpers to support the new flow.

Changes:

  • Added new context singletons (ModelContext, CompressContext) and a new compressors_new implementation path.
  • Expanded scheme parsing to reconcile bits/data_type and support user overrides + AutoScheme integration.
  • Added new calibration utilities and algorithm scaffolding for quantization backends (AutoRound/RTN).

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 18 comments.

| File | Description |
| --- | --- |
| `auto_round/utils/model.py` | Avoids runtime import cycles via `TYPE_CHECKING` for `QuantizationScheme`. |
| `auto_round/schemes.py` | Adds scheme override + parsing helpers and bits/dtype reconciliation. |
| `auto_round/formats.py` | Switches divisibility checks to global supported-layer constants. |
| `auto_round/context/model_context.py` | Introduces model lifecycle/loading + AMP setup and forward-hook management. |
| `auto_round/context/compress_context.py` | Introduces device/device_map and memory-usage knobs as shared context. |
| `auto_round/context/base.py` | Adds a simple singleton context base. |
| `auto_round/context/__init__.py` | Package init for the new context module. |
| `auto_round/compressors_new/utils.py` | New utility module (layer config, gguf mapping, caching helpers, forward helpers). |
| `auto_round/compressors_new/shard_writer.py` | New shard-based saver with optional safetensors support. |
| `auto_round/compressors_new/config.py` | Introduces extra/legacy config dataclasses for the new compressor path. |
| `auto_round/compressors_new/base.py` | New `BaseCompressor` implementation wiring contexts, formats, caching, and the quant loop. |
| `auto_round/compressors_new/__init__.py` | Package init for compressors_new. |
| `auto_round/compressors/utils.py` | Extends legacy layer-config resolution to include safetensors-only tensors and skip missing modules. |
| `auto_round/calibration/utils.py` | Adds helpers for "early stop" caching and input reshaping for block tuning. |
| `auto_round/calibration/__init__.py` | Package init for calibration. |
| `auto_round/algorithms/quantization/rtn/rtn.py` | Adds placeholder RTN quantization module file. |
| `auto_round/algorithms/quantization/rtn/config.py` | Adds RTN algorithm config stub. |
| `auto_round/algorithms/quantization/rtn/__init__.py` | Package init for RTN quantization. |
| `auto_round/algorithms/quantization/base.py` | Adds base quantization class stub. |
| `auto_round/algorithms/quantization/auto_round/quantize.py` | Adds new AutoRound quantizer implementation (algorithm object). |
| `auto_round/algorithms/quantization/auto_round/config.py` | Adds new AutoRound algorithm config. |
| `auto_round/algorithms/quantization/auto_round/__init__.py` | Package init for the AutoRound quantization algorithm. |
| `auto_round/algorithms/quantization/__init__.py` | Package init for quantization algorithms. |
| `auto_round/algorithms/base.py` | Adds base algorithm stub. |
| `auto_round/algorithms/alg_config.py` | Adds base algorithm config stub. |
| `auto_round/algorithms/__init__.py` | Package init for algorithms. |

@wenhuach21
Contributor

If there is already an algorithm folder, what is the purpose of the compressor folder?

@n1ck-guo n1ck-guo requested review from WeiweiZhang1 and yiliu30 and removed request for xin3he March 13, 2026 05:31
@chensuyue chensuyue added this to the 0.12.0 milestone Mar 16, 2026
n1ck-guo and others added 3 commits March 17, 2026 17:02
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
pre-commit-ci Bot and others added 4 commits April 23, 2026 05:29
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

RotationConfig.block_size defaults to None (meaning 'unset / auto').
When build_hadamard_transform passes block_size=None explicitly, it
overrides the __init__ default of 32, causing math.sqrt(None) to raise
TypeError.

Fix: fall back to 32 when block_size is None.
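The fix described in this commit message can be sketched as follows; the helper name and constant are hypothetical, while `RotationConfig` and `build_hadamard_transform` are the names from the message:

```python
import math

DEFAULT_BLOCK_SIZE = 32  # the __init__ default that None should fall back to

def resolve_block_size(block_size):
    """Treat None as 'unset / auto' and fall back to the default, so
    downstream math.sqrt(block_size) never receives None."""
    return DEFAULT_BLOCK_SIZE if block_size is None else block_size

# Before the fix, an explicit block_size=None propagated into math.sqrt
# and raised TypeError; with the fallback the scale computes cleanly.
scale = 1.0 / math.sqrt(resolve_block_size(None))
```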
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

…uo/new_ar_arch

# Conflicts:
#	auto_round/utils/model.py
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: n1ck-guo <heng.guo@intel.com>
- DiffusionMixin.__init__: when iters>0, force batch_size=1 and fold
  the batch into gradient_accumulate_steps, patching both kwargs and
  the AlgConfig (same pattern as MLLMMixin), matching old
  DiffusionCompressor.__init__.
- DiffusionMixin.__init__ (post-super): unconditionally pipe.to(model.dtype)
  to align VAE/text-encoder with transformer dtype, mirroring
  DiffusionCompressor._align_device_and_dtype. Equality check on
  pipe.dtype is unreliable because it only reflects the primary component.
- BaseQuantizers.DIFFUSION_OUTPUT_CONFIGS: populate Flux/OvisImage block
  output mappings, matching old-arch compressors/diffusion/compressor.py.
- Revert new-arch-only logic (_get_pipeline_call_kwargs, height/width,
  output_type=latent injection) to keep parity with old arch.
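The batch-folding step in the first bullet can be sketched as below; the helper is hypothetical, and the real code patches both the kwargs and the AlgConfig:

```python
def fold_batch_into_grad_accum(batch_size, gradient_accumulate_steps, iters):
    """When tuning is enabled (iters > 0), force batch_size=1 and fold the
    original batch into gradient accumulation, keeping the effective number
    of samples per optimizer step unchanged."""
    if iters > 0 and batch_size > 1:
        gradient_accumulate_steps *= batch_size
        batch_size = 1
    return batch_size, gradient_accumulate_steps
```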
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@chensuyue chensuyue merged commit 91585c7 into main Apr 28, 2026
40 of 42 checks passed
@chensuyue chensuyue deleted the hengguo/new_ar_arch branch April 28, 2026 08:15

Labels

api/new engineering ready only add when the PR is ready to merge


8 participants