[Step 1] New architecture for auto_round #1542

Merged
chensuyue merged 123 commits into main from hengguo/new_ar_arch
Apr 28, 2026

Conversation

@n1ck-guo
Contributor

@n1ck-guo n1ck-guo commented Mar 13, 2026

Description

  • Compressor:
    Main entry point responsible for orchestrating the workflow, invoking different algorithms, and handling model persistence. Supports block-wise or layer-wise quantization strategies. Primary subclasses include TuneCompressor and ZeroShotCompressor.
  • Calibration: Handles the calibration process (Work in Progress)
  • Context: Manages shared configurations and model states throughout the quantization pipeline, providing centralized control to prevent cross-module dependencies
    • ModelContext: Handles model loading and tracks model states and relevant configurations
    • CompressContext: Stores shared compression settings such as low_cpu_mem_usage, enable_torch_compile, etc.
  • Algorithms: Concrete quantization and weight transformation implementations
    • Quantization: Various quantization algorithms, including AutoRound, RTN, OptRTN, etc.
    • Transform: Weight transformation algorithms such as Hadamard transform
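The centralized, shared-state design described above can be sketched as a simple singleton base, in the spirit of the new `auto_round/context/base.py` module. This is an illustrative sketch, not the PR's actual implementation; the field names come from the description, everything else is assumed:

```python
class SingletonContext:
    """One shared instance per context subclass (illustrative sketch)."""

    _instances: dict = {}

    def __new__(cls, *args, **kwargs):
        # Create the instance only once per subclass, then reuse it.
        if cls not in cls._instances:
            cls._instances[cls] = super().__new__(cls)
        return cls._instances[cls]


class CompressContext(SingletonContext):
    """Shared compression settings; fields mirror the description above."""

    def __init__(self):
        if getattr(self, "_initialized", False):
            return  # already configured; keep the shared state
        self.low_cpu_mem_usage = False
        self.enable_torch_compile = False
        self._initialized = True
```

Any module can then call `CompressContext()` and read or mutate the same shared settings, which is what removes the cross-module dependencies mentioned above.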

Usage of the new API:

```python
from auto_round.algorithms.rotation import HadamardConfig
# Import path assumed; AutoRoundConfig is defined in this PR's new
# auto_round/algorithms/quantization/auto_round/config.py module.
from auto_round.algorithms.quantization.auto_round.config import AutoRoundConfig
from auto_round.compressors_new import AutoRound

quant_cfg = AutoRoundConfig(bits=4, group_size=128, iters=200)
had_cfg_1 = HadamardConfig(hadamard_type="hadamard", block_size=32)

compressor = AutoRound(
    alg_configs=[quant_cfg, had_cfg_1],
    model="facebook/opt-125m",
    scheme="MXFP4",
    format="auto_round",
)

model, layer_config = compressor.quantize_and_save(
    output_dir="./output",
)
```

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Contributor

Copilot AI left a comment


Pull request overview

Refactors AutoRound toward a new “context + compressor + algorithm” architecture, introducing new compressors_new/ and context/ modules and updating scheme parsing/export helpers to support the new flow.

Changes:

  • Added new context singletons (ModelContext, CompressContext) and a new compressors_new implementation path.
  • Expanded scheme parsing to reconcile bits/data_type and support user overrides + AutoScheme integration.
  • Added new calibration utilities and algorithm scaffolding for quantization backends (AutoRound/RTN).

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 18 comments.

| File | Description |
| --- | --- |
| `auto_round/utils/model.py` | Avoids runtime import cycles via `TYPE_CHECKING` for `QuantizationScheme`. |
| `auto_round/schemes.py` | Adds scheme override + parsing helpers and bits/dtype reconciliation. |
| `auto_round/formats.py` | Switches divisibility checks to global supported-layer constants. |
| `auto_round/context/model_context.py` | Introduces model lifecycle/loading + AMP setup and forward-hook management. |
| `auto_round/context/compress_context.py` | Introduces device/device_map and memory-usage knobs as shared context. |
| `auto_round/context/base.py` | Adds a simple singleton context base. |
| `auto_round/context/__init__.py` | Package init for the new context module. |
| `auto_round/compressors_new/utils.py` | New utility module (layer config, gguf mapping, caching helpers, forward helpers). |
| `auto_round/compressors_new/shard_writer.py` | New shard-based saver with optional safetensors support. |
| `auto_round/compressors_new/config.py` | Introduces extra/legacy config dataclasses for the new compressor path. |
| `auto_round/compressors_new/base.py` | New `BaseCompressor` implementation wiring contexts, formats, caching, and the quant loop. |
| `auto_round/compressors_new/__init__.py` | Package init for compressors_new. |
| `auto_round/compressors/utils.py` | Extends legacy layer-config resolution to include safetensors-only tensors and skip missing modules. |
| `auto_round/calibration/utils.py` | Adds helpers for "early stop" caching and input reshaping for block tuning. |
| `auto_round/calibration/__init__.py` | Package init for calibration. |
| `auto_round/algorithms/quantization/rtn/rtn.py` | Adds placeholder RTN quantization module file. |
| `auto_round/algorithms/quantization/rtn/config.py` | Adds RTN algorithm config stub. |
| `auto_round/algorithms/quantization/rtn/__init__.py` | Package init for RTN quantization. |
| `auto_round/algorithms/quantization/base.py` | Adds base quantization class stub. |
| `auto_round/algorithms/quantization/auto_round/quantize.py` | Adds new AutoRound quantizer implementation (algorithm object). |
| `auto_round/algorithms/quantization/auto_round/config.py` | Adds new AutoRound algorithm config. |
| `auto_round/algorithms/quantization/auto_round/__init__.py` | Package init for the AutoRound quantization algorithm. |
| `auto_round/algorithms/quantization/__init__.py` | Package init for quantization algorithms. |
| `auto_round/algorithms/base.py` | Adds base algorithm stub. |
| `auto_round/algorithms/alg_config.py` | Adds base algorithm config stub. |
| `auto_round/algorithms/__init__.py` | Package init for algorithms. |

@wenhuach21
Contributor

If there is already an algorithm folder, what is the purpose of the compressor folder?

@n1ck-guo n1ck-guo requested review from WeiweiZhang1 and yiliu30 and removed request for xin3he March 13, 2026 05:31
@chensuyue chensuyue added this to the 0.12.0 milestone Mar 16, 2026
n1ck-guo and others added 3 commits March 17, 2026 17:02
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
pre-commit-ci Bot and others added 4 commits April 23, 2026 05:29
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

RotationConfig.block_size defaults to None (meaning 'unset / auto').
When build_hadamard_transform passes block_size=None explicitly, it
overrides the __init__ default of 32, causing math.sqrt(None) to raise
TypeError.

Fix: fall back to 32 when block_size is None.
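The fix described in this commit message can be sketched as follows; the helper name and constant are hypothetical, while `RotationConfig` and `build_hadamard_transform` are the names from the message:

```python
import math

DEFAULT_BLOCK_SIZE = 32  # the __init__ default that None should fall back to

def resolve_block_size(block_size):
    """Treat None as 'unset / auto' and fall back to the default, so
    downstream math.sqrt(block_size) never receives None."""
    return DEFAULT_BLOCK_SIZE if block_size is None else block_size

# Before the fix, an explicit block_size=None propagated into math.sqrt
# and raised TypeError; with the fallback the scale computes cleanly.
scale = 1.0 / math.sqrt(resolve_block_size(None))
```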
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

…uo/new_ar_arch

# Conflicts:
#	auto_round/utils/model.py
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: n1ck-guo <heng.guo@intel.com>
- DiffusionMixin.__init__: when iters>0, force batch_size=1 and fold
  the batch into gradient_accumulate_steps, patching both kwargs and
  the AlgConfig (same pattern as MLLMMixin), matching old
  DiffusionCompressor.__init__.
- DiffusionMixin.__init__ (post-super): unconditionally pipe.to(model.dtype)
  to align VAE/text-encoder with transformer dtype, mirroring
  DiffusionCompressor._align_device_and_dtype. Equality check on
  pipe.dtype is unreliable because it only reflects the primary component.
- BaseQuantizers.DIFFUSION_OUTPUT_CONFIGS: populate Flux/OvisImage block
  output mappings, matching old-arch compressors/diffusion/compressor.py.
- Revert new-arch-only logic (_get_pipeline_call_kwargs, height/width,
  output_type=latent injection) to keep parity with old arch.
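The batch-folding step in the first bullet can be sketched as below; the helper is hypothetical, and the real code patches both the kwargs and the AlgConfig:

```python
def fold_batch_into_grad_accum(batch_size, gradient_accumulate_steps, iters):
    """When tuning is enabled (iters > 0), force batch_size=1 and fold the
    original batch into gradient accumulation, keeping the effective number
    of samples per optimizer step unchanged."""
    if iters > 0 and batch_size > 1:
        gradient_accumulate_steps *= batch_size
        batch_size = 1
    return batch_size, gradient_accumulate_steps
```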
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@chensuyue chensuyue merged commit 91585c7 into main Apr 28, 2026
40 of 42 checks passed
@chensuyue chensuyue deleted the hengguo/new_ar_arch branch April 28, 2026 08:15

Labels

api/new engineering ready only add when the PR is ready to merge


8 participants