Fix broken HQQ support #45147
Conversation
@ArthurZucker @SunMarc a little bump on this, should be an easy fix.
[For maintainers] Suggested jobs to run (before merge): run-slow: hqq
SunMarc left a comment:
Thanks for looking into that, @mobicham! Since we are adding the support back, can we try to do something a bit more maintainable? This is also one of the reasons we didn't add the support back earlier: it was too complicated. I know that SINQ based a lot of their code on hqq / gemlite, and the first PR they opened did something similar to this, but in the end they managed to clean up the integration a lot and now it looks much better. Would you be up for doing that? https://github.com/huggingface/transformers/blob/main/src/transformers/quantizers/quantizer_sinq.py
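For illustration, a rough sketch of the slimmer shape such an integration could take, along the lines of quantizer_sinq.py: all module surgery happens once before weight loading, and no per-key hooks are needed afterwards. Method names follow the HfQuantizer interface as currently exposed; the prepare_for_hqq_linear call and its signature are assumptions, and this is not the PR's actual code.

    # Rough sketch only; interface details and helper signatures are assumed.
    from transformers.quantizers.base import HfQuantizer
    from transformers.utils import is_hqq_available


    class SlimHqqHfQuantizer(HfQuantizer):
        requires_calibration = False

        def validate_environment(self, *args, **kwargs):
            if not is_hqq_available():
                raise ImportError("A valid HQQ version is required: `pip install hqq`.")

        def _process_model_before_weight_loading(self, model, **kwargs):
            # Replace targeted nn.Linear modules with empty HQQLinear shells up front,
            # so the generic loading path can fill them like any other module.
            from transformers.integrations import prepare_for_hqq_linear

            prepare_for_hqq_linear(model, quantization_config=self.quantization_config)

        def _process_model_after_weight_loading(self, model, **kwargs):
            return model

        @property
        def is_trainable(self):
            return True

        def is_serializable(self, safe_serialization=None):
            return True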
    # TODO: to remove
    # def create_quantized_param(
    #     self,
    #     model: "PreTrainedModel",

    def get_weight_conversions(self):
        return []
    # TODO: to remove
    # Kept here in case we see some interest in adding support for it
    # # Adds missing keys for HQQLinear modules that are loaded but the model was initialized with torch.nn.Linear
    # def update_expected_keys(
| @@ -72,6 +78,7 @@ | |||
| logger.info("Setting dtype to torch.float32 as the default value since it was not specified.") | |||
| def update_dtype(self, dtype): | ||
| if dtype is not None: | ||
| self.dtype = dtype | ||
| return dtype |
Do we really need that? The tensors should be in the right dtype, so we should be able to access it directly.
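A minimal sketch of what the reviewer is suggesting: instead of caching the dtype in update_dtype, infer the compute dtype from the tensor being processed. Names such as `value` mirror the diff below; this is illustrative, not the PR's code.

    import torch

    def infer_compute_dtype(value: torch.Tensor, default: torch.dtype = torch.float16) -> torch.dtype:
        # Floating-point checkpoint tensors already carry the dtype we want;
        # packed integer buffers (e.g. quantized weights) fall back to a default.
        return value.dtype if value.is_floating_point() else default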
    def _load_hqq_from_checkpoint(self, model: "PreTrainedModel"):
        """Load pre-quantized HQQ weights directly from checkpoint files."""
        from collections import defaultdict

        from safetensors import safe_open

        from ..integrations.hqq import autoname_modules, name_to_linear_tag

        # Determine target device from stored device_map
        device_map = getattr(self, "device_map", None)
        if isinstance(device_map, dict):
            # Use the first non-cpu device from the map (values can be str, int, or torch.device)
            devices = [torch.device(v) for v in device_map.values()]
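For reference, a self-contained sketch of how the device selection above could be completed (illustrative only, not the PR's exact code; it assumes the device_map has already been resolved to concrete devices):

    import torch

    def pick_target_device(device_map):
        """Pick the first non-CPU device from a resolved device_map, else CPU."""
        if isinstance(device_map, dict):
            # Values can be str, int, or torch.device, exactly as noted above.
            devices = [torch.device(v) for v in device_map.values()]
            non_cpu = [d for d in devices if d.type != "cpu"]
            return non_cpu[0] if non_cpu else torch.device("cpu")
        if device_map is not None:
            # A single device spec such as "cuda:0"; "auto" is assumed to have
            # been resolved to a dict before reaching this point.
            return torch.device(device_map)
        return torch.device("cpu")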
    def _setup_missing_key_filters(self, model, checkpoint_files):
        """Scan checkpoint files to find HQQ-quantized modules.

        For those modules:
        1. Suppress their .weight missing key warnings in the load report.
        2. Replace their weight parameter with a scalar meta tensor so that
           ``_move_missing_keys_from_meta_to_device`` does not allocate
           full-size fp16 tensors on GPU (which would cause OOM).
        """
        import re

        from safetensors import safe_open

        quantized_modules = set()
Can we not do these types of changes?
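A self-contained sketch of the meta-tensor trick the docstring above describes (illustrative, not the PR's code): shrinking the placeholder weight keeps the generic missing-key handling from allocating full-size fp16 weights on GPU.

    import torch

    def shrink_weight_to_meta(module: torch.nn.Module) -> None:
        """Replace a module's placeholder weight with a scalar meta tensor (no storage)."""
        if getattr(module, "weight", None) is not None:
            module.weight = torch.nn.Parameter(
                torch.empty((), device="meta"), requires_grad=False
            )

    # Hypothetical usage with the set built above:
    # for name, module in model.named_modules():
    #     if name in quantized_modules:
    #         shrink_weight_to_meta(module)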
        quant_config = getattr(module, "quant_config", None)
        if quant_config is None:
            # Module is skipped from quantization, just return the weight as-is
            return {full_layer_name: value}

        # Determine target device and compute dtype
        target_device = value.device
        compute_dtype = self.hf_quantizer.dtype

        # Create HQQLinear from the nn.Linear
        hqq_layer = HQQLinear(
            module,
            quant_config=quant_config,
            compute_dtype=compute_dtype,
            device=target_device,
            del_orig=True,
        )

        if hqq_layer.bias is not None and isinstance(hqq_layer.bias, torch.Tensor):
            hqq_layer.bias = torch.nn.Parameter(hqq_layer.bias)

        if self.hf_quantizer.using_multi_gpu:
It would be nice if we didn't have to do that here.
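For context, here is what the conversion in the hunk above amounts to as a standalone call into the hqq library (values are illustrative; assumes `pip install hqq` and a CUDA device):

    import torch
    from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

    linear = torch.nn.Linear(4096, 4096, bias=False, dtype=torch.float16)
    quant_config = BaseQuantizeConfig(nbits=4, group_size=64)

    hqq_layer = HQQLinear(
        linear,                       # float nn.Linear to quantize
        quant_config=quant_config,    # per-layer HQQ settings
        compute_dtype=torch.float16,  # dtype used for dequantized compute
        device="cuda",                # target device for the packed weights
        del_orig=True,                # free the original float weights
    )

    out = hqq_layer(torch.randn(1, 4096, dtype=torch.float16, device="cuda"))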
What does this PR do?
This PR fixes HQQ support, which has been broken for a couple of months now after a refactoring:
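Roughly what this re-enables on the user side (a minimal sketch; the model id and quantization settings are illustrative):

    import torch
    from transformers import AutoModelForCausalLM, HqqConfig

    quant_config = HqqConfig(nbits=4, group_size=64)
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B",         # any causal LM checkpoint
        dtype=torch.float16,
        device_map="cuda",
        quantization_config=quant_config,  # quantize on the fly with HQQ
    )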
Who can review?
@ArthurZucker @SunMarc