Conversation
```python
if not is_quantized or not hf_quantizer.param_needs_quantization(self, key):
    _load_parameter_into_model(self, key, value)
```
It should be fine to remove the condition, no? cc @Cyrilvallez
Well, yes and no... Previously there was an
```python
else:
    hf_quantizer.create_quantized_param(...)
```
in case the missing weight needed to be quantized (we quantized the new random weight). I'm not sure why it was removed, but it should probably still be the best way, no?
We removed `create_quantized_param` as we didn't need it anymore since we have the quantize ops xD. Let's see how to deal with that in another PR cc @MekkCyber, but I don't think this is urgent.
Cyrilvallez left a comment:
SUPER NICE! 🤗 Thanks a lot for doing that! High time we finally have a nice and clean way to pre-allocate with quantization! 🚀
Just added a few comments, then we can merge!
```python
modules_sizes, _ = compute_module_sizes(model, hf_quantizer, only_modules=False)
```
Using this instead, cc @Cyrilvallez, if that's fine.
```python
# We need parameters + buffers here, as state_dict does not count non-persistent buffers which are taking space
expected_keys = [name for name, _ in model.named_parameters()] + [name for name, _ in model.named_buffers()]
```
there were some differences because of this
```python
# check that we get the same value, as we use `compute_module_sizes` in `get_total_byte_count`
assert total_byte_count == model_size[""]
assert quantized_total_byte_count == quantized_model_size[""]
```
Here I check that we have the same result.
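Pieced together, the check might be wired up roughly like this (a hedged sketch; the actual test fixtures are not shown in this thread):
```python
# Sketch only: the fixture wiring is assumed, not copied from the PR's tests
byte_count_per_device = get_total_byte_count(model, accelerator_device_map, hf_quantizer)
total_byte_count = sum(byte_count_per_device.values())
model_size, _ = compute_module_sizes(model, hf_quantizer, only_modules=False)
assert total_byte_count == model_size[""]  # "" indexes the whole-model total
```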
Will have a last look next Monday!
```diff
- expected_keys = list(model.state_dict().keys())
+ # We need parameters + buffers here, as state_dict does not count non-persistent buffers which are taking space
+ expected_keys = [name for name, _ in model.named_parameters()] + [name for name, _ in model.named_buffers()]
```
Humm, the fact here is that non-persistent buffers are NOT loaded with the other params (because they are non-persistent, of course), so there is no need to account for them when allocating memory before loading.
Thus I believe we should revert this change.
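To illustrate the point, here is a minimal standalone snippet (the module and buffer names are made up for the demo) showing that non-persistent buffers appear in `named_buffers()` but not in `state_dict()`, so they are never loaded from a checkpoint:
```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(4))
        # Persistent buffer: saved in (and loaded from) the state_dict
        self.register_buffer("persistent_buf", torch.zeros(4))
        # Non-persistent buffer: takes memory, but is never serialized or loaded
        self.register_buffer("transient_buf", torch.zeros(4), persistent=False)

m = Toy()
print(sorted(m.state_dict().keys()))            # ['persistent_buf', 'weight']
print(sorted(n for n, _ in m.named_buffers()))  # ['persistent_buf', 'transient_buf']
```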
Indeed xD, I was only trying to match the numbers and didn't think too far. In any case, the tests I wrote should still be valid!
```python
modules_sizes, _ = compute_module_sizes(model, hf_quantizer, only_modules=False)
for param_name, device in accelerator_device_map.items():
```
Humm, here we iterate twice over all params for no reason... Better to go back to the old loop and mimic what's being done in `compute_module_sizes` by using `dtype_size = hf_quantizer.param_element_size(model, name, param)` if we have a quantizer!
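Concretely, the suggestion amounts to a single pass that folds the quantizer-aware element size into the existing loop; a minimal sketch, with variable names assumed from the diff above:
```python
# Sketch of the proposed single loop (not the PR's final code verbatim)
for param_name, device in accelerator_device_map.items():
    param = model.get_parameter_or_buffer(param_name)
    dtype_size = (
        hf_quantizer.param_element_size(model, param_name, param)
        if hf_quantizer is not None
        else param.element_size()
    )
    total_byte_count[device] += param.numel() * dtype_size
```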
Cyrilvallez left a comment:
LGTM! Left a final comment but that's it!
Feel free to merge after the conflicts on quantizers have been resolved 🤗
Thanks again for this, will make everything much much smoother! 🤗
```python
def get_total_byte_count(
    model: PreTrainedModel, accelerator_device_map: dict, hf_quantizer: Optional[HfQuantizer] = None
):
    """
    This utility function calculates the total byte count needed to load the model on each device.
    This is useful for caching_allocator_warmup as we want to know how much cache we need to pre-allocate.
    """
    total_byte_count = defaultdict(lambda: 0)
    tied_param_names = model.all_tied_weights_keys.keys()

    tp_plan = getattr(model, "_tp_plan", []) or []
    tp_plan_regex = (
        re.compile("|".join([re.escape(plan) for plan in tp_plan]))
        if _torch_distributed_available and torch.distributed.is_initialized()
        else None
    )

    for param_name, device in accelerator_device_map.items():
        # Skip if the parameter has already been accounted for (tied weights)
        if param_name in tied_param_names:
            continue

        param = model.get_parameter_or_buffer(param_name)

        if hf_quantizer is not None:
            dtype_size = hf_quantizer.param_element_size(model, param_name, param)
        else:
            dtype_size = param.element_size()

        param_byte_count = param.numel() * dtype_size

        if tp_plan_regex is not None:
            generic_name = re.sub(r"\.\d+\.", ".*.", param_name)
            param_byte_count //= torch.distributed.get_world_size() if tp_plan_regex.search(generic_name) else 1

        total_byte_count[device] += param_byte_count
    return total_byte_count
```
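As an aside, the TP-plan name normalization inside the loop above can be seen in isolation; a standalone illustration (the parameter name is made up):
```python
import re

# A layer-indexed parameter name collapses to its generic form, so a single
# TP-plan entry can match the corresponding parameter in every layer
name = "model.layers.3.mlp.down_proj.weight"
print(re.sub(r"\.\d+\.", ".*.", name))  # model.layers.*.mlp.down_proj.weight
```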
Nit: if you don't mind, I find it easier to follow if everything is inside the same function in this case, so IMO I would put it back in `caching_allocator_warmup`.
No super strong opinions here though, so if you think you'll ever need it elsewhere we can keep it separate.
I had to do this as I need to test that we get the correct allocation ;D
[For maintainers] Suggested jobs to run (before merge): run-slow: bnb, finegrained_fp8, mxfp4, quanto_integration, torchao_integration
What does this PR do?
This PR fixes how we calculate the param size for quantized models. This should be simpler to hack around!
I added some tests to check that it works for the following methods: `bnb`, `finegrained_fp8`, `torchao`, `mxfp4`, `quanto`.
cc @Cyrilvallez as you were concerned at some point.