[docs] Update shard size by stevhliu · Pull Request #42749 · huggingface/transformers

stevhliu · 2025-12-09T19:20:55Z

Updates docs to reflect increased shard size in #42734

HuggingFaceDocBuilderDev · 2025-12-09T19:29:28Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Cyrilvallez

Nice, thanks for being quick on this!! It's indeed a very important aspect!

Cyrilvallez · 2025-12-11T14:35:18Z

+[`~PreTrainedModel.save_pretrained`] automatically shards checkpoints larger than 50GB. This keeps shard counts low for large models and simplifies file management without significantly slowing load times.

-Each shard is loaded sequentially after the previous shard is loaded, limiting memory usage to only the model size and the largest shard size.
+Shards load sequentially and memory usage is limited to the model size plus the largest shard. Set `max_shard_size` in [`~PreTrainedModel.save_pretrained`] to control the threshold.


This is not true anymore!
Parameters are now loaded in parallel by default (this can be deactivated by setting the `HF_DEACTIVATE_ASYNC_LOAD` env variable), and memory usage is strictly bounded by the model size, EXCEPT when the model needs on-the-fly weight conversions (i.e. most MoE models). In that case the memory peak is `model_size + largest_params_needed_in_a_single_conversion`, i.e. for MoE models, `model_size + experts_on_one_layer`.
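As a rough illustration of the peak-memory rule described in the comment above, here is a hypothetical back-of-the-envelope helper. `estimate_peak_load_memory` is not a transformers function, and the MoE numbers (8 experts per layer, 500 MB per expert, 24 layers) are made up for the example. It ignores allocator overhead and non-parameter buffers.

```python
# Hypothetical helper (not part of transformers) encoding the rule from the review:
#   no conversion   -> peak ~= model_size
#   with conversion -> peak ~= model_size + largest group needed by one conversion
MB = 10**6
GB = 10**9


def estimate_peak_load_memory(model_bytes, conversion_group_bytes=()):
    """Estimate peak memory (in bytes) while loading a checkpoint.

    model_bytes: total size of the model parameters.
    conversion_group_bytes: sizes of the parameter groups that must be
        materialized together for a single on-the-fly weight conversion
        (for an MoE model, roughly all experts of one layer).
    """
    if not conversion_group_bytes:
        # No conversion needed: peak is bounded by the model size itself.
        return model_bytes
    # Conversion needed: add the largest group held at once for one conversion.
    return model_bytes + max(conversion_group_bytes)


# Hypothetical dense model of 100 GB: peak stays at the model size.
dense_peak = estimate_peak_load_memory(100 * GB)

# Hypothetical MoE model of 100 GB with 24 layers, each with 8 experts of 500 MB.
experts_on_one_layer = 8 * 500 * MB
moe_peak = estimate_peak_load_memory(100 * GB, [experts_on_one_layer] * 24)

print(dense_peak / GB)  # 100.0
print(moe_peak / GB)    # 104.0
```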

Ah thanks, I forgot about this! I think it'd also be nice to link to #42636, which touches on this a bit at the end, for more details. wdyt?

Definitely!

Cyrilvallez

Nice! Thanks a lot! Just left a final comment

Cyrilvallez · 2025-12-15T09:31:29Z

+[`~PreTrainedModel.save_pretrained`] automatically shards checkpoints larger than 50GB. This keeps shard counts low for large models and simplifies file management.

-Each shard is loaded sequentially after the previous shard is loaded, limiting memory usage to only the model size and the largest shard size.
+Parameters load in parallel and peak memory only depends on model size. Set `max_shard_size` in [`~PreTrainedModel.save_pretrained`] to control the threshold.


What threshold are we talking about here?

Threshold refers to the maximum checkpoint size before sharding. I updated it so it's clearer what "threshold" means :)
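To make "threshold" concrete: in transformers you pass it directly, e.g. `model.save_pretrained("out", max_shard_size="5GB")`, and each shard file is kept at or below that size. The snippet below is only a hypothetical greedy packer (`shard_by_size` is not a transformers function, and the real `save_pretrained` logic may differ) that shows how a size cap splits a checkpoint into shards. Sizes use decimal GB, and the tensor names and sizes are invented.

```python
# Hypothetical illustration of size-capped sharding; NOT the transformers implementation.
GB = 10**9  # decimal gigabytes for this sketch


def shard_by_size(tensor_sizes, max_shard_size):
    """Group (name, size_in_bytes) pairs, in order, into shards whose total
    size stays at or below max_shard_size. A single tensor larger than the
    limit still ends up in a shard of its own."""
    shards = []
    current, current_size = [], 0
    for name, size in tensor_sizes:
        # Close the current shard if adding this tensor would exceed the cap.
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)
    return shards


# A 90 GB checkpoint with a 50 GB cap is split into three shards.
big = [("embed", 30 * GB), ("layer0", 25 * GB), ("layer1", 25 * GB), ("head", 10 * GB)]
print(shard_by_size(big, 50 * GB))    # [['embed'], ['layer0', 'layer1'], ['head']]

# A 30 GB checkpoint fits under the cap, so it stays in a single file.
small = [("a", 10 * GB), ("b", 20 * GB)]
print(shard_by_size(small, 50 * GB))  # [['a', 'b']]
```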

* shard size
* feedback
* add link
* clarify
* update link

stevhliu requested a review from Cyrilvallez December 9, 2025 19:35

Cyrilvallez reviewed Dec 11, 2025


Cyrilvallez mentioned this pull request Dec 12, 2025

[docs] WeightConverter #42636

Merged

stevhliu force-pushed the shard-size branch from 6dce382 to f6366b9 Compare December 12, 2025 19:41

Cyrilvallez approved these changes Dec 15, 2025


stevhliu added 5 commits December 15, 2025 10:12

- shard size (119dcba)
- feedback (f074c9e)
- add link (19e94b1)
- clarify (6fdd205)
- update link (9a86b81)

stevhliu force-pushed the shard-size branch from 680f30d to 9a86b81 Compare December 15, 2025 18:12

stevhliu merged commit 12fe95f into huggingface:main Dec 17, 2025
15 checks passed

stevhliu deleted the shard-size branch December 17, 2025 18:46

SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026

[docs] Update shard size (huggingface#42749)

12b37b9


stevhliu merged 5 commits into huggingface:main from stevhliu:shard-size

3 participants
