[docs] Update shard size #42749

Merged
stevhliu merged 5 commits into huggingface:main from stevhliu:shard-size
Dec 17, 2025

Conversation

@stevhliu (Member) commented Dec 9, 2025

Updates docs to reflect increased shard size in #42734

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@stevhliu requested a review from Cyrilvallez on December 9, 2025 19:35

@Cyrilvallez (Member) left a comment


Nice, thanks for being quick on this!! It's indeed a very important aspect!

Comment thread docs/source/en/models.md Outdated
[`~PreTrainedModel.save_pretrained`] automatically shards checkpoints larger than 50GB. This keeps shard counts low for large models and simplifies file management without significantly slowing load times.

- Each shard is loaded sequentially after the previous shard is loaded, limiting memory usage to only the model size and the largest shard size.
+ Shards load sequentially and memory usage is limited to the model size plus the largest shard. Set `max_shard_size` in [`~PreTrainedModel.save_pretrained`] to control the threshold.
@Cyrilvallez (Member):

This is not true anymore!
Parameters are loaded in parallel by default now (this can be deactivated by setting the HF_DEACTIVATE_ASYNC_LOAD env variable), and memory usage is strictly constrained to the model size, EXCEPT if the model needs on-the-fly weight conversions (i.e. most MoE models), in which case the memory peak is model_size + largest_params_needed_in_a_single_conversion, i.e. for MoE models, model_size + experts_on_one_layer.
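
A minimal sketch of opting out of the parallel loading described above. Only the HF_DEACTIVATE_ASYNC_LOAD variable name comes from the comment; setting it before importing transformers and the "gpt2" checkpoint are illustrative assumptions:

```python
# Minimal sketch: disabling parallel (async) parameter loading.
# Setting the env variable before importing transformers is an assumption,
# not a documented requirement.
import os

os.environ["HF_DEACTIVATE_ASYNC_LOAD"] = "1"

from transformers import AutoModelForCausalLM

# With async loading deactivated, parameters are loaded one after another
# instead of in parallel.
model = AutoModelForCausalLM.from_pretrained("gpt2")  # "gpt2" is just an illustrative checkpoint
```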

@stevhliu (Member Author):

Ah thanks, I forgot about this! I think it'd also be nice to link to #42636, which touches on this a bit at the end, for more details. wdyt?

@Cyrilvallez (Member):

Definitely!


@Cyrilvallez (Member) left a comment


Nice! Thanks a lot! Just left a final comment

Comment thread docs/source/en/models.md Outdated
[`~PreTrainedModel.save_pretrained`] automatically shards checkpoints larger than 50GB. This keeps shard counts low for large models and simplifies file management.

- Each shard is loaded sequentially after the previous shard is loaded, limiting memory usage to only the model size and the largest shard size.
+ Parameters load in parallel and peak memory only depends on model size. Set `max_shard_size` in [`~PreTrainedModel.save_pretrained`] to control the threshold.
@Cyrilvallez (Member):

What threshold are we talking about here?

@stevhliu (Member Author):

"Threshold" refers to the maximum checkpoint size before sharding. I updated it so it's clearer what "threshold" means :)
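
For illustration, a minimal sketch of controlling that threshold. `max_shard_size` is the real [`~PreTrainedModel.save_pretrained`] parameter discussed in this thread; the checkpoint, output directory, and "10GB" value are made-up examples:

```python
# Minimal sketch: setting the sharding threshold at save time.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative checkpoint

# A checkpoint larger than max_shard_size is split into multiple shard files;
# "10GB" is an arbitrary example value, not the library default.
model.save_pretrained("my-model-dir", max_shard_size="10GB")
```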

@stevhliu merged commit 12fe95f into huggingface:main Dec 17, 2025
15 checks passed
@stevhliu deleted the shard-size branch December 17, 2025 18:46
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* shard size

* feedback

* add link

* clarify

* update link