
Fix FSDP_CPU_RAM_EFFICIENT_LOADING (#43749) #43785

Open

MengAiDev wants to merge 3 commits into huggingface:main from MengAiDev:fix/fsdp-cpu-ram-efficient-loading

Conversation

@MengAiDev
Contributor

  • Set the _is_hf_initialized flag in _load_parameter_into_model to prevent unnecessary random re-initialization
  • Skip state_dict loading on non-rank-0 processes when FSDP is enabled, so they no longer waste CPU RAM
  • Together these fix the issue where every rank temporarily allocates model-sized CPU RAM and stalls for a long time during loading (see the sketch below)

Modified:

  • src/transformers/modeling_utils.py: Set _is_hf_initialized=True on newly created parameters
  • src/transformers/core_model_loading.py: Add an FSDP check to skip loading on non-rank-0 processes

Fixes #43749

@Cyrilvallez @ArthurZucker
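To make the intent concrete, here is a minimal sketch of the gating idea, not the actual patch: is_fsdp_enabled and is_local_dist_rank_0 are the helpers imported in the diff discussed below, while maybe_load_state_dict and its torch.load body are hypothetical stand-ins for the real loading path.

import torch

from transformers.integrations import is_fsdp_enabled
from transformers.modeling_utils import is_local_dist_rank_0


def maybe_load_state_dict(checkpoint_file: str, hf_quantizer=None) -> dict:
    """Illustrative sketch: only local rank 0 materializes the full state dict.

    With FSDP_CPU_RAM_EFFICIENT_LOADING, non-rank-0 processes keep their
    parameters empty/on meta and receive the real values when FSDP shards
    and broadcasts them, so reading the checkpoint on every rank only burns
    CPU RAM and time.
    """
    if is_fsdp_enabled() and not is_local_dist_rank_0() and hf_quantizer is None:
        # Non-rank-0: skip loading entirely; an empty state dict keeps the
        # downstream parameter-loading loop a no-op.
        return {}
    # Hypothetical stand-in for the real checkpoint-reading logic.
    return torch.load(checkpoint_file, map_location="cpu")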

Comment thread: src/transformers/core_model_loading.py (Outdated)
Comment on lines +1109 to +1114
from .integrations import is_fsdp_enabled
from .modeling_utils import is_local_dist_rank_0

if is_fsdp_enabled() and not is_local_dist_rank_0() and hf_quantizer is None:
state_dict = []

Collaborator


I don't think this is where the skip should happen; you are creating a thread pool for nothing.

@@ -476,6 +476,10 @@ def _load_parameter_into_model(model: "PreTrainedModel", param_name: str, tensor
parent, param_type = get_module_from_name(model, param_name)
if param_type in parent._parameters and not isinstance(tensor, nn.Parameter):
Collaborator


This is only used in _move_missing_keys_from_meta_to_device for missing keys; it does not really make sense to me. Can you elaborate on why?
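For context, a rough sketch of the flag-setting the PR description refers to, not the exact patch: _set_loaded_parameter below is a hypothetical helper, while _is_hf_initialized is the attribute transformers uses to mark weights that were filled from the checkpoint so they are not randomly re-initialized later.

import torch
from torch import nn


def _set_loaded_parameter(parent: nn.Module, param_type: str, tensor: torch.Tensor) -> None:
    # Wrap the loaded tensor as a Parameter and attach it to the module.
    param = nn.Parameter(tensor, requires_grad=tensor.is_floating_point())
    # Mark the weight as already initialized so a later "initialize missing
    # keys" pass does not re-randomize it on non-rank-0 processes.
    param._is_hf_initialized = True
    setattr(parent, param_type, param)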

@winglian
Collaborator

winglian commented Feb 6, 2026

I don't think this behavior should happen at this low a level. There is a whole host of other cases where this would break ND-parallel loading (HSDP, for example), where you would have FSDP but multiple data-parallel meshes, each of which would also need the model weights.

- Move the FSDP check to the function entry point to avoid creating an empty thread pool
- Add detailed documentation for the _is_hf_initialized flag usage
- Add a should_skip_non_rank0_weight_loading() helper to support HYBRID_SHARD strategies (a hedged sketch follows below)
- Ensure compatibility with ND-parallel scenarios such as HSDP

This addresses the review feedback:
- @ArthurZucker: fix the thread-pool creation issue and explain the purpose of _is_hf_initialized
- @winglian: ensure compatibility with the HYBRID_SHARD and HYBRID_SHARD_ZERO2 strategies
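A hedged sketch of what such a helper could look like. The name should_skip_non_rank0_weight_loading comes from the commit message above, but the body is an assumption: it reads accelerate-style FSDP environment variables (the exact variable names are assumed) and never skips loading under hybrid sharding, since each replica group still needs the full weights.

import os


def _env_flag(name: str) -> bool:
    # Treat "1"/"true"/"yes" (case-insensitive) as enabled.
    return os.environ.get(name, "").strip().lower() in ("1", "true", "yes")


def should_skip_non_rank0_weight_loading(local_rank: int) -> bool:
    """Sketch of the helper named in the commit message; the body is assumed.

    Only skip checkpoint loading when FSDP CPU-RAM-efficient loading is on,
    this is not local rank 0, and the sharding strategy is not a hybrid one
    (with HYBRID_SHARD / HYBRID_SHARD_ZERO2, every replica group still needs
    the weights, so skipping would break ND-parallel setups such as HSDP).
    """
    if not (_env_flag("ACCELERATE_USE_FSDP") and _env_flag("FSDP_CPU_RAM_EFFICIENT_LOADING")):
        return False
    # Assumed environment variable for the configured sharding strategy.
    strategy = os.environ.get("FSDP_SHARDING_STRATEGY", "FULL_SHARD").upper()
    if "HYBRID_SHARD" in strategy:
        return False
    return local_rank != 0
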
@Cyrilvallez
Member

Agreed, everything should simply be skipped, both in _initialize_missing_keys and when loading the state dict.



Development

Successfully merging this pull request may close these issues.

FSDP_CPU_RAM_EFFICIENT_LOADING broken
