Fix: avoid late CUDA OOM in load_best_model_at_end with PEFT models by DogWala · Pull Request #44660 · huggingface/transformers

DogWala · 2026-03-13T12:59:25Z

What does this PR do?

This PR changes the PEFT branch of load_best_model_at_end in Trainer so that the best adapter is reloaded CPU-first: the adapter checkpoint is loaded onto CPU and only then copied into the model.

Previously, when training a PEFT model, Trainer could reload the best adapter through a path that materialized adapter weights on CUDA during the final best-model load. Under low remaining GPU memory, this could trigger a late OOM even though the training loop had already completed.

To be specific:

The OOM happens because PeftModel.load_adapter() does not load weights directly into the existing adapter parameters in place. Instead, it first calls load_peft_weights(), which materializes a full temporary adapter state_dict on the target device, and only then passes that state_dict into set_peft_model_state_dict() / model.load_state_dict(...) to copy the values into the actual model parameters.
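The two-step pattern described above (materialize a temporary state_dict, then copy it into existing parameters) can be illustrated with plain PyTorch. This is only a sketch: it uses torch.load and nn.Module.load_state_dict as stand-ins for load_peft_weights() and set_peft_model_state_dict(), and the toy nn.Linear "adapter" and the load_in_two_steps helper are hypothetical names, not PEFT code.

```python
import os
import tempfile

import torch
from torch import nn


def load_in_two_steps(module: nn.Module, path: str, load_device: str) -> dict:
    """Materialize a temporary state_dict on `load_device`, then copy it into `module`."""
    # Step 1: a full, separate set of tensors is created on `load_device`.
    temp_state = torch.load(path, map_location=load_device, weights_only=True)
    # Step 2: load_state_dict copies the values into the existing parameters.
    module.load_state_dict(temp_state)
    return temp_state


# Toy stand-ins (hypothetical): `saved` plays the checkpoint, `adapter` the live model.
adapter = nn.Linear(4, 4)
saved = nn.Linear(4, 4)

with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "adapter.pt")
    torch.save(saved.state_dict(), ckpt)

    storage_before = adapter.weight.data_ptr()
    temp_state = load_in_two_steps(adapter, ckpt, load_device="cpu")

# The parameter keeps its original storage; only its values changed.
same_storage = adapter.weight.data_ptr() == storage_before
values_match = torch.equal(adapter.weight, saved.weight)
# The temporary state_dict is a second copy that lives alongside the parameters.
temp_is_separate = temp_state["weight"].data_ptr() != adapter.weight.data_ptr()
```

The point of the sketch is that the temporary copy exists in addition to the model's own parameters until it is released, so the device chosen for step 1 determines where that extra memory is spent.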

When torch_device is not specified, the current PEFT path infers cuda, so the checkpoint tensors are first loaded as a separate set of CUDA tensors. Under low remaining GPU memory, this extra device-side materialization can OOM before the weights are fully copied into the model, even though training itself has already finished.
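A rough back-of-the-envelope model of the situation above, with made-up numbers. It assumes the only extra device memory needed is the temporary adapter copy, and ignores allocator fragmentation and any staging buffers; the sizes and helper names are hypothetical.

```python
def peak_extra_device_bytes(adapter_bytes: int, load_device: str) -> int:
    """Extra GPU memory needed while reloading, under a simplified model."""
    # Loading straight to CUDA creates a second full copy of the adapter on the GPU.
    # Loading to CPU first keeps the temporary copy in host memory instead.
    return adapter_bytes if load_device == "cuda" else 0


def reload_fits(free_device_bytes: int, adapter_bytes: int, load_device: str) -> bool:
    """True if the temporary copy fits in the remaining GPU memory."""
    return peak_extra_device_bytes(adapter_bytes, load_device) <= free_device_bytes


MiB = 1024 ** 2
adapter_bytes = 160 * MiB  # hypothetical adapter size
free_bytes = 100 * MiB     # hypothetical free GPU memory after training

cuda_ok = reload_fits(free_bytes, adapter_bytes, "cuda")  # False: temporary copy does not fit
cpu_ok = reload_fits(free_bytes, adapter_bytes, "cpu")    # True: no extra GPU copy needed
```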

A CPU-first load path is more memory-safe here: load the adapter checkpoint onto CPU first, then copy the weights into the model parameters. That avoids creating a full temporary CUDA state_dict at the most memory-constrained point of load_best_model_at_end.
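One way such a CPU-first reload might be expressed is by passing torch_device="cpu" to PeftModel.load_adapter(). The sketch below is an assumption about the shape of the change, not the PR's actual diff; reload_best_adapter_cpu_first is a hypothetical helper, and `model` is assumed to be a peft.PeftModel whose adapter already exists on the training device.

```python
def reload_best_adapter_cpu_first(model, checkpoint_dir: str, adapter_name: str = "default"):
    """Reload an adapter so that checkpoint tensors are materialized on CPU first.

    Assumption: `model` is a peft.PeftModel. With torch_device="cpu", the temporary
    state_dict is expected to live in host memory before its values are copied
    into the model's existing adapter parameters.
    """
    return model.load_adapter(checkpoint_dir, adapter_name, torch_device="cpu")
```

In a Trainer-like setting, checkpoint_dir would be the best checkpoint path and adapter_name the active adapter; both are supplied by the caller here rather than read from any Trainer attribute.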

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@SunMarc @BenjaminBossan

BenjaminBossan · 2026-03-13T13:18:52Z

Comment to reviewers: Let's resolve the discussion in #44637 first before proceeding with this PR.

DogWala added 2 commits March 13, 2026 20:47

Use CPU-first adapter reload for PEFT load_best_model_at_end

8ddc875

Style: format PEFT best-model reload call

c49dbf6

This was referenced Apr 29, 2026

Cumulative feature and defect updates from recent Transformers PRs evalstate/transformers#42

Open

Cumulative defect fixes from recent Transformers PRs evalstate/transformers#43

Open

DogWala wants to merge 2 commits into huggingface:main from DogWala:fix-peft-best-load-cpu
