[loading] Really initialize on meta device for huge perf gains (#42941)

Cyrilvallez merged 24 commits into main
Conversation
- with init_empty_weights():
+ with torch.device("meta"):
cc @SunMarc for this change, do you know if this switch is fine for all the quantizers' `replace_with_xxx` functions? Basically, the only difference is that if the quantized layer registers some buffers, they will now be on meta as well. I checked for bnb and there it seems to be alright at least (no buffers)
as long as it's not a non-persistent buffer, it should be fine!
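For context, here is a minimal sketch (plain `torch.nn`, not the transformers or quantizer code) of the behaviour discussed above: inside the `torch.device("meta")` context, any tensor a module creates in its `__init__`, parameters and registered buffers alike, lands on the meta device.

```python
import torch
import torch.nn as nn

# Under the meta-device context, factory calls made inside a module's
# __init__ produce meta tensors, so registered buffers end up on meta too
# and must be materialized with real values before use.
with torch.device("meta"):
    layer = nn.BatchNorm2d(8)  # registers running_mean / running_var buffers

print(layer.weight.device)        # meta
print(layer.running_mean.device)  # meta
```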
ArthurZucker left a comment:
An early Christmas gift for everyone
if is_accelerate_available():
    from accelerate import init_empty_weights
Squashed commits (#42941):
* use meta device directly
* style
* move back non-persistent
* fix
* make helper
* fix it
* use native param dtype
* make tensors buffers
* style
* fix
* oupsi
* add a test and fix
* fix
* create timm integration to reinit non-persistent buffers
* style
* style
* more
* better
* add doc
* more timm stuff
* more
* fix
* small change
* no actually it was fine before
What does this PR do?
Follow-up to #42309 to really leverage meta-device loading. Gives crazy speedups for loading some models, e.g. about 2.5x on gpt-oss 20b and about 3x on the 120b version.
The issue at hand
Currently, during loading we initialize the model on meta device before loading weights, thanks to `init_empty_weights` from `accelerate`. However, this context manager has BIG drawbacks: in particular, parameters are first created on cpu inside each module's `__init__` and only then moved to the meta device.

For some models, e.g. gpt-oss, we have the following during loading:

[profiling trace of `from_pretrained` before this PR]

Note how most of the loading time is spent BEFORE the actual loading of the weights (the `_load_pretrained` call), just to initialize parameters that should be on meta device anyway.

What this PR is doing
This PR completely removes `init_empty_weights` in favor of `torch.device("meta")`, to really start with a model on meta device without first putting the parameters on cpu. We are free to do so since I merged #42309 yesterday, which correctly handles re-initialization of the non-persistent buffers that end up on meta device as well.
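A minimal sketch of the switch, using a config-only model build for illustration (the checkpoint id is just an example; in the library the change lives inside `from_pretrained` itself):

```python
import torch
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("openai/gpt-oss-20b")  # example checkpoint

# Old approach: accelerate's context manager. Parameters are first created
# on cpu in each module's __init__ and only then swapped onto the meta device.
with init_empty_weights():
    model_old = AutoModelForCausalLM.from_config(config)

# New approach: the native torch context. Parameters and buffers are created
# directly on the meta device, skipping the cpu round-trip entirely.
with torch.device("meta"):
    model_new = AutoModelForCausalLM.from_config(config)

print(model_old.lm_head.weight.device, model_new.lm_head.weight.device)  # meta meta
```

Both paths end with the parameters on meta; the difference is purely in how much work happens on cpu before getting there.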
The performance gains are immense; the same benchmark as before for gpt-oss shows it:

[profiling trace of `from_pretrained` on this PR]
We can now see that most of the time in `from_pretrained` is spent on the actual weight loading (the `_load_pretrained` call), as it should be.

Raw numbers on our cluster for a simple benchmark script (from which the above traces are taken; a sketch of such a script is given after the numbers below) are the following:
BEFORE THIS PR -> ~9.1s
ON THIS PR -> ~3.7s
Which means a speedup of about 2.5x.
For the 120B gpt-oss version, we have:
BEFORE THIS PR -> ~32.4s
ON THIS PR -> ~10.7s
or a speedup of about 3x.
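For reference, a minimal sketch of what such a timing script could look like (the checkpoint id and the use of `time.perf_counter` are assumptions, not the exact script behind the numbers above):

```python
import time
from transformers import AutoModelForCausalLM

model_id = "openai/gpt-oss-20b"  # assumed 20b checkpoint; use the 120b variant for the second set of numbers

start = time.perf_counter()
model = AutoModelForCausalLM.from_pretrained(model_id)
print(f"from_pretrained took {time.perf_counter() - start:.1f} s")
```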