[docs] optimizations quickstart #42538
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Cyrilvallez
left a comment
Very nice! Sorry for the delay, added some comments!
Force-pushed from cb07400 to dd98809
Cyrilvallez
left a comment
Nice! cc @LysandreJik as well if you want to have a look/have more comments!
LysandreJik
left a comment
Super good start! Love it. Let's get it in and iterate over it then
cc @SunMarc as well, related to some topics we discussed recently.
> [!NOTE]
> Memory and speed are closely related but not the same. Shrinking your memory footprint makes a model "faster" because there is less data to move around. Pure speed optimizations don't always reduce memory and sometimes increase usage. Choose the appropriate optimization based on your use case and hardware.
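To make the distinction concrete, here is a minimal sketch that is not part of the quoted page (the checkpoint is an arbitrary placeholder, and the `dtype` argument assumes a recent Transformers release where it replaced `torch_dtype`): loading weights in a lower-precision dtype is primarily a memory optimization that also tends to improve speed, while `torch.compile` is a pure speed optimization that leaves the weights' memory footprint unchanged.

```py
import torch
from transformers import AutoModelForCausalLM

# Memory-side choice: bfloat16 weights take half the space of float32, and the
# smaller tensors are also cheaper to move around, so inference gets faster too.
model = AutoModelForCausalLM.from_pretrained(
    "openai-community/gpt2",  # placeholder checkpoint, any causal LM works here
    dtype=torch.bfloat16,     # use `torch_dtype` on older Transformers versions
)

# Speed-side choice: torch.compile accelerates the forward pass but does not
# shrink the memory the weights themselves occupy.
model = torch.compile(model)
```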
SunMarc
left a comment
Thanks, that's really nice! Just left a minor comment.
[Expert parallelism](./expert_parallelism) distributes experts across devices for mixture-of-experts (MoE) models. Set `enable_expert_parallel` in [`DistributedConfig`] to enable it.

```py
from transformers import AutoModelForCausalLM
from transformers.distributed.configuration_utils import DistributedConfig

distributed_config = DistributedConfig(enable_expert_parallel=True)
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-120b",
    distributed_config=distributed_config,
)
```
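As with other multi-device features in Transformers, a snippet like this would presumably be launched with one process per device, e.g. via `torchrun --nproc-per-node=<num_gpus> script.py`; that launch command is standard `torchrun` usage and an assumption here, not something stated in the quoted excerpt.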
Not sure we want to promote this API yet, as it isn't that widely used for now. Since we are refactoring MoE and more, maybe it is a good time to fix how distributed_config will work (tp, ep, pp)? cc @ArthurZucker @3outeille. Right now, the only model using this feature is Llama4.
Sounds good, I can update the docs for expert parallelism in #42409 once the API is more stable.
Force-pushed from dd98809 to 070fcb8
* quickstart
* feedback
* feedback
Adds an overview/quickstart of the available Transformers optimization techniques, providing a clear, centralized place where they are all documented and helping users select the right optimization to increase speed or reduce memory footprint.