[docs] optimizations quickstart #42538
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Cyrilvallez
left a comment
Very nice! Sorry for the delay, added some comments!
Force-pushed from cb07400 to dd98809
Cyrilvallez
left a comment
Nice! cc @LysandreJik as well if you want to have a look/have more comments!
LysandreJik
left a comment
Super good start! Love it. Let's get it in and iterate over it then
cc @SunMarc as well, related to some topics we discussed recently.
> [!NOTE]
> Memory and speed are closely related but not the same. Shrinking your memory footprint makes a model "faster" because there is less data to move around. Pure speed optimizations don't always reduce memory and sometimes increase usage. Choose the appropriate optimization based on your use case and hardware.
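To make the distinction concrete, here is a minimal sketch that is not part of the quoted page (the checkpoint is an arbitrary placeholder, and the `dtype` argument assumes a recent Transformers release where it replaced `torch_dtype`): loading weights in a lower-precision dtype is primarily a memory optimization that also tends to improve speed, while `torch.compile` is a pure speed optimization that leaves the weights' memory footprint unchanged.

```py
import torch
from transformers import AutoModelForCausalLM

# Memory-side choice: bfloat16 weights take half the space of float32, and the
# smaller tensors are also cheaper to move around, so inference gets faster too.
model = AutoModelForCausalLM.from_pretrained(
    "openai-community/gpt2",  # placeholder checkpoint, any causal LM works here
    dtype=torch.bfloat16,     # use `torch_dtype` on older Transformers versions
)

# Speed-side choice: torch.compile accelerates the forward pass but does not
# shrink the memory the weights themselves occupy.
model = torch.compile(model)
```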
SunMarc
left a comment
Thanks, that's really nice! Just left a minor comment.
[Expert parallelism](./expert_parallelism) distributes experts across devices for mixture-of-experts (MoE) models. Set `enable_expert_parallel` in [`DistributedConfig`] to enable it.

```py
from transformers import AutoModelForCausalLM
from transformers.distributed.configuration_utils import DistributedConfig

distributed_config = DistributedConfig(enable_expert_parallel=True)
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-120b",
    distributed_config=distributed_config,
)
```
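As with other multi-device features in Transformers, a snippet like this would presumably be launched with one process per device, e.g. via `torchrun --nproc-per-node=<num_gpus> script.py`; that launch command is standard `torchrun` usage and an assumption here, not something stated in the quoted excerpt.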
Not sure we want to promote this API yet, as it isn't that widely used for now. Since we are refactoring MoE and more, maybe it is a good time to fix how distributed_config will work (tp, ep, pp)? cc @ArthurZucker @3outeille. Right now, the only model using this feature is Llama4.
Sounds good, I can update the docs for expert parallelism in #42409 once the API is more stable.
Force-pushed from dd98809 to 070fcb8
* quickstart
* feedback
* feedback
Adds an overview/quickstart of the available Transformers optimization techniques, providing a clear, centralized place where they are all documented and helping users select the right optimization to increase speed or reduce memory footprint.