🥓 [docs] add CP docs #3994
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
> **Option 1: Using SFTConfig**
> **With Wrapped Strategy:**
I recommend not mentioning "wrapped" in the documentation. I feel like this is a very advanced use, and it somewhat distracts from the message of this section.
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
I'll undo my change.
lewtun left a comment
Nice doc! Overall LGTM, with @qgallouedec's comments about avoiding the wrapped example.
> `from trl import SFTConfig`
>
> `training_args = SFTConfig(`
> `    max_seq_length=2048,  # Long sequence length`
Maybe we can make this truly long, like 16384 tokens?
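For intuition on why a truly long sequence motivates context parallelism: each CP rank holds only a slice of the sequence, so per-GPU activation memory scales with `seq_len / cp_size`. A minimal sketch in plain Python (the function name is illustrative, not part of the TRL API):

```python
def cp_shard_length(seq_len: int, cp_size: int) -> int:
    """Per-GPU sequence length under context parallelism.

    Each of the cp_size ranks holds a contiguous slice of the
    sequence, so activation memory per GPU grows with
    seq_len / cp_size rather than with the full seq_len.
    """
    if seq_len % cp_size != 0:
        raise ValueError("seq_len must be divisible by cp_size")
    return seq_len // cp_size

# With the 16384-token sequence suggested above on 8 GPUs,
# each rank materializes activations for only 2048 tokens.
print(cp_shard_length(16384, 8))  # -> 2048
```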
> #### Accelerate Configuration
> Create an accelerate config file (e.g. `context_parallel_config.yaml` for 2 GPUs):
I wonder if it makes sense to have a copy of this in accelerate_configs/fsdp2_cp.yaml so there's a standard reference people can work from? If you agree, I'd make it run on 8 GPUs, which is the default for our other configs.
OK, yes, good idea with 8 GPUs.
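For reference, a rough sketch of what such an 8-GPU FSDP2 accelerate config might look like. Only the standard accelerate keys are shown here; the CP-specific key names vary by accelerate version, so the `accelerate_configs/fsdp2_cp.yaml` file added in this PR should be treated as the authoritative copy.

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_version: 2  # context parallelism requires FSDP2
# context-parallel settings go here; see accelerate_configs/fsdp2_cp.yaml
# in the repo for the exact key names on your accelerate version
mixed_precision: bf16
num_machines: 1
num_processes: 8  # one process per GPU
```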
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
still needs huggingface/transformers#40619
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
@qgallouedec forgot to check in the yaml... try now
> #### Training Configuration
> You can configure context parallelism training either programmatically or via command line:
Just realized that this is not accurate, as you always have to pass the accelerate config. I'll refactor a bit.
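In other words, the accelerate config file is not optional: however the `SFTConfig` options are set, the launch always goes through it. A sketch of the invocation (the training script name is a placeholder; `context_parallel_config.yaml` is the file created above):

```shell
# --config_file is always required for CP runs; sft.py is illustrative
accelerate launch --config_file context_parallel_config.yaml sft.py
```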
What does this PR do?
Add example and docs for CP SFT training.