
[docs] outline sharded ddp doc #9208

Merged
stas00 merged 5 commits into huggingface:master from stas00:zero-docs
Jan 6, 2021

Conversation

@stas00 (Contributor) commented Dec 19, 2020

This PR provides an initial outline of the HF Trainer integration, starting with ZeRO. Fairscale's sharded optimizer/gradient support is already in place, and DeepSpeed support is coming.

We won't merge this until fairscale has merged all the required fixes and released a new version, but I thought it'd be good to start the doc now so it's ready when fairscale is.

I hope to submit a DeepSpeed integration shortly as well, so we will extend the doc with DeepSpeed info then. Edit: see #9211.

@sgugger (Collaborator) left a comment


Thanks for drafting this!

Six resolved comment threads on docs/source/training.rst (Outdated)
stas00 and others added 2 commits December 20, 2020 10:34
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Comment thread docs/source/training.rst
One of the main benefits of enabling `--sharded_ddp` is that it uses a lot less GPU memory, so you should be able to
train with significantly larger batch sizes on the same hardware (e.g. 3x or bigger).

Eventually more parts will be supported via integrating `DeepSpeed <https://github.com/microsoft/DeepSpeed>`__.
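For context, the `--sharded_ddp` flag discussed above is passed to an example training script launched under `torch.distributed.launch`. A hypothetical invocation might look like the following; the script name, model, and all other arguments are illustrative, and only `--sharded_ddp` is the flag under discussion:

```shell
# Sketch only: launch a 2-GPU training run with fairscale's sharded DDP enabled.
# "finetune_trainer.py" and the other arguments are placeholders, not part of this PR.
python -m torch.distributed.launch --nproc_per_node=2 \
    examples/seq2seq/finetune_trainer.py \
    --model_name_or_path t5-small \
    --output_dir output_dir \
    --do_train \
    --per_device_train_batch_size 16 \
    --sharded_ddp
```

Because the memory savings come from sharding optimizer state and gradients across GPUs, the `--per_device_train_batch_size` value is typically what you would raise once the flag is enabled.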


Just to be clear, fairscale and DeepSpeed do not share code, so one is not a natural follow-up to the other (I'm in no way against that, feel free of course; it's just that somebody reading this could understand it that way). Fairscale's OSS and ShardedDDP are certainly based on the ideas of the ZeRO paper, and that credit is very valid to me; I'm not disputing it, of course.

Collaborator


We can make it clearer that they are both different implementations of the ZeRO paper.

@stas00 (Contributor, Author) commented Dec 22, 2020


@blefaudeux, we want to make sure that:

  1. both fairscale and DeepSpeed get the full awesomeness factor loud and clear - so please do make any suggestions that you see fit - both of your projects are amazing!
  2. the users have an easy way to understand when to use which and how they can evaluate pros and cons - so again any suggestions for clarification are very welcome.

The DeepSpeed integration PR #9211 is coming along nicely, so we will expand this doc section then and make it even clearer and more balanced. This was just the very basic entry point to send users to when they ask what `--sharded_ddp` is about, where to read about it, what nuances to know, etc.



3 participants