
[doc] a possible gradient_clipping default fix and questions #656

Merged
tjruwase merged 6 commits into deepspeedai:master from stas00:grad_clip
Apr 26, 2021

Conversation


@stas00 stas00 commented Jan 10, 2021

This PR fixes the documented gradient_clipping default to be 1.0 rather than 0, since in your code it defaults to 1.0.

But I'm unsure about several things, as this value appears to behave differently in different places.

In several places you call:

                torch.nn.utils.clip_grad_norm_(parameters=master_params,
                                               max_norm=self.gradient_clipping())

and here it's the common max_grad_norm, which should default to 1.0. And indeed, you have:

deepspeed/runtime/constants.py:"gradient_clipping": 1.0

but then you forbid passing that value through the optimizer parameters:

        if 'max_grad_norm' in optimizer_parameters.keys():
            raise ValueError(
                "'max_grad_norm' is not supported as an optimizer parameter, please switch to using the deepspeed parameter 'gradient_clipping' see: https://www.deepspeed.ai/docs/config-json/#gradient-clipping for more details"
            )

and the doc you send the user to provides no details whatsoever.
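For reference, max_norm in that call behaves like stock PyTorch clipping: gradients whose total norm exceeds the threshold get rescaled down to it. A minimal standalone illustration (my own toy code, not from DeepSpeed):

```python
import torch

# a toy parameter with a deliberately large gradient
p = torch.nn.Parameter(torch.ones(4))
p.grad = torch.full((4,), 10.0)  # total grad norm = sqrt(4 * 100) = 20.0

# clip to max_norm=1.0, as DeepSpeed does with gradient_clipping=1.0
total_norm = torch.nn.utils.clip_grad_norm_([p], max_norm=1.0)

print(float(total_norm))      # norm before clipping: 20.0
print(float(p.grad.norm()))   # norm after clipping: ~1.0
```

So with gradient_clipping=1.0 the gradients above would be scaled down by a factor of ~20.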

So in some parts of the code I see gradient_clipping used exactly as max_grad_norm would be, yet FP16_Optimizer uses clip_grad with a default of 0.0!

class FP16_Optimizer(object):
[...]
    def __init__(self,
[...]
                 clip_grad=0.0,

Yet it gets initialized from the same:

        clip_grad = self.gradient_clipping()

whose default is 1.0 everywhere in your code.

deepspeed/runtime/constants.py:"gradient_clipping": 1.0
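And if I read the code right, those two defaults don't even mean the same thing: inside FP16_Optimizer, clip_grad=0.0 appears to mean "clipping disabled", while 0.0 handed straight to torch as max_norm would rescale the gradients to zero. A toy sketch of that semantic difference (my own code, guessing at the convention, not DeepSpeed's actual implementation):

```python
import torch

def fp16_style_clip(params, clip_grad):
    # sketch of what I believe the FP16_Optimizer convention is:
    # 0.0 means "clipping disabled", guarded by clip_grad > 0
    if clip_grad > 0.0:
        torch.nn.utils.clip_grad_norm_(params, max_norm=clip_grad)

p = torch.nn.Parameter(torch.ones(2))
p.grad = torch.full((2,), 3.0)

fp16_style_clip([p], clip_grad=0.0)    # no-op under this convention
after_noop = p.grad.tolist()           # gradients untouched: [3.0, 3.0]

# whereas the same 0.0 passed straight to torch as max_norm
# rescales the gradients to (nearly) zero
torch.nn.utils.clip_grad_norm_([p], max_norm=0.0)
after_zero_norm = float(p.grad.norm())  # ~0.0
```

If so, the same literal value 0.0 disables clipping in one code path and annihilates the gradients in another, which is exactly the kind of thing the docs should spell out.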

Also, why is gradient_clipping a top-level entry and not part of the optimizer config?

Beyond the whys, the main question is whether it is safe for us to init deepspeed with:

  "gradient_clipping": args.max_grad_norm

which defaults to 1.0 in our setup.
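Concretely, the plan on our side is roughly the following (build_ds_config and the batch-size value are placeholders of mine, shown schematically, not the actual transformers integration code):

```python
# sketch: building the DeepSpeed config dict on the HF side,
# forwarding our trainer's max_grad_norm (which defaults to 1.0)
def build_ds_config(max_grad_norm=1.0):
    return {
        "train_batch_size": 8,               # placeholder value
        "gradient_clipping": max_grad_norm,  # forwarded as-is
    }

print(build_ds_config()["gradient_clipping"])     # 1.0
print(build_ds_config(0.5)["gradient_clipping"])  # 0.5
```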

And this doc https://www.deepspeed.ai/docs/config-json/#gradient-clipping could definitely use some disambiguation and perhaps a few more lines explaining what actually happens there.

Thanks


stas00 commented Jan 12, 2021

Would it be possible to have a quick peek at this PR?

We are ready to merge the DeepSpeed integration: huggingface/transformers#9211

This is just one last bit that I need to validate with you before merging it. Thank you!

@jeffra, @tjruwase


stas00 commented Mar 13, 2021

ping?


stas00 commented Mar 18, 2021

So is it 1.0 or 0?

@tjruwase tjruwase merged commit b7f9706 into deepspeedai:master Apr 26, 2021
@stas00 stas00 deleted the grad_clip branch April 26, 2021 19:13