[doc] a possible gradient_clipping default fix and questions#656
Merged
tjruwase merged 6 commits into deepspeedai:master on Apr 26, 2021
Conversation
Collaborator (Author):
Would it be possible to have a quick peek at this PR? We are ready to merge the DeepSpeed integration: huggingface/transformers#9211. This is just one last bit that I need to validate with you before merging it. Thank you!
Collaborator (Author):
ping?
Collaborator (Author):
So is it 1.0 or 0?
tjruwase approved these changes on Apr 14, 2021
This PR fixes the documented `gradient_clipping` default to be `1.0` and not `0`, since I see that in your code it defaults to `1.0`.

But I'm not sure about several things, as this value appears to behave, or be used, differently in different places.
In several places you call:

and here it's the common `max_grad_norm`, which should be `1.0` by default. And yes, you have:

but then you tell the user not to use that value:

and the doc you send the user to provides no details whatsoever.
So in some parts of the code I see `gradient_clipping` used exactly as `max_grad_norm` would be, and yet `FP16_Optimizer` uses `clip_norm` with a default of `0.0`! Yet it gets initialized from the same:

whose default is `1.0` everywhere in your code.
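To illustrate why a `0.0` vs `1.0` default matters, here is a minimal sketch of norm-based gradient clipping, the operation `gradient_clipping` / `max_grad_norm` controls. This is my own illustration, not DeepSpeed's actual implementation; it assumes `0` means "clipping disabled", which is one way the two defaults could silently diverge:

```python
import math

def clip_grad_norm_(grads, max_norm):
    # Hypothetical sketch: rescale the gradient vector if its L2 norm
    # exceeds max_norm; treat max_norm == 0 as "no clipping".
    total_norm = math.sqrt(sum(g * g for g in grads))
    if max_norm > 0 and total_norm > max_norm:
        scale = max_norm / total_norm
        return [g * scale for g in grads]
    return grads

grads = [3.0, 4.0]                       # total norm is 5.0
clipped = clip_grad_norm_(grads, 1.0)    # rescaled so the norm becomes 1.0
unclipped = clip_grad_norm_(grads, 0.0)  # returned unchanged: 0 disables clipping
```

Under this reading, a default of `0.0` disables clipping entirely, while `1.0` clips aggressively, so the two are far from interchangeable.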
Also, why is `gradient_clipping` a top-level entry and not part of the optimizer config?

Beyond the whys, the main question is whether it is safe for us to init DeepSpeed with:

which defaults to `1.0` in our setup.

And this doc https://www.deepspeed.ai/docs/config-json/#gradient-clipping could definitely use some disambiguation and perhaps a few more lines of explanation of what's happening there.
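For concreteness, this is the shape of config being discussed: `gradient_clipping` sits at the top level of the DeepSpeed config rather than under `"optimizer"`. The surrounding values here are illustrative placeholders, not the actual integration's settings:

```python
# Illustrative DeepSpeed config dict; only the placement and value of
# gradient_clipping are the point, the other keys/values are made up.
ds_config = {
    "train_batch_size": 8,
    "gradient_clipping": 1.0,  # mirroring max_grad_norm's usual 1.0 default
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 3e-5},
    },
}
```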
Thanks