removed restrictions for custom optimizer#161

Merged
ShadenSmith merged 3 commits into deepspeedai:master from CalogeroZarbo:custom_optimizer
Mar 22, 2020

Conversation

@CalogeroZarbo
Contributor

Removed the restriction so that any optimizer of choice can be used. A warning message has been added.

@ShadenSmith ShadenSmith requested a review from tjruwase March 22, 2020 17:54
@tjruwase
Contributor

Looks good

@ShadenSmith ShadenSmith merged commit ac9cc7f into deepspeedai:master Mar 22, 2020
@ShadenSmith
Contributor

Thanks for your contribution to DeepSpeed!

@CalogeroZarbo
Contributor Author

Thank you too for this amazing lib!

@CalogeroZarbo CalogeroZarbo deleted the custom_optimizer branch March 23, 2020 08:25
@ShadenSmith ShadenSmith linked an issue Mar 25, 2020 that may be closed by this pull request
@ShadenSmith
Contributor

I was thinking this over some more and I'm worried that the lone warning is easy to miss in a typically-long logfile. That could lead to some "silent" divergence errors.

What if we keep the warning, but require an additional config in the JSON such as "zero_allow_untested_optimizer" : true in order to "unlock" the untested optimizer? If the flag is not set, we raise an error and provide instructions and the appropriate warnings.
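The proposed gating could be sketched roughly as follows. This is a hypothetical illustration, not DeepSpeed's actual implementation: the `ZERO_SUPPORTED_OPTIMIZERS` set, the function name, and the flat config-key lookup are assumptions; only the `"zero_allow_untested_optimizer"` key comes from the discussion above.

```python
import logging

# Hypothetical set of optimizers already validated with ZeRO (assumption
# for illustration; the real list lives in DeepSpeed's source).
ZERO_SUPPORTED_OPTIMIZERS = {"Adam", "FusedAdam", "FusedLamb"}

def check_zero_optimizer(config: dict, optimizer_name: str) -> None:
    """Raise unless the optimizer is tested or explicitly unlocked."""
    if optimizer_name in ZERO_SUPPORTED_OPTIMIZERS:
        return
    if config.get("zero_allow_untested_optimizer", False):
        # Flag set: allow the untested optimizer but keep the warning.
        logging.warning(
            "Optimizer %s has not been tested with ZeRO; proceed with care.",
            optimizer_name,
        )
        return
    # Flag not set: fail loudly with instructions instead of a
    # warning that is easy to miss in a long logfile.
    raise ValueError(
        f"Optimizer {optimizer_name} is untested with ZeRO. Set "
        '"zero_allow_untested_optimizer": true in the DeepSpeed JSON '
        "config to override."
    )
```

The key design point is the one raised above: an error is impossible to miss, while the unlock flag still lets users opt in deliberately.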

@CalogeroZarbo
Contributor Author

Yep @ShadenSmith, that seems to me like a very good approach to avoid problems that are difficult to debug in a complex training system. I'll fork again and make another pull request, if that's OK with you. Let me know.

Cheers,
Cal

@tjruwase
Contributor

@CalogeroZarbo this sounds good. Thanks!

@ShadenSmith
Contributor

Perfect, thanks @CalogeroZarbo !

kouml pushed a commit to kouml/DeepSpeed that referenced this pull request Apr 3, 2020
jeffra pushed a commit to jeffra/DeepSpeed that referenced this pull request Aug 25, 2021
* Add optimizer swapping

* Swap fp16 params to nvme

* Formatting

* Address review feedback

* License file
@FWkey

FWkey commented Nov 1, 2023

What are the possible errors if I use a custom optimizer? And how can I ensure its correctness, should I run some tests?

@tjruwase
Contributor

tjruwase commented Nov 1, 2023

@FWkey, this is a tricky question. You could compare the training loss of a smaller model under PyTorch DDP vs. ZeRO.
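The suggested check could be sketched as below: train the same small model twice, once with vanilla PyTorch DDP and once with DeepSpeed ZeRO, log the per-step training losses, and compare the two curves within a tolerance. The helper name, the tolerance, and the dummy loss values are assumptions for illustration; the real inputs would come from actual training runs.

```python
def losses_match(ddp_losses, zero_losses, rel_tol=0.05):
    """Return True if the two loss curves agree step-by-step within rel_tol."""
    if len(ddp_losses) != len(zero_losses):
        return False
    return all(
        abs(a - b) <= rel_tol * max(abs(a), abs(b), 1e-12)
        for a, b in zip(ddp_losses, zero_losses)
    )

# Dummy curves standing in for logged per-step training losses
# (assumption; replace with losses from the two real runs).
ddp_run = [2.30, 1.95, 1.60, 1.32]
zero_run = [2.31, 1.94, 1.62, 1.33]
print(losses_match(ddp_run, zero_run))  # → True (curves agree within 5%)
```

A step-wise comparison like this catches "silent" divergence early; a diverging custom optimizer would drift away from the DDP baseline within a few steps.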


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ZeRO & Custom Optimizer (RangerLars)

4 participants