Allow args to be optional in deepspeed.initialize #825
Conversation
Nice changes @jeffra! Regarding lightning not using
Oh I see, interesting.
Before this PR, `args` passed into `deepspeed.initialize` must be an object with `getattr`/`setattr` support. This PR allows users to initialize DeepSpeed via `deepspeed.initialize(model=model, config_params=config_dict)`. This assumes that `local_rank` is set as an environment variable (which is already done via the deepspeed/`torch.distributed` launchers and/or by `deepspeed.init_distributed`). The above call can only be made with kwargs; positional args must remain backwards compatible to support `deepspeed.initialize(args, model)`. NOTE: this change does not require changes to existing code that uses `deepspeed.initialize` with `args`.

This PR lets Hugging Face and Lightning stop passing a `SimpleNamespace` object to `deepspeed.initialize`, which was previously a hack to get DeepSpeed working.

For the Hugging Face integration this means they can (optionally) remove the `SimpleNamespace` object used to pass the local rank, since `deepspeed.init_distributed` is already called prior to `deepspeed.initialize`, which ensures the local rank is set properly. https://github.com/huggingface/transformers/blob/256482ac9285c467fb97ca3b1b693a4de1d0ac60/src/transformers/integrations.py#L409-L414 /cc @stas00

For the Lightning integration, I don't believe it uses `deepspeed.init_distributed` for launching, but if it assumes the `torch.distributed` launcher then this is fine, see: https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py#L253. They can then also remove the `SimpleNamespace` here: https://github.com/PyTorchLightning/pytorch-lightning/blob/efda48faab666b78bb5f71bff4a0838f9b82dee5/pytorch_lightning/plugins/training_type/deepspeed.py#L243 /cc @SeanNaren

In either case, HF/Lightning could set or assert `os.environ['LOCAL_RANK'] = str(self.local_rank)` before calling `deepspeed.initialize`.
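The two call styles can be sketched with a minimal stand-in for `deepspeed.initialize`. This is an illustrative sketch of the dispatch logic, not DeepSpeed's actual implementation; the fallback to `LOCAL_RANK` mirrors what the description above assumes the launchers provide.

```python
import os
from types import SimpleNamespace

def initialize(args=None, model=None, config_params=None):
    """Illustrative stand-in showing how `args` can be made optional
    (not the real deepspeed.initialize)."""
    if model is None:
        raise ValueError("a model is required")
    if args is not None and hasattr(args, "local_rank"):
        # Backwards-compatible path: the positional args object
        # carries local_rank as an attribute.
        local_rank = args.local_rank
    else:
        # New path: fall back to the LOCAL_RANK environment variable,
        # which the deepspeed / torch.distributed launchers already set.
        local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    return model, local_rank

# Old style: args object with attribute access (still supported).
args = SimpleNamespace(local_rank=2)
model = object()
_, rank = initialize(args, model)
assert rank == 2

# New style: no args object; local_rank comes from the environment.
os.environ["LOCAL_RANK"] = "1"
_, rank = initialize(model=model, config_params={"train_batch_size": 8})
assert rank == 1
```

The `config_params` dict here stands in for a real DeepSpeed config; the point is only that no attribute-bearing `args` object is required in the kwargs-only call.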
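The "set or assert" suggestion for integrations could look like the following sketch. The helper name `ensure_local_rank` is hypothetical, not part of DeepSpeed's or either framework's API:

```python
import os

os.environ.pop("LOCAL_RANK", None)  # start clean for this demo

def ensure_local_rank(local_rank: int) -> None:
    """Set LOCAL_RANK if unset, or assert it matches the framework's value.
    Illustrative helper a framework could call before deepspeed.initialize."""
    existing = os.environ.get("LOCAL_RANK")
    if existing is None:
        os.environ["LOCAL_RANK"] = str(local_rank)
    else:
        assert int(existing) == local_rank, (
            f"LOCAL_RANK={existing} conflicts with framework rank {local_rank}")

ensure_local_rank(0)   # sets the variable
ensure_local_rank(0)   # idempotent: value already matches, no error
assert os.environ["LOCAL_RANK"] == "0"
```

Note that environment variables must be strings, hence the `str(...)` conversion before assignment.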