Skip to content

Merge base#1

Merged
ghosthamlet merged 98 commits intoghosthamlet:masterfrom
deepspeedai:master
Mar 15, 2021
Merged

Merge base#1
ghosthamlet merged 98 commits intoghosthamlet:masterfrom
deepspeedai:master

Conversation

@ghosthamlet
Copy link
Copy Markdown
Owner

No description provided.

jeffra and others added 30 commits December 11, 2020 10:05
* fix arch flags, add PTX

* bug fix

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* [doc] xref to hostfile discussion

wasn't clear where to find what was meant by `hostfile` - so adding a link to where it's discussed.

* remove whitespace
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Allow DeepSpeed models to be initialized with optimizer=None

Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.10.10 to 1.11.0.
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/master/CHANGELOG.md)
- [Commits](sparklemotion/nokogiri@v1.10.10...v1.11.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Remove a very verbose print statement.

* Update engine.py
* Add Linear warmup+decay lr schedule
Update lr schedule unit tests

* LR scheduler unit tests for LR Range Test and 1Cycle

* Disable yapf to preserve parameterizaton

* Disable test_pipe.py for CI debugging

* Disable test_lr_scheduler for CI debugging

* Disable test_lr_scheduler for CI debugging

* Enable all unit tests for CI debugging

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* move workspace memory-allocation to PyTorch

* refine the code based on the comments

* remove unnecessary options

* remove bsz from set_seq_len function
jeffra and others added 29 commits February 18, 2021 16:20
Invalid param name

Thanks.
* fix the bias-add precision and indexing and also adding the layer-norm-eps as a configurable parameter for transformer

* add ACC_HALF config

* use defined to check if ACC_Half is defined
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
hi, i take a look at the code of column_sum_reduce, i have 2 questions:
   1. the goal of column_sum_reduce is to get the column sum of inp matrix with shape[rows, width] and the result shape should be [width],right ? It seems that the judgment condition of pos is not suitable
   2. the implementation of cuda kernel based on the asumption that, the thread with same threadIdx.y will group into a thread_block_tile, the blockDim is (32,32), i read the nvidia document https://on-demand.gputechconf.com/gtc/2017/presentation/s7622-Kyrylo-perelygin-robust-and-scalable-cuda.pdf, THREAD BLOCK TILE is a subset of threads of a thread block, divided into tiles in row-major order. doesn't it mean thread with the same threadIdx.x will group into a thread_block_tile ?
thanks !!!!

Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
* fixing buffers in transformer kernel when gelu-checkpoint is enabled

* fixing the test issue for other memory optimization flags

* fixing a bug for when attn_dropout_checkpoint is enabled
* Squash stage3 v1 (#146)

Co-authored-by: Samyam <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>

* Fix correctness bug (#147)

* formatting fix (#150)

* stage3 bugfix (API) update and simplified FP16 Z3 tests (#151)

* fp16 Z3 API update and bugfix

* revert debug change

* ZeRO-3 detach and race condition bugfixes (#149)

* trying out ZeRO-3 race condition fix

* CUDA sync instead of stream

* reduction stream sync

* remove commented code

* Fix optimizer state_dict KeyError (#148)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* fix for smaller SGS sizes, ensures each grad is backed by unique tensors (#152)

* Simplifying the logic for getting averaged gradients (#153)

* skip for now

* Z3 Docs redux (#154)

* removing some TODOs and commented code (#155)

* New Z3 defaults (#156)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* formatting

* megatron external params

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
…t in the website (#799)

* add optimizers and schedules to rtd

* update ds website and fix links

* add optimizers and schedules to rtd

* update ds website and fix links

* add flops profiler to rtd

* fix

Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
* Control ZeRO wall clock timers

* Disable more ZeRO3 debug prints

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* fix log(0) & 1/log(1) bugs

* simplify

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
…have 'params' (#827)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Admin merging for pure-doc PR that does not trigger build.
@ghosthamlet ghosthamlet merged commit 517357e into ghosthamlet:master Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.