forked from deepspeedai/DeepSpeed
-
Notifications
You must be signed in to change notification settings - Fork 3
IFU-master-2021-09-14 #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Co-authored-by: Conglong Li <conglong.li@gmail.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Alex Muzio <Alex.Muzio@microsoft.com> Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com> Co-authored-by: Conglong Li <conglong.li@gmail.com> Co-authored-by: Felipe Cruz Salinas <Andres.Cruz@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Reza Yazdani <reyazda@microsoft.com> Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com> Co-authored-by: Shaden Smith <shaden.smith@microsoft.com> Co-authored-by: Young Jin Kim <youki@microsoft.com> Co-authored-by: bapatra <bapatra@microsoft.com> Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com> Co-authored-by: Shaden Smith <shaden.smith@microsoft.com> Co-authored-by: Young Jin Kim <youki@microsoft.com>
* restore fp16 params if no zero ckpts available * formatting
…dai#1316) * Callable option for optimizer and scheduler * Add unit test * Formatting * Disable debug prints * Use base optimizer to construct lr scheduler * Formatting * Remove dead import
…pspeedai#1244) Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
…se (deepspeedai#1309) * add more synchronizations and barriers for resolving gpu-halt issue * removing unuseful broadcasts
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* Rename PA_TO_cpu * Code cleanup * Revert accidental change
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Remove the wrong function with duplicate name * fix format. * add mpu check. fix tests.
* Added drop_last to DeepSpeedDataLoader This solves issue deepspeedai#326 * Updated drop_last in engine.py added drop_last as a ds_config as mentioned by @tjruwase * Update engine.py * Update engine.py * updated config.py and constants.py * Update constants.py * added dataloader_ prefix * Update dataloader.py * corrected yapf test errors * Update test_data.py Added dataloader_drop_last unit test * Corrected yapf and formatting issues * updated simple_model.py and test_data.py * Update simple_model.py * pre-commit fix * corrected issues * Update test_data.py * Update test_data.py * Update test_data.py * Update test_data.py * removed batch_size from test_data.py * Update simple_model.py * Update test_data.py * Update test_data.py * Fix unit test issues * Use fp32 to make things work Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Added 4-byte alignment on NCCL/RCCL * pre-commit formatting fixes * Fix for checkpoint loading with optimizer partitioning * Better assert print * Added unit tests for nccl/rccl 4-byte alignment * bug * Updated alignment to implicit Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* pass GAS boundary state from PP -> ZeRO * formatting Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
updated classifiers
Author
|
Errors observed in gpt2 workload unit tests summary on local system: |
* [zero Init] fix regression * clean up the warning
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* [zero_to_fp32] fix padding removal * style * fix comments Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
…ests (deepspeedai#1405) * install HF w. dev extra to get all required packages * switch ds.init to use param dict instead of json file on disk * switch back to 'testing' extra
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.11.4 to 1.12.5. - [Release notes](https://github.com/sparklemotion/nokogiri/releases) - [Changelog](https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md) - [Commits](sparklemotion/nokogiri@v1.11.4...v1.12.5) --- updated-dependencies: - dependency-name: nokogiri dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Collaborator
|
Closing this PR since #43 subsumes it |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Integrating from upstream