
Conversation

@rraminen

Resolved conflicts in setup.py and op_builder/__init__.py, and pointed DeepSpeedExamples to the latest commit, 206e48b7638d1a36e90466071bd0f50844a3002b, after the ROCm/DeepSpeedExamples#16 IFU.

init.txt

setup.txt

I have verified Megatron-LM-v1.1.5 gpt2 and Bing BERT. Both work fine.

cc: @jithunnair-amd

jeffra and others added 30 commits September 29, 2021 15:34
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Shaden Smith <shaden.smith@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Fix spelling errors in inference tutorial

* Remove unused imports in inference tutorial

* Fix inference tutorial code to work with 1 GPU

* Add flexibility of pipeline module and engine

* Separate PRs

* Separate PRs

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* Fix typos in docs/

* Fix typos in code comments and output strings

* Fix typos in the code itself

* Fix typos in tests/

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
…eedai#1376)

* Update checkpointing.py

* Fix formatting

* Add flexibility of pipeline module and engine

* Separate PRs

* Separate PRs

* Update checkpointing.py

* Update checkpointing.py

* Reflect code review for contiguous activation checkpointing

* remove useless condition

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* Revise param_shapes to be a list of ordered dict

* test i can push

* add tests; split z2 and z3 into separate funcs
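The param_shapes restructuring described above can be illustrated with a small sketch (the helper name and data are hypothetical, not the PR's actual code): one ordered dict of parameter name to shape per optimizer parameter group, instead of a single flat dict, so group boundaries and parameter order are preserved.

```python
from collections import OrderedDict

# Hypothetical illustration of "param_shapes as a list of ordered
# dicts": one OrderedDict (name -> shape) per parameter group.
def build_param_shapes(param_groups):
    shapes = []
    for group in param_groups:
        group_shapes = OrderedDict()
        for name, shape in group:
            group_shapes[name] = shape
        shapes.append(group_shapes)
    return shapes

groups = [
    [("embed.weight", (50257, 768)), ("ln.weight", (768,))],
    [("head.bias", (50257,))],
]
param_shapes = build_param_shapes(groups)
```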

Co-authored-by: Xiaopeng Li <xiaopel@amazon.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
HF tests `image-classification` requirements have been fixed to not require pt-1.9
* DeepSpeedInferenceConfig
get epsilon value from config

* epsilon -> layer_norm_eps
to keep var name same as in DeepSpeedTransformerConfig

* DeepSpeedTransformerConfig
get epsilon value from config

* Configurable stochastic_mode, e.g.:
1. True for LM pre-training
2. False for LM fine-tuning on a task

* Updated replace_module.py
check whether layer_norm_eps is an attribute of the config; default to 1e-12

Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
…1397)

* fix the workspace allocation for the transformer kernel

* change layer-id type & rm one unit test due to OOM
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Thomas <thomas@Thomass-MacBook-Pro.local>
Co-authored-by: Shaden Smith <shaden.smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Tunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* update docs w. 530b info

* add tutorial link
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: skpig <1900012999@pku.edu.cn>
RezaYazdaniAminabadi and others added 27 commits November 12, 2021 08:14
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
…1530)

* Add warmup_type arguments in WarmupLR and WarmupDecayLR

* Add warmup_type unit test

* replace hardcoded constants with global vars
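The warmup_type behaviour added above can be sketched as a learning-rate multiplier (formulas and names here are assumptions for illustration, not DeepSpeed's verbatim implementation): "log" warmup grows quickly at first and flattens out, while "linear" warmup grows uniformly until warmup_num_steps is reached.

```python
import math

# Sketch of a warmup multiplier with two warmup types.
def warmup_factor(step, warmup_num_steps, warmup_type="log"):
    if step >= warmup_num_steps:
        return 1.0
    if warmup_type == "log":
        # Fast initial growth, flattening toward 1.0.
        return math.log(step + 1) / math.log(warmup_num_steps + 1)
    elif warmup_type == "linear":
        # Uniform growth from 0.0 to 1.0.
        return step / warmup_num_steps
    raise ValueError(f"unknown warmup_type: {warmup_type}")
```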

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* [squash] Staging autotuning v4

Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Minjia Zhang <minjiaz@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* add new extra, guard xgboost, cleanup dead files (deepspeedai#268)

* Fix autotuning docs (deepspeedai#1553)

* fix docs

* rewording the goal

* fix typos

* fix typos (deepspeedai#1556)

* fix typos

* fix format

* fix bug (deepspeedai#1557)

* fix bug

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Minjia Zhang <minjiaz@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
…eck if gradient data is sparse (deepspeedai#1562)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
…dai#1564)

* Enforce nccl/rccl alignment of start location of each shard

* Making yapf happy
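The alignment idea above can be sketched as padding each shard's size up to a multiple of some alignment so that every shard starts on an aligned offset, which collective backends such as NCCL/RCCL prefer. The alignment value and helper names below are illustrative assumptions, not the PR's actual constants:

```python
ALIGN = 8  # assumed alignment in elements, for illustration only

# Round a shard size up to the next multiple of the alignment.
def aligned_size(n, align=ALIGN):
    return ((n + align - 1) // align) * align

# Compute the aligned start offset of each shard in a flat buffer.
def shard_offsets(sizes, align=ALIGN):
    offsets, cursor = [], 0
    for n in sizes:
        offsets.append(cursor)
        cursor += aligned_size(n, align)
    return offsets
```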

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* guard tabulate package in case autotuning isn't installed

* address comment
…pspeedai#960)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
…epSpeedExamples to latest commit 206e48b7638d1a36e90466071bd0f50844a3002b after ROCm/DeepSpeedExamples#16 IFU
@rraminen changed the title from "Ifu master 2021 11 23" to "IFU-master-2021-11-23" on Nov 23, 2021
@jithunnair-amd
Collaborator

CI runs are reportedly still broken, and the GPT2 CI script needs to be updated to use the Megatron v1.1.5 script.

@jithunnair-amd jithunnair-amd merged commit 2280803 into ROCm:master Nov 23, 2021
