Skip to content

[shardformer] merge shardformer to main#4152

Merged
FrankLeeeee merged 49 commits intomainfrom
feature/shardformer
Jul 4, 2023
Merged

[shardformer] merge shardformer to main#4152
FrankLeeeee merged 49 commits intomainfrom
feature/shardformer

Conversation

@FrankLeeeee
Copy link
Copy Markdown
Contributor

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

N/A

📝 What does this PR do?

Summarize your work here.
if you have any plots/diagrams/screenshots/tables, please attach them here.

This PR releases the shardformer feature to the main branch.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

FoolPlayer and others added 30 commits June 26, 2023 10:06
* init shardformer code structure

* add implement of sharder (inject and replace)

* add implement of replace layer to colossal layer

* separate different layer policy, add some notion

* implement 1d and 2d slicer, can tell col or row

* fix bug when slicing and inject model

* fix some bug; add inference test example
)

* init shardformer code structure

* add implement of sharder (inject and replace)

* add implement of replace layer to colossal layer

* separate different layer policy, add some notion

* implement 1d and 2d slicer, can tell col or row

* fix bug when slicing and inject model

* fix some bug; add inference test example

* add share weight and train example

* add train

* add docstring and readme

* add docstring for other files

* pre-commit
* [shardformer] refactored the user api

* polish code
* update readme with modules content

* remove img
)

* add dropout layer, add dropout test

* modify seed manager as context manager

* add a copy of col_nn.layer

* add dist_crossentropy loss; separate module test

* polish the code

* fix dist crossentropy loss
…3883)

* add gpt2 policy and modify shard and slicer to support

* remove unused code

* polish code
* add bert align test, fix dist loss bug

* forward and backward align

* add ignore index

* add shardformer CI

* add gather_output optional for user in shardconfig

* update readme with optional gather_ouput

* add dist crossentropy loss test, remove unused files

* remove unused file

* remove unused file

* rename the file

* polish code
* fix bug in slicer, add slicer unit test

* add dropout test

* use pid as dropout seed

* updata dropout test with local pattern

* ad todo
#3949)

* add dist dropout in model

* update docstring and bert policy with dropout

* refactor basepolicy and sharded, update bert

* update format

* update gpt2 policy

* update bert policy

* remove unused code

* update readme for new policy usage
* add dist dropout in model

* update docstring and bert policy with dropout

* refactor basepolicy and sharded, update bert

* update format

* update gpt2 policy

* update bert policy

* remove unused code

* update readme for new policy usage

* add downstream model of bert

* remove unused code
* fix an error in readme

* simplify code
* fix an error in readme

* simplify code

* refactor shardformer

* add todo

* remove slicer

* resolve code review
* [shardformer] integrated linear 1D with dtensor

* polish code
)

* [shardformer] refactored embedding and dropout to parallel module

* polish code
* fix bert downstream with new api

* remove comment line
* add gpt2 test and layer class refactor

* add dropout in gpt2 policy
* [shardformer] adapted T5 and LLaMa test to use kit

* polish code
FoolPlayer and others added 18 commits June 26, 2023 10:10
* support kit use for bert test

* support kit test for gpt2
* [shardformer] support module saving and loading

* polish code
* add linearconv1d test

* add linearconv1d test
* add layernorm to bert

* add layernorm test

* add layernorm test with load state dict

* add use_mixedfusedLN in shard config

* refactor policy to support fused_layernorm
* [test] fixed tests failed due to dtensor change

* polish code
* [shardformer] shardformer support opt models

* [shardformer] shardformer support opt models, fix

* [shardformer] shardformer support opt models, fix

* [shardformer] shardformer support opt models, fix
* first v of vit shardformer

* keep vit

* update

* vit shard add vitattention vitlayer

* update num head shard para

* finish test for vit

* add new_model_class & postprocess

* add vit readme

* delete old files & fix the conflict

* fix sth
* [shardformer] add benchmark of shardformer

* [shardformer] add benchmark of shardformer
* [shardformer] refactored some doc and api

* polish code
* [shardformer] made tensor parallelism configurable

* polish code
Comment thread applications/Chat/coati/trainer/.sft.py.swp Outdated
Comment thread colossalai/shardformer/examples/shardformer_benchmark.py
@FrankLeeeee FrankLeeeee mentioned this pull request Jul 4, 2023
10 tasks
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jul 4, 2023

The code coverage for the changed files is 84%.

Click me to view the complete report
Name                                                                                       Stmts   Miss  Cover
--------------------------------------------------------------------------------------------------------------
colossalai/auto_parallel/tensor_shard/node_handler/node_handler.py                           164     82    50%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/matmul_strategy_generator.py     388     76    80%
colossalai/auto_parallel/tensor_shard/utils/misc.py                                           45      9    80%
colossalai/checkpoint_io/utils.py                                                            243     43    82%
colossalai/device/device_mesh.py                                                             178     14    92%
colossalai/lazy/lazy_init.py                                                                 299     40    87%
colossalai/nn/layer/base_layer.py                                                             36     15    58%
colossalai/nn/layer/parallel_1d/_operation.py                                                 53     26    51%
colossalai/shardformer/__init__.py                                                             1      0   100%
colossalai/shardformer/_utils.py                                                              42     15    64%
colossalai/shardformer/layer/__init__.py                                                       7      0   100%
colossalai/shardformer/layer/_operation.py                                                   152     51    66%
colossalai/shardformer/layer/dropout.py                                                       35      0   100%
colossalai/shardformer/layer/embedding.py                                                    118      5    96%
colossalai/shardformer/layer/linear.py                                                       156     23    85%
colossalai/shardformer/layer/loss.py                                                          49      8    84%
colossalai/shardformer/layer/normalization.py                                                 50     10    80%
colossalai/shardformer/layer/parallel_module.py                                               76     21    72%
colossalai/shardformer/layer/qkv_fused_linear.py                                             196     29    85%
colossalai/shardformer/layer/utils.py                                                         81     10    88%
colossalai/shardformer/modeling/__init__.py                                                    0      0   100%
colossalai/shardformer/modeling/bloom.py                                                      29      5    83%
colossalai/shardformer/policies/__init__.py                                                    0      0   100%
colossalai/shardformer/policies/autopolicy.py                                                 27      2    93%
colossalai/shardformer/policies/basepolicy.py                                                 50      5    90%
colossalai/shardformer/policies/bert.py                                                      109      2    98%
colossalai/shardformer/policies/bloom.py                                                      60      3    95%
colossalai/shardformer/policies/gpt2.py                                                       68      2    97%
colossalai/shardformer/policies/llama.py                                                      43      2    95%
colossalai/shardformer/policies/opt.py                                                        50      2    96%
colossalai/shardformer/policies/t5.py                                                         74      2    97%
colossalai/shardformer/policies/vit.py                                                        26     26     0%
colossalai/shardformer/shard/__init__.py                                                       4      0   100%
colossalai/shardformer/shard/shard_config.py                                                  21      2    90%
colossalai/shardformer/shard/sharder.py                                                       66      5    92%
colossalai/shardformer/shard/shardformer.py                                                   13      0   100%
colossalai/tensor/comm_spec.py                                                               253     93    63%
colossalai/tensor/d_tensor/__init__.py                                                         4      0   100%
colossalai/tensor/d_tensor/api.py                                                            136     18    87%
colossalai/tensor/d_tensor/comm_spec.py                                                      151     35    77%
colossalai/tensor/d_tensor/layout.py                                                          38      1    97%
colossalai/tensor/d_tensor/layout_converter.py                                               195     12    94%
colossalai/tensor/d_tensor/utils.py                                                           38      7    82%
colossalai/tensor/shape_consistency.py                                                       294    120    59%
colossalai/tensor/sharding_spec.py                                                           139     13    91%
colossalai/testing/__init__.py                                                                 4      0   100%
colossalai/testing/comparison.py                                                              54      9    83%
tests/kit/model_zoo/registry.py                                                               18      0   100%
tests/kit/model_zoo/transformers/__init__.py                                                   7      0   100%
tests/kit/model_zoo/transformers/bert.py                                                      42      0   100%
tests/kit/model_zoo/transformers/bloom.py                                                     34      0   100%
tests/kit/model_zoo/transformers/gpt.py                                                       28      0   100%
tests/kit/model_zoo/transformers/llama.py                                                     26      2    92%
tests/kit/model_zoo/transformers/opt.py                                                       32      0   100%
tests/kit/model_zoo/transformers/t5.py                                                        24      0   100%
tests/test_autochunk/test_autochunk_diffuser/test_autochunk_unet.py                           36     10    72%
tests/test_booster/test_mixed_precision/test_fp16_torch.py                                    30      1    97%
tests/test_booster/test_plugin/test_gemini_plugin.py                                          74     10    86%
tests/test_booster/test_plugin/test_low_level_zero_plugin.py                                  60      6    90%
tests/test_booster/test_plugin/test_torch_ddp_plugin.py                                       78      0   100%
tests/test_booster/test_plugin/test_torch_fsdp_plugin.py                                      43      0   100%
tests/test_checkpoint_io/test_gemini_checkpoint_io.py                                         80      0   100%
tests/test_device/test_device_mesh.py                                                         58     36    38%
tests/test_device/test_init_logical_pg.py                                                     27      1    96%
tests/test_fx/test_tracer/test_hf_model/hf_tracer_utils.py                                    21      2    90%
tests/test_fx/test_tracer/test_hf_model/test_hf_albert.py                                     17      1    94%
tests/test_fx/test_tracer/test_hf_model/test_hf_bert.py                                       15      1    93%
tests/test_fx/test_tracer/test_hf_model/test_hf_diffuser.py                                   50     28    44%
tests/test_fx/test_tracer/test_hf_model/test_hf_gpt.py                                        17      1    94%
tests/test_fx/test_tracer/test_hf_model/test_hf_opt.py                                        15      1    93%
tests/test_fx/test_tracer/test_hf_model/test_hf_t5.py                                         17      1    94%
tests/test_fx/test_tracer/test_timm_model/test_timm_model.py                                  36     24    33%
tests/test_fx/test_tracer/test_torchaudio_model/test_torchaudio_model.py                      14      5    64%
tests/test_fx/test_tracer/test_torchrec_model/test_deepfm_model.py                            39      3    92%
tests/test_fx/test_tracer/test_torchrec_model/test_dlrm_model.py                              41      4    90%
tests/test_fx/test_tracer/test_torchvision_model/test_torchvision_model.py                    31      1    97%
tests/test_lazy/lazy_init_utils.py                                                            72     14    81%
tests/test_lazy/test_distribute.py                                                            73      3    96%
tests/test_lazy/test_models.py                                                                13      1    92%
tests/test_shardformer/__init__.py                                                             0      0   100%
tests/test_shardformer/test_layer/test_dist_crossentropy.py                                   27      1    96%
tests/test_shardformer/test_layer/test_dropout.py                                             42      1    98%
tests/test_shardformer/test_layer/test_embedding.py                                           30      1    97%
tests/test_shardformer/test_layer/test_layernorm.py                                           27      1    96%
tests/test_shardformer/test_layer/test_linear_1d.py                                           85      1    99%
tests/test_shardformer/test_layer/test_qkv_fused_linear_1d.py                                 73      1    99%
tests/test_shardformer/test_layer/test_vocab_parallel_embedding_1d.py                         32      1    97%
tests/test_shardformer/test_model/__init__.py                                                  0      0   100%
tests/test_shardformer/test_model/_utils.py                                                   21      0   100%
tests/test_shardformer/test_model/test_shard_bert.py                                          56      1    98%
tests/test_shardformer/test_model/test_shard_bloom.py                                         56      1    98%
tests/test_shardformer/test_model/test_shard_gpt2.py                                          56      1    98%
tests/test_shardformer/test_model/test_shard_llama.py                                         58      1    98%
tests/test_shardformer/test_model/test_shard_opt.py                                           59      1    98%
tests/test_shardformer/test_model/test_shard_t5.py                                            64      1    98%
tests/test_shardformer/test_model/test_shard_vit.py                                           35     20    43%
tests/test_shardformer/test_with_torch_ddp.py                                                 46      2    96%
tests/test_tensor/test_dtensor/test_comm_spec.py                                              78      1    99%
tests/test_tensor/test_dtensor/test_dtensor.py                                                65      5    92%
tests/test_tensor/test_dtensor/test_layout_converter.py                                       91      1    99%
tests/test_tensor/test_shape_consistency.py                                                   50      2    96%
tests/test_tensor/test_sharded_linear.py                                                     130      1    99%
tests/test_tensor/test_sharding_spec.py                                                       13      1    92%
--------------------------------------------------------------------------------------------------------------
TOTAL                                                                                       6677   1045    84%

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jul 4, 2023

The code coverage for the changed files is 85%.

Click me to view the complete report
Name                                                                                       Stmts   Miss  Cover
--------------------------------------------------------------------------------------------------------------
colossalai/auto_parallel/tensor_shard/node_handler/node_handler.py                           164     82    50%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/matmul_strategy_generator.py     388     76    80%
colossalai/auto_parallel/tensor_shard/utils/misc.py                                           45      9    80%
colossalai/checkpoint_io/utils.py                                                            243     43    82%
colossalai/device/device_mesh.py                                                             178     14    92%
colossalai/lazy/lazy_init.py                                                                 299     40    87%
colossalai/nn/layer/base_layer.py                                                             36     15    58%
colossalai/nn/layer/parallel_1d/_operation.py                                                 53     26    51%
colossalai/shardformer/__init__.py                                                             1      0   100%
colossalai/shardformer/_utils.py                                                              42     15    64%
colossalai/shardformer/layer/__init__.py                                                       7      0   100%
colossalai/shardformer/layer/_operation.py                                                   152     51    66%
colossalai/shardformer/layer/dropout.py                                                       35      0   100%
colossalai/shardformer/layer/embedding.py                                                    118      5    96%
colossalai/shardformer/layer/linear.py                                                       156     23    85%
colossalai/shardformer/layer/loss.py                                                          49      8    84%
colossalai/shardformer/layer/normalization.py                                                 50     10    80%
colossalai/shardformer/layer/parallel_module.py                                               76     21    72%
colossalai/shardformer/layer/qkv_fused_linear.py                                             196     29    85%
colossalai/shardformer/layer/utils.py                                                         81     10    88%
colossalai/shardformer/modeling/__init__.py                                                    0      0   100%
colossalai/shardformer/modeling/bloom.py                                                      29      5    83%
colossalai/shardformer/policies/__init__.py                                                    0      0   100%
colossalai/shardformer/policies/autopolicy.py                                                 27      2    93%
colossalai/shardformer/policies/basepolicy.py                                                 50      5    90%
colossalai/shardformer/policies/bert.py                                                      109      2    98%
colossalai/shardformer/policies/bloom.py                                                      60      3    95%
colossalai/shardformer/policies/gpt2.py                                                       68      2    97%
colossalai/shardformer/policies/llama.py                                                      43      2    95%
colossalai/shardformer/policies/opt.py                                                        50      2    96%
colossalai/shardformer/policies/t5.py                                                         74      2    97%
colossalai/shardformer/policies/vit.py                                                        26     26     0%
colossalai/shardformer/shard/__init__.py                                                       4      0   100%
colossalai/shardformer/shard/shard_config.py                                                  21      2    90%
colossalai/shardformer/shard/sharder.py                                                       66      5    92%
colossalai/shardformer/shard/shardformer.py                                                   13      0   100%
colossalai/tensor/comm_spec.py                                                               253     93    63%
colossalai/tensor/d_tensor/__init__.py                                                         4      0   100%
colossalai/tensor/d_tensor/api.py                                                            136     18    87%
colossalai/tensor/d_tensor/comm_spec.py                                                      151     35    77%
colossalai/tensor/d_tensor/layout.py                                                          38      1    97%
colossalai/tensor/d_tensor/layout_converter.py                                               195     12    94%
colossalai/tensor/d_tensor/utils.py                                                           38      7    82%
colossalai/tensor/shape_consistency.py                                                       294    120    59%
colossalai/tensor/sharding_spec.py                                                           139     13    91%
colossalai/testing/__init__.py                                                                 4      0   100%
colossalai/testing/comparison.py                                                              54      9    83%
tests/kit/model_zoo/registry.py                                                               18      0   100%
tests/kit/model_zoo/transformers/__init__.py                                                   7      0   100%
tests/kit/model_zoo/transformers/bert.py                                                      42      0   100%
tests/kit/model_zoo/transformers/bloom.py                                                     34      0   100%
tests/kit/model_zoo/transformers/gpt.py                                                       28      0   100%
tests/kit/model_zoo/transformers/llama.py                                                     26      2    92%
tests/kit/model_zoo/transformers/opt.py                                                       32      0   100%
tests/kit/model_zoo/transformers/t5.py                                                        24      0   100%
tests/test_autochunk/test_autochunk_diffuser/test_autochunk_unet.py                           36     10    72%
tests/test_booster/test_mixed_precision/test_fp16_torch.py                                    30      1    97%
tests/test_booster/test_plugin/test_gemini_plugin.py                                          74     10    86%
tests/test_booster/test_plugin/test_low_level_zero_plugin.py                                  60      6    90%
tests/test_booster/test_plugin/test_torch_ddp_plugin.py                                       78      0   100%
tests/test_booster/test_plugin/test_torch_fsdp_plugin.py                                      43      0   100%
tests/test_checkpoint_io/test_gemini_checkpoint_io.py                                         80      0   100%
tests/test_device/test_device_mesh.py                                                         58     36    38%
tests/test_device/test_init_logical_pg.py                                                     27      1    96%
tests/test_fx/test_tracer/test_hf_model/hf_tracer_utils.py                                    21      2    90%
tests/test_fx/test_tracer/test_hf_model/test_hf_albert.py                                     17      1    94%
tests/test_fx/test_tracer/test_hf_model/test_hf_bert.py                                       15      1    93%
tests/test_fx/test_tracer/test_hf_model/test_hf_diffuser.py                                   50     28    44%
tests/test_fx/test_tracer/test_hf_model/test_hf_gpt.py                                        17      1    94%
tests/test_fx/test_tracer/test_hf_model/test_hf_opt.py                                        15      1    93%
tests/test_fx/test_tracer/test_hf_model/test_hf_t5.py                                         17      1    94%
tests/test_fx/test_tracer/test_torchrec_model/test_deepfm_model.py                            39      3    92%
tests/test_fx/test_tracer/test_torchrec_model/test_dlrm_model.py                              41      4    90%
tests/test_fx/test_tracer/test_torchvision_model/test_torchvision_model.py                    31      1    97%
tests/test_lazy/lazy_init_utils.py                                                            72     14    81%
tests/test_lazy/test_distribute.py                                                            73      3    96%
tests/test_lazy/test_models.py                                                                13      1    92%
tests/test_shardformer/__init__.py                                                             0      0   100%
tests/test_shardformer/test_layer/test_dist_crossentropy.py                                   27      1    96%
tests/test_shardformer/test_layer/test_dropout.py                                             42      1    98%
tests/test_shardformer/test_layer/test_embedding.py                                           30      1    97%
tests/test_shardformer/test_layer/test_layernorm.py                                           27      1    96%
tests/test_shardformer/test_layer/test_linear_1d.py                                           85      1    99%
tests/test_shardformer/test_layer/test_qkv_fused_linear_1d.py                                 73      1    99%
tests/test_shardformer/test_layer/test_vocab_parallel_embedding_1d.py                         32      1    97%
tests/test_shardformer/test_model/__init__.py                                                  0      0   100%
tests/test_shardformer/test_model/_utils.py                                                   21      0   100%
tests/test_shardformer/test_model/test_shard_bert.py                                          56      1    98%
tests/test_shardformer/test_model/test_shard_bloom.py                                         56      1    98%
tests/test_shardformer/test_model/test_shard_gpt2.py                                          56      1    98%
tests/test_shardformer/test_model/test_shard_llama.py                                         58      1    98%
tests/test_shardformer/test_model/test_shard_opt.py                                           59      1    98%
tests/test_shardformer/test_model/test_shard_t5.py                                            64      1    98%
tests/test_shardformer/test_model/test_shard_vit.py                                           35     20    43%
tests/test_shardformer/test_with_torch_ddp.py                                                 46      2    96%
tests/test_tensor/test_dtensor/test_comm_spec.py                                              78      1    99%
tests/test_tensor/test_dtensor/test_dtensor.py                                                65      5    92%
tests/test_tensor/test_dtensor/test_layout_converter.py                                       91      1    99%
tests/test_tensor/test_shape_consistency.py                                                   50      2    96%
tests/test_tensor/test_sharded_linear.py                                                     130      1    99%
tests/test_tensor/test_sharding_spec.py                                                       13      1    92%
--------------------------------------------------------------------------------------------------------------
TOTAL                                                                                       6627   1016    85%

@FrankLeeeee FrankLeeeee merged commit f447ca1 into main Jul 4, 2023
@FrankLeeeee FrankLeeeee deleted the feature/shardformer branch July 4, 2023 08:05
@FrankLeeeee FrankLeeeee self-assigned this Jul 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

6 participants