
[zero] reorganize zero/gemini folder structure #3424

Merged
FrankLeeeee merged 16 commits into hpcaitech:main from ver217:refactor/zero on Apr 4, 2023

@ver217 ver217 commented Apr 4, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with keywords like "fixed" to automatically close the linked issue upon merge.

e.g. fixed #1234, closed #1234, resolved #1234

Closes #3422

📝 What does this PR do?

Summarize your work here.
If you have any plots, diagrams, screenshots, or tables, please attach them here.

Reorganize zero/gemini folder structure and fix some unit tests.

After this PR, the structure is:

zero
| --- legacy # all code related to legacy ZeRO
| --- low_level # code related to the low-level ZeRO optimizer
       | --- LowLevelZeroOptimizer
| --- gemini 
       | --- GeminiDDP
       | --- GeminiOptimizer
       | --- ColoInitContext
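
A minimal sketch of how dotted module paths map onto this layout, using file paths that appear in this PR's coverage report. The helper `zero_subpackage` is hypothetical and only does string matching on the path; it does not import ColossalAI or reflect any API the PR adds.

```python
from typing import Optional

# The three subpackages of colossalai.zero in the reorganized layout.
ZERO_SUBPACKAGES = ("legacy", "low_level", "gemini")


def zero_subpackage(module_path: str) -> Optional[str]:
    """Map a dotted module path under colossalai.zero to its subpackage.

    Returns None for modules that live directly in colossalai.zero
    (for example colossalai.zero.wrapper) and raises ValueError for
    paths outside colossalai.zero. Pure string matching; nothing is imported.
    """
    parts = module_path.split(".")
    if parts[:2] != ["colossalai", "zero"]:
        raise ValueError(f"not under colossalai.zero: {module_path}")
    if len(parts) > 2 and parts[2] in ZERO_SUBPACKAGES:
        return parts[2]
    return None


# Legacy Gemini code is filed under legacy, not under the new gemini package.
print(zero_subpackage("colossalai.zero.legacy.gemini.stateful_tensor"))  # legacy
print(zero_subpackage("colossalai.zero.gemini.gemini_ddp"))  # gemini
```

The point the example makes is that the old Gemini implementation (for example `legacy/gemini/stateful_tensor.py`) resolves to `legacy`, while the new `GeminiDDP` code resolves to `gemini`.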

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@ver217 ver217 added the gemini label (related to the gemini feature) Apr 4, 2023
@ver217 ver217 marked this pull request as ready for review April 4, 2023 04:23
@ver217 ver217 requested review from 1SAA and FrankLeeeee April 4, 2023 04:26
github-actions Bot commented Apr 4, 2023

The code coverage for the changed files is 79%.

Complete report:
Name                                                                           Stmts   Miss  Cover
--------------------------------------------------------------------------------------------------
colossalai/auto_parallel/offload/base_offload_module.py                           71     47    34%
colossalai/auto_parallel/offload/region.py                                        81     64    21%
colossalai/booster/plugin/gemini_plugin.py                                       116     16    86%
colossalai/engine/_base_engine.py                                                 89     14    84%
colossalai/engine/schedule/_pipeline_schedule.py                                 430    261    39%
colossalai/initialize.py                                                         188     57    70%
colossalai/nn/layer/moe/experts.py                                               127     64    50%
colossalai/nn/layer/moe/layers.py                                                103     18    83%
colossalai/nn/parallel/__init__.py                                                 2      0   100%
colossalai/nn/parallel/data_parallel.py                                           99     26    74%
colossalai/zero/__init__.py                                                        4      0   100%
colossalai/zero/gemini/__init__.py                                                 7      0   100%
colossalai/zero/gemini/chunk/__init__.py                                           5      0   100%
colossalai/zero/gemini/chunk/chunk.py                                            314     46    85%
colossalai/zero/gemini/chunk/manager.py                                          131     16    88%
colossalai/zero/gemini/chunk/search_utils.py                                      90      1    99%
colossalai/zero/gemini/chunk/utils.py                                             31      0   100%
colossalai/zero/gemini/colo_init_context.py                                      100     25    75%
colossalai/zero/gemini/gemini_ddp.py                                             302     58    81%
colossalai/zero/gemini/gemini_hook.py                                             50      2    96%
colossalai/zero/gemini/gemini_mgr.py                                             101      7    93%
colossalai/zero/gemini/gemini_optimizer.py                                       205     39    81%
colossalai/zero/gemini/memory_tracer/__init__.py                                   7      0   100%
colossalai/zero/gemini/memory_tracer/chunk_memstats_collector.py                  17      1    94%
colossalai/zero/gemini/memory_tracer/memory_monitor.py                            71     34    52%
colossalai/zero/gemini/memory_tracer/memory_stats.py                              74     12    84%
colossalai/zero/gemini/memory_tracer/memstats_collector.py                        52      5    90%
colossalai/zero/gemini/memory_tracer/param_runtime_order.py                       25      4    84%
colossalai/zero/gemini/memory_tracer/runtime_mem_tracer.py                        64      0   100%
colossalai/zero/gemini/memory_tracer/static_memstats_collector.py                 72     55    24%
colossalai/zero/gemini/memory_tracer/utils.py                                     34     20    41%
colossalai/zero/gemini/placement_policy.py                                       144     51    65%
colossalai/zero/gemini/utils.py                                                   58      2    97%
colossalai/zero/legacy/__init__.py                                                19      1    95%
colossalai/zero/legacy/gemini/__init__.py                                          5      0   100%
colossalai/zero/legacy/gemini/gemini_context.py                                   29      3    90%
colossalai/zero/legacy/gemini/ophooks/__init__.py                                  2      0   100%
colossalai/zero/legacy/gemini/ophooks/_shard_grad_ophook.py                       19     19     0%
colossalai/zero/legacy/gemini/ophooks/_shard_param_ophook.py                      33     33     0%
colossalai/zero/legacy/gemini/ophooks/runtime_mem_tracer_hook.py                  94     12    87%
colossalai/zero/legacy/gemini/ophooks/utils.py                                    90     17    81%
colossalai/zero/legacy/gemini/paramhooks/__init__.py                               2      0   100%
colossalai/zero/legacy/gemini/paramhooks/_param_hookmgr.py                        18      1    94%
colossalai/zero/legacy/gemini/stateful_tensor.py                                 123      7    94%
colossalai/zero/legacy/gemini/stateful_tensor_mgr.py                              69      2    97%
colossalai/zero/legacy/gemini/tensor_placement_policy.py                          82     20    76%
colossalai/zero/legacy/gemini/tensor_utils.py                                     54      4    93%
colossalai/zero/legacy/init_ctx/__init__.py                                        2      0   100%
colossalai/zero/legacy/init_ctx/init_context.py                                  144      4    97%
colossalai/zero/legacy/shard_utils/__init__.py                                     4      0   100%
colossalai/zero/legacy/shard_utils/base_shard_strategy.py                         13      2    85%
colossalai/zero/legacy/shard_utils/bucket_tensor_shard_strategy.py                32      1    97%
colossalai/zero/legacy/shard_utils/commons.py                                     13      0   100%
colossalai/zero/legacy/shard_utils/tensor_shard_strategy.py                       38      2    95%
colossalai/zero/legacy/sharded_model/__init__.py                                   2      0   100%
colossalai/zero/legacy/sharded_model/_utils.py                                    51     23    55%
colossalai/zero/legacy/sharded_model/reduce_scatter.py                            94     50    47%
colossalai/zero/legacy/sharded_model/sharded_model_v2.py                         297    108    64%
colossalai/zero/legacy/sharded_model/utils.py                                     12      0   100%
colossalai/zero/legacy/sharded_model/zero_hook.py                                 73     11    85%
colossalai/zero/legacy/sharded_optim/__init__.py                                   2      0   100%
colossalai/zero/legacy/sharded_optim/sharded_optim_v2.py                         204     32    84%
colossalai/zero/legacy/sharded_param/__init__.py                                   3      0   100%
colossalai/zero/legacy/sharded_param/sharded_param.py                             68      3    96%
colossalai/zero/legacy/sharded_param/sharded_tensor.py                            26      0   100%
colossalai/zero/low_level/__init__.py                                              2      0   100%
colossalai/zero/low_level/_utils.py                                              125     48    62%
colossalai/zero/low_level/bookkeeping/__init__.py                                  5      0   100%
colossalai/zero/low_level/bookkeeping/base_store.py                               12      2    83%
colossalai/zero/low_level/bookkeeping/bucket_store.py                             28      0   100%
colossalai/zero/low_level/bookkeeping/gradient_store.py                           24      3    88%
colossalai/zero/low_level/bookkeeping/parameter_store.py                          48      0   100%
colossalai/zero/low_level/bookkeeping/tensor_bucket.py                            37      7    81%
colossalai/zero/low_level/low_level_optim.py                                     311     25    92%
colossalai/zero/wrapper.py                                                        36      7    81%
tests/test_auto_parallel/test_offload/test_perf.py                               111     85    23%
tests/test_auto_parallel/test_tensor_shard/test_compatibility_with_gemini.py      81     56    31%
tests/test_ddp/test_ddp_ignore_params.py                                          73      1    99%
tests/test_ddp/test_ddp_state_dict.py                                             52      2    96%
tests/test_gemini/test_gemini_manager.py                                          54      1    98%
tests/test_gemini/test_param_op.py                                                52      5    90%
tests/test_gemini/test_runtime_mem_tracer.py                                      38      1    97%
tests/test_gemini/update/test_chunk_mgrv2.py                                      52      1    98%
tests/test_gemini/update/test_chunkv2.py                                          90      1    99%
tests/test_gemini/update/test_fwd_bwd.py                                          81      1    99%
tests/test_gemini/update/test_gemini_use_rmt.py                                   77      1    99%
tests/test_gemini/update/test_get_torch_model.py                                  43      1    98%
tests/test_gemini/update/test_grad_clip.py                                        86      1    99%
tests/test_gemini/update/test_inference.py                                       103      1    99%
tests/test_gemini/update/test_optim.py                                           123      1    99%
tests/test_gemini/update/test_search.py                                           74      1    99%
tests/test_gemini/update/test_zeroddp_state_dict.py                               87      5    94%
tests/test_gemini/update/test_zerooptim_state_dict.py                             71      1    99%
tests/test_moe/test_moe_checkpoint.py                                             41      1    98%
tests/test_moe/test_moe_colo_init.py                                              43      2    95%
tests/test_moe/test_moe_zero_init.py                                              72      2    97%
tests/test_moe/test_moe_zero_model.py                                             55      1    98%
tests/test_moe/test_moe_zero_optim.py                                             85      3    96%
tests/test_optimizer/test_cpu_adam.py                                             54      4    93%
tests/test_optimizer/test_fused_adam_kernel.py                                    48      3    94%
tests/test_optimizer/test_hybrid_adam.py                                          31      1    97%
tests/test_tensor/model/test_gpt2.py                                             101     12    88%
tests/test_tensor/model/test_model.py                                            232     81    65%
tests/test_tensor/model/test_module_spec.py                                      159    123    23%
tests/test_tensor/test_context.py                                                 40      1    98%
tests/test_tensor/test_tp_with_zero.py                                           102      1    99%
tests/test_utils/test_colo_checkpoint.py                                         148     27    82%
tests/test_utils/test_commons.py                                                  30      1    97%
tests/test_utils/test_zero_gradient_clippling.py                                  86     52    40%
tests/test_zero/common.py                                                         83     24    71%
tests/test_zero/low_level_zero/test_zero_init.py                                  44      4    91%
tests/test_zero/low_level_zero/test_zero_tp.py                                    70      1    99%
tests/test_zero/test_found_inf.py                                                 50      1    98%
tests/test_zero/test_init_context.py                                              53      2    96%
tests/test_zero/test_shard_model_v2.py                                            48      1    98%
tests/test_zero/test_shard_param.py                                               67      1    99%
tests/test_zero/test_sharded_optim_state_dict.py                                  68      1    99%
tests/test_zero/test_sharded_optim_v2.py                                          82      1    99%
tests/test_zero/test_sharded_optim_with_sync_bn.py                                46      1    98%
tests/test_zero/test_state_dict.py                                                41      1    98%
tests/test_zero/test_tensor_utils.py                                              67      7    90%
tests/test_zero/test_zero_engine.py                                               73      5    93%
--------------------------------------------------------------------------------------------------
TOTAL                                                                           9076   1923    79%
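
The Cover column appears to follow the usual statement-coverage formula, (Stmts - Miss) / Stmts, and the TOTAL row applies it to the summed counts: (9076 - 1923) / 9076 is about 78.8%, displayed as 79%. The sketch below reproduces a few rows under that assumption (the report has no branch columns, so plain statement coverage is assumed); the helpers `cover_percent` and `parse_row` are illustrative, not part of any coverage tool's API.

```python
from typing import Tuple


def cover_percent(stmts: int, miss: int) -> float:
    """Statement coverage in percent: executed statements / total statements."""
    if stmts == 0:
        # Assumption: a module with no statements counts as fully covered.
        return 100.0
    return 100.0 * (stmts - miss) / stmts


def parse_row(line: str) -> Tuple[str, int, int]:
    """Split a report row "<name> <stmts> <miss> <cover>%" into its counts."""
    name, stmts, miss, _cover = line.split()
    return name, int(stmts), int(miss)


rows = [
    "colossalai/auto_parallel/offload/base_offload_module.py   71   47   34%",
    "colossalai/zero/gemini/gemini_ddp.py                     302   58   81%",
    "TOTAL                                                   9076 1923   79%",
]
for row in rows:
    name, stmts, miss = parse_row(row)
    print(f"{name}: {cover_percent(stmts, miss):.0f}%")
```

Note that the total is computed from the summed statement counts, not by averaging per-file percentages, so large files such as `_pipeline_schedule.py` (430 statements, 39%) pull the overall figure down more than small ones.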

@FrankLeeeee FrankLeeeee merged commit 26b7aac into hpcaitech:main Apr 4, 2023
@ver217 ver217 deleted the refactor/zero branch April 4, 2023 05:48
yhna940 added a commit to EleutherAI/oslo that referenced this pull request Apr 19, 2023
…, and In-Place Dist Tensor Conversion (#178)

## Title

- Improve Zero3 Implementation: Search Utility, Consolidation, and In-Place Dist Tensor Conversion

## Description

This PR aims to improve the zero3 implementation with the following major changes:

1. Added a search utility for configuring chunk structures.
2. Consolidated zero-related implementations into a single directory (motivated by this [commit](hpcaitech/ColossalAI#3424)).
3. Added a process for converting to custom tensors in-place (motivated by this [commit](hpcaitech/ColossalAI#3379)).
4. Added unit tests.

Minor changes include:

1. Instantiated the chunk manager and hetero memory manager within FSDP.
2. Fixed several small bugs.

## Linked Issues

- N/A

---------

Co-authored-by: Junhwa Song <ethan9867@gmail.com>

Labels

gemini (related to the gemini feature)

Development

Successfully merging this pull request may close these issues.

[zero] reorganize zero/gemini folder structure

2 participants