[moe] support load balance #4914

Merged
oahzxl merged 10 commits into hpcaitech:feature/MoE from oahzxl:load
Oct 16, 2023

@oahzxl oahzxl commented Oct 15, 2023

  • Support load balancing for MoE
  • Add a unit test

@github-actions

The code coverage for the changed files is 72%.

Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
colossalai/booster/plugin/hybrid_parallel_plugin.py                    214     13    94%
colossalai/booster/plugin/moe_hybrid_parallel_plugin.py                102     32    69%
colossalai/context/__init__.py                                           6      0   100%
colossalai/context/random/__init__.py                                    2      0   100%
colossalai/context/random/_helper.py                                    46      9    80%
colossalai/initialize.py                                               180    133    26%
colossalai/kernel/triton/__init__.py                                    14      3    79%
colossalai/kernel/triton/llama_act_combine_kernel.py                    89     67    25%
colossalai/legacy/engine/gradient_handler/__init__.py                    6      0   100%
colossalai/legacy/engine/gradient_handler/_moe_gradient_handler.py      20     20     0%
colossalai/moe/__init__.py                                               6      0   100%
colossalai/moe/_operation.py                                           159     44    72%
colossalai/moe/checkpoint.py                                           133     31    77%
colossalai/moe/experts.py                                               97     16    84%
colossalai/moe/layers.py                                               136     34    75%
colossalai/moe/load_balance.py                                         201     13    94%
colossalai/moe/loss.py                                                  21     21     0%
colossalai/moe/manager.py                                               91      7    92%
colossalai/moe/routers.py                                              176     15    91%
colossalai/moe/utils.py                                                 79     26    67%
colossalai/nn/layer/moe/__init__.py                                     12      0   100%
colossalai/nn/loss/__init__.py                                           0      0   100%
colossalai/tensor/moe_tensor/__init__.py                                 0      0   100%
colossalai/tensor/moe_tensor/api.py                                     24      2    92%
colossalai/tensor/moe_tensor/moe_info.py                                13      0   100%
colossalai/zero/low_level/low_level_optim.py                           431     32    93%
examples/language/openmoe/model/__init__.py                              0      0   100%
examples/language/openmoe/model/modeling_openmoe.py                    462    328    29%
examples/language/openmoe/model/openmoe_policy.py                      248    157    37%
tests/test_infer_ops/triton/test_llama_act_combine.py                   40     23    42%
tests/test_moe/moe_utils.py                                             97      0   100%
tests/test_moe/test_grad_handler.py                                     59      1    98%
tests/test_moe/test_kernel.py                                           57      1    98%
tests/test_moe/test_moe_checkpoint.py                                   69      2    97%
tests/test_moe/test_moe_ep_tp.py                                        48      1    98%
tests/test_moe/test_moe_group.py                                        54      2    96%
tests/test_moe/test_moe_hybrid_zero.py                                  66      2    97%
tests/test_moe/test_moe_load_balance.py                                124      9    93%
tests/test_moe/test_moe_local.py                                        49      1    98%
tests/test_moe/test_moe_router.py                                       25      1    96%
tests/test_moe/test_moe_zero_fwd_bwd.py                                 77      3    96%
tests/test_moe/test_moe_zero_optim.py                                   66      9    86%
----------------------------------------------------------------------------------------
TOTAL                                                                 3799   1058    72%
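
For readers checking the numbers, the `Cover` column appears to be the share of statements that were executed, i.e. `(Stmts - Miss) / Stmts`, shown as a whole percent. The sketch below reproduces the `TOTAL` rows under that assumption. The helper name `coverage_percent` is hypothetical, and the rounding rule is a guess that happens to match the rows shown here; the reporting tool may round differently in edge cases.

```python
def coverage_percent(stmts: int, miss: int) -> int:
    """Percent of statements executed, rounded to the nearest integer.

    Assumption: files with zero statements are reported as 100%,
    which matches the empty __init__.py rows in the report.
    """
    if stmts == 0:
        return 100
    return round(100 * (stmts - miss) / stmts)


# (Stmts, Miss) pairs copied from TOTAL rows of the coverage reports.
totals = [(3799, 1058), (2489, 1144), (3808, 1062)]
for stmts, miss in totals:
    print(f"Stmts={stmts} Miss={miss} Cover={coverage_percent(stmts, miss)}%")
```

The same helper can be applied to individual rows, for example `coverage_percent(214, 13)` for `hybrid_parallel_plugin.py`.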

@github-actions

The code coverage for the changed files is 54%.

Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
colossalai/booster/plugin/hybrid_parallel_plugin.py                    214    157    27%
colossalai/booster/plugin/moe_hybrid_parallel_plugin.py                102    102     0%
colossalai/context/__init__.py                                           6      0   100%
colossalai/context/random/__init__.py                                    2      0   100%
colossalai/context/random/_helper.py                                    46     14    70%
colossalai/initialize.py                                               180    134    26%
colossalai/kernel/triton/__init__.py                                    14      3    79%
colossalai/kernel/triton/llama_act_combine_kernel.py                    89     67    25%
colossalai/legacy/engine/gradient_handler/__init__.py                    6      0   100%
colossalai/legacy/engine/gradient_handler/_moe_gradient_handler.py      20     20     0%
colossalai/moe/__init__.py                                               6      0   100%
colossalai/moe/_operation.py                                           159     97    39%
colossalai/moe/checkpoint.py                                           133    106    20%
colossalai/moe/experts.py                                               97     20    79%
colossalai/moe/layers.py                                               136     44    68%
colossalai/moe/load_balance.py                                         211      7    97%
colossalai/moe/loss.py                                                  21     21     0%
colossalai/moe/manager.py                                               91     13    86%
colossalai/moe/routers.py                                              176     82    53%
colossalai/moe/utils.py                                                 79     41    48%
colossalai/nn/layer/moe/__init__.py                                     12      0   100%
colossalai/nn/loss/__init__.py                                           0      0   100%
colossalai/tensor/moe_tensor/__init__.py                                 0      0   100%
colossalai/tensor/moe_tensor/api.py                                     24      4    83%
colossalai/tensor/moe_tensor/moe_info.py                                13      0   100%
colossalai/zero/low_level/low_level_optim.py                           431    150    65%
tests/test_moe/moe_utils.py                                             97     53    45%
tests/test_moe/test_moe_load_balance.py                                124      9    93%
----------------------------------------------------------------------------------------
TOTAL                                                                 2489   1144    54%

Comment thread colossalai/moe/layers.py Outdated
Comment thread colossalai/moe/load_balance.py
@github-actions

The code coverage for the changed files is 72%.

Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
colossalai/booster/plugin/hybrid_parallel_plugin.py                    214     13    94%
colossalai/booster/plugin/moe_hybrid_parallel_plugin.py                102     32    69%
colossalai/context/__init__.py                                           6      0   100%
colossalai/context/random/__init__.py                                    2      0   100%
colossalai/context/random/_helper.py                                    46      9    80%
colossalai/initialize.py                                               180    133    26%
colossalai/kernel/triton/__init__.py                                    14      3    79%
colossalai/kernel/triton/llama_act_combine_kernel.py                    89     67    25%
colossalai/legacy/engine/gradient_handler/__init__.py                    6      0   100%
colossalai/legacy/engine/gradient_handler/_moe_gradient_handler.py      20     20     0%
colossalai/moe/__init__.py                                               6      0   100%
colossalai/moe/_operation.py                                           159     44    72%
colossalai/moe/checkpoint.py                                           133     31    77%
colossalai/moe/experts.py                                               97     16    84%
colossalai/moe/layers.py                                               135     34    75%
colossalai/moe/load_balance.py                                         211     17    92%
colossalai/moe/loss.py                                                  21     21     0%
colossalai/moe/manager.py                                               91      7    92%
colossalai/moe/routers.py                                              176     15    91%
colossalai/moe/utils.py                                                 79     26    67%
colossalai/nn/layer/moe/__init__.py                                     12      0   100%
colossalai/nn/loss/__init__.py                                           0      0   100%
colossalai/tensor/moe_tensor/__init__.py                                 0      0   100%
colossalai/tensor/moe_tensor/api.py                                     24      2    92%
colossalai/tensor/moe_tensor/moe_info.py                                13      0   100%
colossalai/zero/low_level/low_level_optim.py                           431     32    93%
examples/language/openmoe/model/__init__.py                              0      0   100%
examples/language/openmoe/model/modeling_openmoe.py                    462    328    29%
examples/language/openmoe/model/openmoe_policy.py                      248    157    37%
tests/test_infer_ops/triton/test_llama_act_combine.py                   40     23    42%
tests/test_moe/moe_utils.py                                             97      0   100%
tests/test_moe/test_grad_handler.py                                     59      1    98%
tests/test_moe/test_kernel.py                                           57      1    98%
tests/test_moe/test_moe_checkpoint.py                                   69      2    97%
tests/test_moe/test_moe_ep_tp.py                                        48      1    98%
tests/test_moe/test_moe_group.py                                        54      2    96%
tests/test_moe/test_moe_hybrid_zero.py                                  66      2    97%
tests/test_moe/test_moe_load_balance.py                                124      9    93%
tests/test_moe/test_moe_local.py                                        49      1    98%
tests/test_moe/test_moe_router.py                                       25      1    96%
tests/test_moe/test_moe_zero_fwd_bwd.py                                 77      3    96%
tests/test_moe/test_moe_zero_optim.py                                   66      9    86%
----------------------------------------------------------------------------------------
TOTAL                                                                 3808   1062    72%

@oahzxl oahzxl merged commit 926ad74 into hpcaitech:feature/MoE Oct 16, 2023
@oahzxl oahzxl deleted the load branch October 18, 2023 02:21
oahzxl added a commit to oahzxl/ColossalAI that referenced this pull request Oct 26, 2023
* add load balance

* update test

* update param exchange

* pass test

* update test

* update test

* update test

* update test

* fix ranks

* update