
[moe]: add overlap ep, and fix overlap tp #4925

Merged
oahzxl merged 10 commits into hpcaitech:feature/MoE from cwher:talk-and-work
Oct 18, 2023

Conversation

@cwher
Contributor

@cwher cwher commented Oct 16, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@oahzxl
Contributor

oahzxl commented Oct 17, 2023

Can you show the profile image of ep overlap?

@cwher
Contributor Author

cwher commented Oct 17, 2023

> Can you show the profile image of ep overlap?

[attached: profiling trace showing the EP overlap]

@github-actions
Contributor

The code coverage for the changed files is 73%.

Complete report:
Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
colossalai/booster/plugin/hybrid_parallel_plugin.py                    214     13    94%
colossalai/booster/plugin/moe_hybrid_parallel_plugin.py                102     32    69%
colossalai/context/__init__.py                                           6      0   100%
colossalai/context/random/__init__.py                                    2      0   100%
colossalai/context/random/_helper.py                                    46      9    80%
colossalai/initialize.py                                               180    133    26%
colossalai/kernel/triton/__init__.py                                    14      3    79%
colossalai/kernel/triton/llama_act_combine_kernel.py                    89     67    25%
colossalai/legacy/engine/gradient_handler/__init__.py                    6      0   100%
colossalai/legacy/engine/gradient_handler/_moe_gradient_handler.py      20     20     0%
colossalai/moe/__init__.py                                               6      0   100%
colossalai/moe/_operation.py                                           168     41    76%
colossalai/moe/checkpoint.py                                           133     31    77%
colossalai/moe/experts.py                                               97     16    84%
colossalai/moe/layers.py                                               167      5    97%
colossalai/moe/load_balance.py                                         211     17    92%
colossalai/moe/loss.py                                                  21     21     0%
colossalai/moe/manager.py                                               91      7    92%
colossalai/moe/routers.py                                              176     15    91%
colossalai/moe/utils.py                                                 79     26    67%
colossalai/nn/layer/moe/__init__.py                                     12      0   100%
colossalai/nn/loss/__init__.py                                           0      0   100%
colossalai/tensor/moe_tensor/__init__.py                                 0      0   100%
colossalai/tensor/moe_tensor/api.py                                     24      2    92%
colossalai/tensor/moe_tensor/moe_info.py                                13      0   100%
colossalai/zero/low_level/low_level_optim.py                           431     32    93%
examples/language/openmoe/model/__init__.py                              0      0   100%
examples/language/openmoe/model/modeling_openmoe.py                    462    328    29%
examples/language/openmoe/model/openmoe_policy.py                      248    157    37%
tests/test_infer_ops/triton/test_llama_act_combine.py                   40     23    42%
tests/test_moe/moe_utils.py                                             97      0   100%
tests/test_moe/test_grad_handler.py                                     59      1    98%
tests/test_moe/test_kernel.py                                           57      1    98%
tests/test_moe/test_moe_checkpoint.py                                   69      2    97%
tests/test_moe/test_moe_ep_tp.py                                        63      1    98%
tests/test_moe/test_moe_group.py                                        54      2    96%
tests/test_moe/test_moe_hybrid_zero.py                                  66      2    97%
tests/test_moe/test_moe_load_balance.py                                124      9    93%
tests/test_moe/test_moe_router.py                                       25      1    96%
tests/test_moe/test_moe_zero_fwd_bwd.py                                 77      3    96%
tests/test_moe/test_moe_zero_optim.py                                   66      9    86%
----------------------------------------------------------------------------------------
TOTAL                                                                 3815   1029    73%

oahzxl merged commit b698c5a into hpcaitech:feature/MoE on Oct 18, 2023
oahzxl pushed a commit to oahzxl/ColossalAI that referenced this pull request Oct 26, 2023
* test: add more ep/tp test case

* to: add TPOverlap fn

* fix: fix tp overlap

* fix: remove useless variables

* feat: add async all to all

* feat: add overlap ep

* fix: fix import error

* fix: fix ep/tp tests

* perf: optimize overlap

* fix: add world_size check
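The commits above describe the core idea of this PR: instead of running the dispatch all-to-all and the expert computation back to back, the all-to-all for the next token chunk is issued asynchronously while the experts compute on the current one. The following is a minimal, framework-free sketch of that pipelining pattern only, not Colossal-AI's actual implementation: a single-worker thread pool stands in for the non-blocking collective (in practice this would be something like `torch.distributed.all_to_all` with `async_op=True`), and `fake_all_to_all`, `expert_compute`, and `overlapped_moe_forward` are all illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_all_to_all(chunk):
    # Stand-in for the dispatch all-to-all; identity here since there is
    # no real process group in this sketch.
    return chunk

def expert_compute(chunk):
    # Stand-in for the expert MLP: double every value.
    return [2 * x for x in chunk]

def overlapped_moe_forward(chunks):
    """Pipeline communication and computation over a list of token chunks."""
    outputs = []
    with ThreadPoolExecutor(max_workers=1) as comm:
        # Issue the first all-to-all before the compute loop starts.
        pending = comm.submit(fake_all_to_all, chunks[0])
        for i in range(len(chunks)):
            received = pending.result()  # wait for chunk i's communication
            if i + 1 < len(chunks):
                # Launch chunk i+1's all-to-all while we compute on chunk i,
                # so communication and expert compute overlap in time.
                pending = comm.submit(fake_all_to_all, chunks[i + 1])
            outputs.append(expert_compute(received))
    return outputs

print(overlapped_moe_forward([[1, 2], [3, 4]]))  # [[2, 4], [6, 8]]
```

The same double-buffering structure applies to the combine all-to-all on the backward path; the `world_size` check mentioned in the last commit would gate this path so that single-device runs skip the collectives entirely.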

Labels: none
Projects: none
2 participants