
[moe] support hybrid parallel #4748

Merged
oahzxl merged 16 commits into hpcaitech:feature/MoE from oahzxl:pp
Sep 21, 2023

Conversation

@oahzxl (Contributor) commented Sep 18, 2023

Support hybrid parallelism with PP + DP + EP.

  • add a custom shard policy for OpenMoE
  • add a MoE hybrid plugin that modifies some init settings of the original hybrid plugin (see the usage sketch below)
  • update the parallel process group mesh in MoE
  • use the fused LayerNorm kernel and JIT fusion in Shardformer
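
To make the plugin change concrete, here is a minimal usage sketch (not code from this PR's diff). The class name `MoeHybridParallelPlugin`, the `ep_size` argument, and the commented-out `custom_policy` hookup are assumptions inferred from the file names in the coverage report below; `Booster`, `launch_from_torch`, and the fused-kernel flags follow the existing `HybridParallelPlugin` interface:

```python
# Minimal sketch, assuming the new plugin mirrors HybridParallelPlugin's
# constructor and adds an ep_size knob; names marked "assumed" or
# "hypothetical" are not confirmed by this PR's text.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin.moe_hybrid_parallel_plugin import MoeHybridParallelPlugin  # assumed class name

colossalai.launch_from_torch(config={})

plugin = MoeHybridParallelPlugin(
    tp_size=1,                        # no tensor parallelism in this example
    pp_size=2,                        # two pipeline stages (PP)
    ep_size=2,                        # assumed: experts sharded across two ranks (EP)
    enable_fused_normalization=True,  # fused LayerNorm kernel in Shardformer
    enable_jit_fused=True,            # JIT-fused elementwise ops
    # custom_policy=OpenMoePolicy(),  # hypothetical: the custom OpenMoE shard policy
)
booster = Booster(plugin=plugin)

# model / optimizer / criterion / dataloader are defined elsewhere; ranks not
# consumed by PP form the data-parallel (DP) group, and the plugin is expected
# to build the PP x DP x EP process group mesh internally.
model, optimizer, criterion, dataloader, _ = booster.boost(
    model, optimizer, criterion, dataloader
)
```

Keeping the MoE variant as a separate plugin rather than patching `HybridParallelPlugin` in place lets the MoE-specific init settings live alongside the existing plugin without touching its defaults.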

@github-actions

The code coverage for the changed files is 38%.

Complete report:
Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
colossalai/booster/plugin/hybrid_parallel_plugin.py                    214     15    93%
colossalai/booster/plugin/moe_hybrid_parallel_plugin.py                110    110     0%
colossalai/context/__init__.py                                           6      0   100%
colossalai/context/random/__init__.py                                    2      0   100%
colossalai/context/random/_helper.py                                    46      9    80%
colossalai/initialize.py                                               180    133    26%
colossalai/kernel/triton/__init__.py                                    14      3    79%
colossalai/kernel/triton/llama_act_combine_kernel.py                    89     67    25%
colossalai/legacy/engine/gradient_handler/__init__.py                    6      0   100%
colossalai/legacy/engine/gradient_handler/_moe_gradient_handler.py      20     20     0%
colossalai/moe/__init__.py                                               6      6     0%
colossalai/moe/_operation.py                                           133    133     0%
colossalai/moe/checkpoint.py                                            56     56     0%
colossalai/moe/experts.py                                               90     90     0%
colossalai/moe/layers.py                                                75     75     0%
colossalai/moe/loss.py                                                  21     21     0%
colossalai/moe/manager.py                                               78     78     0%
colossalai/moe/routers.py                                              176    176     0%
colossalai/moe/utils.py                                                 81     81     0%
colossalai/nn/layer/moe/__init__.py                                     12      0   100%
colossalai/nn/loss/__init__.py                                           0      0   100%
colossalai/tensor/moe_tensor/__init__.py                                 0      0   100%
colossalai/tensor/moe_tensor/api.py                                     24      9    62%
colossalai/tensor/moe_tensor/moe_info.py                                13     10    23%
colossalai/zero/low_level/low_level_optim.py                           355     34    90%
tests/test_infer_ops/triton/test_llama_act_combine.py                   40     23    42%
----------------------------------------------------------------------------------------
TOTAL                                                                 1847   1149    38%

@oahzxl requested a review from ver217 on September 19, 2023
Review threads: colossalai/booster/plugin/moe_hybrid_parallel_plugin.py (outdated), examples/language/openmoe/test_ci.sh
@oahzxl merged commit 1a9a889 into hpcaitech:feature/MoE on Sep 21, 2023
@oahzxl deleted the pp branch on September 25, 2023
oahzxl added a commit to oahzxl/ColossalAI that referenced this pull request on Oct 26, 2023:
* init policy

* rename

* update pp

* finish pp

* update script

* update plugin

* finish pp

* update setup for different plugin

* update ci

* update ci

* update ci

* support ep inside or dp inside

* update arg for kernel

* disable ci

* update train script

* update plugin