[moe] speed up embed and mlp #4701

Merged
oahzxl merged 9 commits into hpcaitech:feature/moe from oahzxl:mlp_kernel
Sep 14, 2023

@oahzxl oahzxl commented Sep 13, 2023

  • Store the rotary embedding in Llama: ~10% speedup
  • Use the act combine kernel for the MLP activation: ~5% speedup
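
The two changes above can be illustrated with a minimal, framework-free sketch. It is not the code from this PR. The names `rotary_tables` and `act_combine` are hypothetical, and it assumes that "store" means building the cos/sin tables once and reusing them, and that the combined activation is the Llama-style `SiLU(gate) * up` computed in a single pass.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def rotary_tables(seq_len: int, head_dim: int, base: float = 10000.0):
    """Build the rotary cos/sin tables once per (seq_len, head_dim).

    Hypothetical helper: later calls with the same arguments return the
    stored tuples instead of recomputing the trigonometric values.
    """
    # Standard RoPE inverse frequencies: base^(-2i/d) for i in [0, d/2).
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
    cos = tuple(tuple(math.cos(pos * f) for f in inv_freq) for pos in range(seq_len))
    sin = tuple(tuple(math.sin(pos * f) for f in inv_freq) for pos in range(seq_len))
    return cos, sin


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def act_combine(gate, up):
    """Compute SiLU(gate) * up element-wise in one pass.

    Assumption: this mirrors what a combined activation kernel does, i.e.
    it avoids materializing SiLU(gate) as a separate intermediate.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


if __name__ == "__main__":
    cos_a, _ = rotary_tables(4, 8)
    cos_b, _ = rotary_tables(4, 8)   # served from the cache
    print(cos_a is cos_b)
    print(act_combine([0.0, 1.0], [2.0, 3.0]))
```

In a real model the tables would be tensors held on the module and the combined activation would be a GPU kernel; the sketch only shows the reuse and the single-pass structure.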

@github-actions

The code coverage for the changed files is 4%.

Name                                                      Stmts   Miss  Cover
-----------------------------------------------------------------------------
colossalai/amp/naive_amp/mixed_precision_optimizer.py        98     98     0%
colossalai/booster/booster.py                                66     66     0%
colossalai/booster/plugin/__init__.py                        11     11     0%
colossalai/booster/plugin/hybrid_parallel_plugin.py         152    152     0%
colossalai/booster/plugin/pp_plugin_base.py                   9      9     0%
colossalai/cluster/__init__.py                                5      0   100%
colossalai/cluster/process_group_mesh.py                     72     46    36%
colossalai/context/__init__.py                                6      0   100%
colossalai/context/random/__init__.py                         2      0   100%
colossalai/context/random/_helper.py                         46     28    39%
colossalai/engine/gradient_handler/__init__.py                6      0   100%
colossalai/initialize.py                                    180    149    17%
colossalai/interface/optimizer.py                            45     20    56%
colossalai/kernel/cuda_native/__init__.py                     5      0   100%
colossalai/kernel/triton/__init__.py                          0      0   100%
colossalai/kernel/triton/llama_act_combine_kernel.py         90     68    24%
colossalai/lazy/lazy_init.py                                315    246    22%
colossalai/moe/__init__.py                                    6      6     0%
colossalai/moe/_operation.py                                143    143     0%
colossalai/moe/checkpoint.py                                 56     56     0%
colossalai/moe/experts.py                                    84     84     0%
colossalai/moe/layers.py                                     75     75     0%
colossalai/moe/loss.py                                       21     21     0%
colossalai/moe/manager.py                                    60     60     0%
colossalai/moe/routers.py                                   175    175     0%
colossalai/moe/utils.py                                      81     81     0%
colossalai/nn/layer/__init__.py                              10      0   100%
colossalai/nn/layer/moe/__init__.py                          12      0   100%
colossalai/nn/loss/__init__.py                               23      9    61%
colossalai/pipeline/p2p.py                                  102    102     0%
colossalai/pipeline/schedule/__init__.py                      3      3     0%
colossalai/pipeline/schedule/_utils.py                       50     50     0%
colossalai/pipeline/schedule/base.py                         10     10     0%
colossalai/pipeline/schedule/one_f_one_b.py                 116    116     0%
colossalai/pipeline/stage_manager.py                         68     68     0%
colossalai/shardformer/_utils.py                             54     54     0%
colossalai/shardformer/layer/__init__.py                      8      8     0%
colossalai/shardformer/layer/embedding.py                   130    130     0%
colossalai/shardformer/layer/linear.py                      181    181     0%
colossalai/shardformer/layer/normalization.py                51     51     0%
colossalai/shardformer/layer/qkv_fused_linear.py            292    292     0%
colossalai/shardformer/layer/utils.py                        84     84     0%
colossalai/shardformer/modeling/bert.py                     431    431     0%
colossalai/shardformer/modeling/blip2.py                     53     53     0%
colossalai/shardformer/modeling/bloom.py                    387    387     0%
colossalai/shardformer/modeling/chatglm.py                  149    149     0%
colossalai/shardformer/modeling/gpt2.py                     293    293     0%
colossalai/shardformer/modeling/jit.py                       19     19     0%
colossalai/shardformer/modeling/llama.py                    204    204     0%
colossalai/shardformer/modeling/opt.py                      285    285     0%
colossalai/shardformer/modeling/sam.py                       94     94     0%
colossalai/shardformer/modeling/t5.py                       297    297     0%
colossalai/shardformer/modeling/vit.py                      149    149     0%
colossalai/shardformer/modeling/whisper.py                   95     95     0%
colossalai/shardformer/policies/auto_policy.py               27     27     0%
colossalai/shardformer/policies/base_policy.py               87     87     0%
colossalai/shardformer/policies/bert.py                     257    257     0%
colossalai/shardformer/policies/blip2.py                     54     54     0%
colossalai/shardformer/policies/bloom.py                    151    151     0%
colossalai/shardformer/policies/chatglm.py                  100    100     0%
colossalai/shardformer/policies/gpt2.py                     181    181     0%
colossalai/shardformer/policies/llama.py                    114    114     0%
colossalai/shardformer/policies/opt.py                      140    140     0%
colossalai/shardformer/policies/sam.py                       32     32     0%
colossalai/shardformer/policies/t5.py                       182    182     0%
colossalai/shardformer/policies/vit.py                      108    108     0%
colossalai/shardformer/policies/whisper.py                   61     61     0%
colossalai/shardformer/shard/shard_config.py                 28     28     0%
colossalai/shardformer/shard/sharder.py                      95     95     0%
colossalai/shardformer/shard/shardformer.py                  15     15     0%
colossalai/shardformer/shard/utils.py                        11     11     0%
colossalai/tensor/d_tensor/api.py                           149    113    24%
colossalai/tensor/moe_tensor/__init__.py                      0      0   100%
colossalai/tensor/moe_tensor/api.py                          24     10    58%
colossalai/tensor/moe_tensor/moe_info.py                     12      9    25%
colossalai/zero/low_level/low_level_optim.py                330    288    13%
tests/kit/model_zoo/transformers/__init__.py                 12     12     0%
tests/kit/model_zoo/transformers/bert.py                     50     50     0%
tests/kit/model_zoo/transformers/blip2.py                    21     21     0%
tests/kit/model_zoo/transformers/bloom.py                    36     36     0%
tests/kit/model_zoo/transformers/chatglm.py                  20     20     0%
tests/kit/model_zoo/transformers/gpt.py                      39     39     0%
tests/kit/model_zoo/transformers/opt.py                      32     32     0%
tests/kit/model_zoo/transformers/sam.py                      14     14     0%
tests/kit/model_zoo/transformers/t5.py                       25     25     0%
tests/kit/model_zoo/transformers/vit.py                      24     24     0%
tests/kit/model_zoo/transformers/whisper.py                  23     23     0%
tests/test_kernels/test_llama_act_combine.py                 38     22    42%
tests/test_shardformer/test_model/_utils.py                 142    142     0%
tests/test_shardformer/test_model/test_shard_bert.py         62     62     0%
tests/test_shardformer/test_model/test_shard_blip2.py        40     40     0%
tests/test_shardformer/test_model/test_shard_bloom.py        59     59     0%
tests/test_shardformer/test_model/test_shard_chatglm.py      60     60     0%
tests/test_shardformer/test_model/test_shard_gpt2.py         65     65     0%
tests/test_shardformer/test_model/test_shard_llama.py        62     62     0%
tests/test_shardformer/test_model/test_shard_opt.py          62     62     0%
tests/test_shardformer/test_model/test_shard_sam.py          39     39     0%
tests/test_shardformer/test_model/test_shard_t5.py           59     59     0%
tests/test_shardformer/test_model/test_shard_vit.py          61     61     0%
tests/test_shardformer/test_model/test_shard_whisper.py      46     46     0%
tests/test_shardformer/test_shard_utils.py                   21     21     0%
tests/test_shardformer/test_with_torch_ddp.py                52     52     0%
-----------------------------------------------------------------------------
TOTAL                                                      8781   8419     4%

@oahzxl oahzxl merged commit 265a2ff into hpcaitech:feature/moe Sep 14, 2023
oahzxl added a commit to oahzxl/ColossalAI that referenced this pull request Sep 15, 2023
* update triton

* update kernel

* add init

* add version check

* update precision

* update precision

* update kernel in experts

* update test arg

* update settings
@oahzxl oahzxl deleted the mlp_kernel branch September 15, 2023 07:42
oahzxl added a commit to oahzxl/ColossalAI that referenced this pull request Oct 26, 2023
* update triton

* update kernel

* add init

* add version check

* update precision

* update precision

* update kernel in experts

* update test arg

* update settings