From 914f205eb9dfe7e7b50e98c393445f87250873de Mon Sep 17 00:00:00 2001
From: ver217
Date: Wed, 17 May 2023 15:20:03 +0800
Subject: [PATCH 1/6] [doc] add en booster plugins doc

---
 docs/source/en/basics/booster_plugins.md | 64 ++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 docs/source/en/basics/booster_plugins.md

diff --git a/docs/source/en/basics/booster_plugins.md b/docs/source/en/basics/booster_plugins.md
new file mode 100644
index 000000000000..c15c30c8450c
--- /dev/null
+++ b/docs/source/en/basics/booster_plugins.md
@@ -0,0 +1,64 @@
+# Booster Plugins
+
+Author: [Hongxin Liu](https://github.com/ver217)
+
+**Prerequisite:**
+- [Booster API](./booster_api.md)
+
+## Introduction
+
+As mentioned in [Booster API](./booster_api.md), we can use booster plugins to customize parallel training. In this tutorial, we will introduce how to use booster plugins.
+
+We currently provide the following plugins:
+
+- [Low Level Zero Plugin](#low-level-zero-plugin): It wraps `colossalai.zero.low_level.LowLevelZeroOptimizer` and can be used to train models with zero-dp. It only supports Zero stage-1 and stage-2.
+- [Gemini Plugin](#gemini-plugin): It wraps [Gemini](../features/zero_with_chunk.md), which implements Zero-3 with chunk-based and heterogeneous memory management.
+- [Torch DDP Plugin](#torch-ddp-plugin): It is a wrapper around `torch.nn.parallel.DistributedDataParallel` and can be used to train models with data parallelism.
+- [Torch FSDP Plugin](#torch-fsdp-plugin): It is a wrapper around `torch.distributed.fsdp.FullyShardedDataParallel` and can be used to train models with zero-dp.
+
+More plugins are coming soon.
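+
+A minimal sketch of how a plugin is typically passed to the booster is shown below (a hypothetical example that assumes a `torchrun`-launched GPU environment; the toy model, optimizer and criterion are placeholders, and the exact `boost()` return values may vary between versions; see [Booster API](./booster_api.md) for the authoritative usage):
+
+```python
+import colossalai
+import torch
+from colossalai.booster import Booster
+from colossalai.booster.plugin import TorchDDPPlugin
+
+# Initialize the distributed environment (assumes the script is started with torchrun / colossalai run).
+colossalai.launch_from_torch(config={})
+
+# A toy model, optimizer and criterion used only for illustration.
+model = torch.nn.Linear(32, 8).cuda()
+optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
+criterion = torch.nn.MSELoss()
+
+# Any plugin listed above can be swapped in here, e.g. LowLevelZeroPlugin or GeminiPlugin.
+plugin = TorchDDPPlugin()
+booster = Booster(plugin=plugin)
+model, optimizer, criterion, *_ = booster.boost(model, optimizer, criterion)
+```
+
+After `boost()` returns, training proceeds as usual, except that `booster.backward(loss, optimizer)` is used in place of `loss.backward()`.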
+
+## Plugins
+
+### Low Level Zero Plugin
+
+This plugin implements Zero-1 and Zero-2 (with/without CPU offload), using `reduce` and `gather` to synchronize gradients and weights.
+
+Zero-1 can be regarded as a better substitute for Torch DDP: it is more memory-efficient and faster. It can also be easily used in hybrid parallelism.
+
+Zero-2 does not support local gradient accumulation. You can still accumulate gradients if you insist, but doing so does not reduce the communication cost. In other words, it is not a good idea to use Zero-2 together with pipeline parallelism.
+
+{{ autodoc:colossalai.booster.plugin.LowLevelZeroPlugin }}
+
+We have tested compatibility with some well-known models; the following models may not be supported:
+
+- `timm.models.convit_base`
+- DLRM and DeepFM models in `torchrec`
+- `diffusers.VQModel`
+- `transformers.AlbertModel`
+- `transformers.AlbertForPreTraining`
+- `transformers.BertModel`
+- `transformers.BertForPreTraining`
+- `transformers.GPT2DoubleHeadsModel`
+
+Compatibility problems will be fixed in the future.
+
+### Gemini Plugin
+
+This plugin implements Zero-3 with chunk-based and heterogeneous memory management. It can train large models without much loss in speed. It also does not support local gradient accumulation. More details can be found in the [Gemini Doc](../features/zero_with_chunk.md).
+
+{{ autodoc:colossalai.booster.plugin.GeminiPlugin }}
+
+### Torch DDP Plugin
+
+More details can be found in the [PyTorch Docs](https://pytorch.org/docs/main/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel).
+
+{{ autodoc:colossalai.booster.plugin.TorchDDPPlugin }}
+
+### Torch FSDP Plugin
+
+> ⚠ This plugin is not available when the torch version is lower than 1.12.0.
+
+More details can be found in the [PyTorch Docs](https://pytorch.org/docs/main/fsdp.html).
+
+{{ autodoc:colossalai.booster.plugin.TorchFSDPPlugin }}

From e23ece8dbbebb7e519029565ff7322e03e8f2fab Mon Sep 17 00:00:00 2001
From: ver217
Date: Wed, 17 May 2023 15:21:30 +0800
Subject: [PATCH 2/6] [doc] add booster plugins doc in sidebar

---
 docs/sidebars.json | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/sidebars.json b/docs/sidebars.json
index 44287c17eadf..dfeb389f5dd6 100644
--- a/docs/sidebars.json
+++ b/docs/sidebars.json
@@ -28,6 +28,7 @@
         "basics/command_line_tool",
         "basics/define_your_config",
         "basics/launch_colossalai",
+        "basics/booster_plugins",
         "basics/initialize_features",
         "basics/engine_trainer",
         "basics/configure_parallelization",

From 83a235565743e5fa5b4de8d6559b2bf9331366b4 Mon Sep 17 00:00:00 2001
From: ver217
Date: Wed, 17 May 2023 15:36:20 +0800
Subject: [PATCH 3/6] [doc] add zh booster plugins doc

---
 docs/source/zh-Hans/basics/booster_plugins.md | 64 +++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 docs/source/zh-Hans/basics/booster_plugins.md

diff --git a/docs/source/zh-Hans/basics/booster_plugins.md b/docs/source/zh-Hans/basics/booster_plugins.md
new file mode 100644
index 000000000000..0149c58b8fb7
--- /dev/null
+++ b/docs/source/zh-Hans/basics/booster_plugins.md
@@ -0,0 +1,64 @@
+# Booster 插件
+
+作者: [Hongxin Liu](https://github.com/ver217)
+
+**前置教程:**
+- [Booster API](./booster_api.md)
+
+## 引言
+
+正如 [Booster API](./booster_api.md) 中提到的，我们可以使用 booster 插件来自定义并行训练。在本教程中，我们将介绍如何使用 booster 插件。
+
+我们现在提供以下插件:
+
+- [Low Level Zero 插件](#low-level-zero-plugin): 它包装了 `colossalai.zero.low_level.LowLevelZeroOptimizer`，可用于使用 Zero-dp 训练模型。它仅支持 Zero 阶段1和阶段2。
+- [Gemini 插件](#gemini-plugin): 它包装了 [Gemini](../features/zero_with_chunk.md)，Gemini 实现了基于Chunk内存管理和异构内存管理的 Zero-3。
+- [Torch DDP 插件](#torch-ddp-plugin): 它包装了 `torch.nn.parallel.DistributedDataParallel` 并且可用于使用数据并行训练模型。
+- [Torch FSDP 插件](#torch-fsdp-plugin): 它包装了 `torch.distributed.fsdp.FullyShardedDataParallel` 并且可用于使用 Zero-dp 训练模型。
+
+更多插件即将推出。
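+
+下面给出一个简单的用法示意（仅为假设性的最小示例，假定脚本通过 `torchrun` 在 GPU 环境中启动；其中的模型、优化器和损失函数均为占位示例，`boost()` 的具体返回值可能随版本不同而变化，准确用法请参考 [Booster API](./booster_api.md)）:
+
+```python
+import colossalai
+import torch
+from colossalai.booster import Booster
+from colossalai.booster.plugin import TorchDDPPlugin
+
+# 初始化分布式环境（假设脚本通过 torchrun / colossalai run 启动）。
+colossalai.launch_from_torch(config={})
+
+# 仅用于演示的简单模型、优化器和损失函数。
+model = torch.nn.Linear(32, 8).cuda()
+optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
+criterion = torch.nn.MSELoss()
+
+# 这里可以替换为上面列出的任意插件，例如 LowLevelZeroPlugin 或 GeminiPlugin。
+plugin = TorchDDPPlugin()
+booster = Booster(plugin=plugin)
+model, optimizer, criterion, *_ = booster.boost(model, optimizer, criterion)
+```
+
+调用 `boost()` 之后，训练流程与平常基本相同，只是需要用 `booster.backward(loss, optimizer)` 代替 `loss.backward()`。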
+
+## 插件
+
+### Low Level Zero 插件
+
+该插件实现了 Zero-1 和 Zero-2(使用/不使用 CPU 卸载)，使用`reduce`和`gather`来同步梯度和权重。
+
+Zero-1 可以看作是 Torch DDP 更好的替代品，内存效率更高，速度更快。它可以很容易地用于混合并行。
+
+Zero-2 不支持局部梯度累积。如果您坚持使用，虽然可以积累梯度，但不能降低通信成本。也就是说，同时使用流水线并行和 Zero-2 并不是一个好主意。
+
+{{ autodoc:colossalai.booster.plugin.LowLevelZeroPlugin }}
+
+我们已经测试了一些主流模型的兼容性，可能不支持以下模型:
+
+- `timm.models.convit_base`
+- dlrm and deepfm models in `torchrec`
+- `diffusers.VQModel`
+- `transformers.AlbertModel`
+- `transformers.AlbertForPreTraining`
+- `transformers.BertModel`
+- `transformers.BertForPreTraining`
+- `transformers.GPT2DoubleHeadsModel`
+
+兼容性问题将在未来修复。
+
+### Gemini Plugin
+
+这个插件实现了基于Chunk内存管理和异构内存管理的 Zero-3。它可以训练大型模型而不会损失太多速度。它也不支持局部梯度累积。更多详细信息，请参阅 [Gemini 文档](../features/zero_with_chunk.md).
+
+{{ autodoc:colossalai.booster.plugin.GeminiPlugin }}
+
+### Torch DDP Plugin
+
+更多详细信息，请参阅 [Pytorch 文档](https://pytorch.org/docs/main/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel).
+
+{{ autodoc:colossalai.booster.plugin.TorchDDPPlugin }}
+
+### Torch FSDP Plugin
+
+> ⚠ 如果 torch 版本低于 1.12.0，此插件将不可用。
+
+更多详细信息，请参阅 [Pytorch 文档](https://pytorch.org/docs/main/fsdp.html).
+
+{{ autodoc:colossalai.booster.plugin.TorchFSDPPlugin }}

From 6e6331676b81d78c7dbf491c90b19ec3faff6028 Mon Sep 17 00:00:00 2001
From: ver217
Date: Thu, 18 May 2023 11:44:24 +0800
Subject: [PATCH 4/6] [doc] fix zh booster plugin translation

---
 docs/source/zh-Hans/basics/booster_plugins.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/zh-Hans/basics/booster_plugins.md b/docs/source/zh-Hans/basics/booster_plugins.md
index 0149c58b8fb7..e0258eb37932 100644
--- a/docs/source/zh-Hans/basics/booster_plugins.md
+++ b/docs/source/zh-Hans/basics/booster_plugins.md
@@ -43,19 +43,19 @@ Zero-2 不支持局部梯度累积。如果您坚持使用，虽然可以积累

 兼容性问题将在未来修复。

-### Gemini Plugin
+### Gemini 插件

 这个插件实现了基于Chunk内存管理和异构内存管理的 Zero-3。它可以训练大型模型而不会损失太多速度。它也不支持局部梯度累积。更多详细信息，请参阅 [Gemini 文档](../features/zero_with_chunk.md).

 {{ autodoc:colossalai.booster.plugin.GeminiPlugin }}

-### Torch DDP Plugin
+### Torch DDP 插件

 更多详细信息，请参阅 [Pytorch 文档](https://pytorch.org/docs/main/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel).

 {{ autodoc:colossalai.booster.plugin.TorchDDPPlugin }}

-### Torch FSDP Plugin
+### Torch FSDP 插件

 > ⚠ 如果 torch 版本低于 1.12.0，此插件将不可用。


From 0888424bb8e5d60d770933db9175eef37b15b95a Mon Sep 17 00:00:00 2001
From: ver217
Date: Thu, 18 May 2023 11:46:03 +0800
Subject: [PATCH 5/6] [doc] reorganize tutorials order of basic section

---
 docs/sidebars.json | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/sidebars.json b/docs/sidebars.json
index f3faa4272578..be783bb6e247 100644
--- a/docs/sidebars.json
+++ b/docs/sidebars.json
@@ -26,15 +26,15 @@
       "collapsed": true,
       "items": [
         "basics/command_line_tool",
-        "basics/define_your_config",
         "basics/launch_colossalai",
+        "basics/booster_api",
         "basics/booster_plugins",
+        "basics/define_your_config",
         "basics/initialize_features",
         "basics/engine_trainer",
         "basics/configure_parallelization",
         "basics/model_checkpoint",
-        "basics/colotensor_concept",
-        "basics/booster_api"
+        "basics/colotensor_concept"
       ]
     },
     {

From 5666fa144c393406f3501ff4e557487fcf81bfe5 Mon Sep 17 00:00:00 2001
From: ver217
Date: Thu, 18 May 2023 11:50:08 +0800
Subject: [PATCH 6/6] [devops] force sync to test ci