From ed5d525d812e4cbf2f811c18b7d0cb0765d88921 Mon Sep 17 00:00:00 2001
From: Mingyan Jiang <1829166702@qq.com>
Date: Tue, 9 May 2023 17:09:01 +0800
Subject: [PATCH 01/30] [booster] update booster tutorials#3717

---
 docs/source/en/basics/colossalai_booster.md   | 124 +++++++++++++++++
 .../zh-Hans/basics/colossalai_booster.md      | 125 ++++++++++++++++++
 2 files changed, 249 insertions(+)
 create mode 100644 docs/source/en/basics/colossalai_booster.md
 create mode 100644 docs/source/zh-Hans/basics/colossalai_booster.md

diff --git a/docs/source/en/basics/colossalai_booster.md b/docs/source/en/basics/colossalai_booster.md
new file mode 100644
index 000000000000..fc33e8cbe039
--- /dev/null
+++ b/docs/source/en/basics/colossalai_booster.md
@@ -0,0 +1,124 @@
# Colossal-AI Booster

**Prerequisite:**
- [Distributed Training](../concepts/distributed_training.md)
- [Colossal-AI Overview](../concepts/colossalai_overview.md)

## Introduction
In our new design, `colossalai.booster` replaces the role of `colossalai.initialize` to inject features into your training components (e.g. model, optimizer, dataloader) seamlessly. With these new APIs, you can integrate your model with our parallelism features more easily. Calling `colossalai.booster` is also the standard procedure before you enter your training loop. In the sections below, we will cover how `colossalai.booster` works and what to take note of.

### Plugin
Plugin is an important component that manages the parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows, with a short plugin-selection sketch after the list:

***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.

***TorchDDPPlugin:*** This plugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

***LowLevelZeroPlugin:*** This plugin wraps stages 1 and 2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.
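
Switching between these plugins only changes how the `Booster` is constructed; the rest of the training loop stays the same. Below is a minimal selection sketch — the `stage` argument of `LowLevelZeroPlugin` is an assumption based on recent releases and may differ in your version:

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin, LowLevelZeroPlugin, TorchDDPPlugin

# Pick exactly one plugin; each encapsulates a different acceleration solution.
plugin = TorchDDPPlugin()               # module-level data parallelism (DDP)
# plugin = GeminiPlugin()               # ZeRO with chunk-based memory management
# plugin = LowLevelZeroPlugin(stage=2)  # shard optimizer states + gradients

booster = Booster(plugin=plugin)
```

Whichever plugin you choose, the objects returned by `booster.boost` are used in the same way afterwards.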

### API of booster

Booster.__init__(...):
* Args:
  * device (str or torch.device): The device to run the training. Default: 'cuda'.
  * mixed_precision (str or MixedPrecision): The mixed precision to run the training. Default: None. If the argument is a string, it can be 'fp16', 'fp16_apex', 'bf16', or 'fp8'. 'fp16' uses PyTorch AMP, while 'fp16_apex' uses Nvidia Apex.
  * plugin (Plugin): The plugin to run the training. Default: None.
* Return:
  * booster (Booster)

booster.boost(...): This function is called to boost objects (e.g. model, optimizer, criterion).
* Args:
  * model (nn.Module): The model to be boosted.
  * optimizer (Optimizer): The optimizer to be boosted.
  * criterion (Callable): The criterion to be boosted.
  * dataloader (DataLoader): The dataloader to be boosted.
  * lr_scheduler (LRScheduler): The lr_scheduler to be boosted.
* Return:
  * model, optimizer, criterion, dataloader, lr_scheduler

booster.backward(loss, optimizer): This function runs the backward pass.
* Args:
  * loss (torch.Tensor)
  * optimizer (Optimizer)

booster.no_sync(model): A context manager to disable gradient synchronization across processes.

booster.save_model(...): This function is called to save model checkpoints.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * prefix: str = None,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_model(...): This function is called to load model checkpoints.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * strict: bool = True

booster.save_optimizer(...): This function is called to save optimizer checkpoints.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_optimizer(...): This function is called to load optimizer checkpoints.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str

booster.save_lr_scheduler(...): This function is called to save lr scheduler checkpoints.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

booster.load_lr_scheduler(...): This function is called to load lr scheduler checkpoints.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

## Usage
In a typical workflow, you first launch the distributed environment at the beginning of the training script and create the objects you need (such as models, optimizers, loss functions and data loaders). Then call `colossalai.booster` to inject features into these objects. After that, you can use the booster API together with the returned objects to run the rest of your training process.

A pseudo-code example is shown below:

```python
import torch
from torch.optim import SGD
from torchvision.models import resnet18

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

def train():
    # rank, world_size and port are assumed to be provided by the launcher
    colossalai.launch(config=dict(), rank=rank, world_size=world_size, port=port, host='localhost')
    plugin = TorchDDPPlugin()
    booster = Booster(plugin=plugin)
    model = resnet18()
    criterion = lambda x: x.mean()
    optimizer = SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
    # inject the plugin's features into the training components
    model, optimizer, criterion, _, scheduler = booster.boost(model, optimizer, criterion, lr_scheduler=scheduler)

    x = torch.randn(4, 3, 224, 224)
    x = x.to('cuda')
    output = model(x)
    loss = criterion(output)
    booster.backward(loss, optimizer)
    optimizer.clip_grad_by_norm(1.0)
    optimizer.step()
    scheduler.step()

    # save the model as sharded checkpoints of at most 10 MB per shard
    save_path = "./model"
    booster.save_model(model, save_path, shard=True, size_per_shard=10, use_safetensors=True)

    new_model = resnet18()
    booster.load_model(new_model, save_path)
```

If you want to run this example, [click here](../../../../examples/tutorial/new_api/cifar_resnet/README.md).

[More design details](https://github.com/hpcaitech/ColossalAI/discussions/3046)

diff --git a/docs/source/zh-Hans/basics/colossalai_booster.md b/docs/source/zh-Hans/basics/colossalai_booster.md
new file mode 100644
index 000000000000..703fb484e3be
--- /dev/null
+++ b/docs/source/zh-Hans/basics/colossalai_booster.md
@@ -0,0 +1,125 @@
# Using the Booster

**Prerequisites:**
- [Distributed Training](../concepts/distributed_training.md)
- [Colossal-AI Overview](../concepts/colossalai_overview.md)

## Introduction
In our new design, `colossalai.booster` replaces `colossalai.initialize` to seamlessly inject features (e.g. model, optimizer, dataloader) into your training components. With the booster API, you can integrate our parallel strategies into your model more easily. Calling `colossalai.booster` is the basic step before you enter the training loop.
In the sections below, we will introduce how `colossalai.booster` works and the details to take note of when using it.

### Plugin
Plugin is an important component that manages the parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows:

***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.

***TorchDDPPlugin:*** This plugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

***LowLevelZeroPlugin:*** This plugin wraps stages 1 and 2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.
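
The `no_sync` context manager documented in the API section below is what makes gradient accumulation cheap under data parallelism. A hypothetical accumulation loop, assuming `model`, `optimizer`, `criterion` and `dataloader` have already been boosted and the active plugin supports `no_sync`:

```python
accum_steps = 4  # assumed accumulation window

for step, (inputs, labels) in enumerate(dataloader):
    if (step + 1) % accum_steps != 0:
        # intermediate micro-step: accumulate gradients locally, skip the all-reduce
        with booster.no_sync(model):
            loss = criterion(model(inputs), labels) / accum_steps
            booster.backward(loss, optimizer)
    else:
        # boundary micro-step: accumulated gradients are synchronized during backward
        loss = criterion(model(inputs), labels) / accum_steps
        booster.backward(loss, optimizer)
        optimizer.step()
        optimizer.zero_grad()
```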

### API of booster

Booster.__init__(...):
* Args:
  * device (str or torch.device): The device to run the training. Default: 'cuda'.
  * mixed_precision (str or MixedPrecision): The mixed precision to run the training. Default: None. If the argument is a string, it can be 'fp16', 'fp16_apex', 'bf16', or 'fp8'. 'fp16' uses PyTorch AMP, while 'fp16_apex' uses Nvidia Apex.
  * plugin (Plugin): The plugin to run the training. Default: None.
* Return:
  * booster (Booster)

booster.boost(...): This function is called to inject features into objects (e.g. model, optimizer, criterion).
* Args:
  * model (nn.Module): The model to be boosted.
  * optimizer (Optimizer): The optimizer to be boosted.
  * criterion (Callable): The criterion to be boosted.
  * dataloader (DataLoader): The dataloader to be boosted.
  * lr_scheduler (LRScheduler): The lr_scheduler to be boosted.
* Return:
  * model, optimizer, criterion, dataloader, lr_scheduler

booster.backward(loss, optimizer): This function runs the backward pass.
* Args:
  * loss (torch.Tensor)
  * optimizer (Optimizer)

booster.no_sync(model): Returns a context manager that disables gradient synchronization across processes.

booster.save_model(...): This function is called to save the model.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * prefix: str = None,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_model(...): This function is called to load the model.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * strict: bool = True

booster.save_optimizer(...): This function is called to save the optimizer.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_optimizer(...): This function is called to load the optimizer.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str

booster.save_lr_scheduler(...): This function is called to save the learning rate scheduler.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

booster.load_lr_scheduler(...): This function is called to load the learning rate scheduler.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

## Usage

When training with Colossal-AI, you first need to launch the distributed environment at the beginning of the training script and create the objects you need, such as models, optimizers, loss functions and data loaders. After that, call `colossalai.booster` to inject features into these objects, and you can then use the booster API for the rest of your training flow.

The following pseudo-code example shows how to train a model with the booster API:

```python
import torch
from torch.optim import SGD
from torchvision.models import resnet18

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

def train():
    # rank, world_size and port are assumed to be provided by the launcher
    colossalai.launch(config=dict(), rank=rank, world_size=world_size, port=port, host='localhost')
    plugin = TorchDDPPlugin()
    booster = Booster(plugin=plugin)
    model = resnet18()
    criterion = lambda x: x.mean()
    optimizer = SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
    # inject the plugin's features into the training components
    model, optimizer, criterion, _, scheduler = booster.boost(model, optimizer, criterion, lr_scheduler=scheduler)

    x = torch.randn(4, 3, 224, 224)
    x = x.to('cuda')
    output = model(x)
    loss = criterion(output)
    booster.backward(loss, optimizer)
    optimizer.clip_grad_by_norm(1.0)
    optimizer.step()
    scheduler.step()

    # save the model as sharded checkpoints of at most 10 MB per shard
    save_path = "./model"
    booster.save_model(model, save_path, shard=True, size_per_shard=10, use_safetensors=True)

    new_model = resnet18()
    booster.load_model(new_model, save_path)
```

If you want to run a working example, [click here](../../../../examples/tutorial/new_api/cifar_resnet/README.md).

[More design details](https://github.com/hpcaitech/ColossalAI/discussions/3046)

From 2a2e889a6f169f5bc73e0c6bd4a34f8e7818186d Mon Sep 17 00:00:00 2001
From: Mingyan Jiang <1829166702@qq.com>
Date: Tue, 9 May 2023 18:05:42 +0800
Subject: [PATCH 02/30] [booster] update booster tutorials#3717, fix

---
 docs/source/zh-Hans/features/1D_tensor_parallel.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/zh-Hans/features/1D_tensor_parallel.md b/docs/source/zh-Hans/features/1D_tensor_parallel.md
index 2ddc27c7b50f..74954dac8f48 100644
--- a/docs/source/zh-Hans/features/1D_tensor_parallel.md
+++ b/docs/source/zh-Hans/features/1D_tensor_parallel.md
@@ -23,7 +23,7 @@
 ```math
 \left[\begin{matrix} B_1 \\ B_2 \end{matrix} \right]
 ```
-This is called the row-parallel fashion.

Plugin is an important component that manages the parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows:

***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.

***TorchDDPPlugin:*** This plugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

***LowLevelZeroPlugin:*** This plugin wraps stages 1 and 2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.
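
The checkpoint methods described in the API section below accept a `shard` flag for splitting large checkpoints. A minimal sketch of the sharded save/load flow — the signatures follow the listing below, while the paths and the 512 MB shard size are illustrative assumptions:

```python
# save the boosted model and optimizer as shards (size_per_shard is in MB)
booster.save_model(model, "./ckpt/model", shard=True, size_per_shard=512)
booster.save_optimizer(optimizer, "./ckpt/optim", shard=True, size_per_shard=512)

# later: rebuild and boost the objects, then load the shards back
booster.load_model(model, "./ckpt/model")
booster.load_optimizer(optimizer, "./ckpt/optim")
```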

### API of booster

Booster.__init__(...):
* Args:
  * device (str or torch.device): The device to run the training. Default: 'cuda'.
  * mixed_precision (str or MixedPrecision): The mixed precision to run the training. Default: None. If the argument is a string, it can be 'fp16', 'fp16_apex', 'bf16', or 'fp8'. 'fp16' uses PyTorch AMP, while 'fp16_apex' uses Nvidia Apex.
  * plugin (Plugin): The plugin to run the training. Default: None.
* Return:
  * booster (Booster)

booster.boost(...): This function is called to inject features into objects (e.g. model, optimizer, criterion).
* Args:
  * model (nn.Module): The model to be boosted.
  * optimizer (Optimizer): The optimizer to be boosted.
  * criterion (Callable): The criterion to be boosted.
  * dataloader (DataLoader): The dataloader to be boosted.
  * lr_scheduler (LRScheduler): The lr_scheduler to be boosted.
* Return:
  * model, optimizer, criterion, dataloader, lr_scheduler

booster.backward(loss, optimizer): This function runs the backward pass.
* Args:
  * loss (torch.Tensor)
  * optimizer (Optimizer)

booster.no_sync(model): Returns a context manager that disables gradient synchronization across processes.

booster.save_model(...): This function is called to save the model.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * prefix: str = None,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_model(...): This function is called to load the model.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * strict: bool = True

booster.save_optimizer(...): This function is called to save the optimizer.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_optimizer(...): This function is called to load the optimizer.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str

booster.save_lr_scheduler(...): This function is called to save the learning rate scheduler.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

booster.load_lr_scheduler(...): This function is called to load the learning rate scheduler.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

## Usage

When training with Colossal-AI, you first need to launch the distributed environment at the beginning of the training script and create the objects you need, such as models, optimizers, loss functions and data loaders. After that, call `colossalai.booster` to inject features into these objects, and you can then use the booster API for the rest of your training flow.

The following pseudo-code example shows how to train a model with the booster API:

```python
import torch
from torch.optim import SGD
from torchvision.models import resnet18

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

def train():
    # rank, world_size and port are assumed to be provided by the launcher
    colossalai.launch(config=dict(), rank=rank, world_size=world_size, port=port, host='localhost')
    plugin = TorchDDPPlugin()
    booster = Booster(plugin=plugin)
    model = resnet18()
    criterion = lambda x: x.mean()
    optimizer = SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
    # inject the plugin's features into the training components
    model, optimizer, criterion, _, scheduler = booster.boost(model, optimizer, criterion, lr_scheduler=scheduler)

    x = torch.randn(4, 3, 224, 224)
    x = x.to('cuda')
    output = model(x)
    loss = criterion(output)
    booster.backward(loss, optimizer)
    optimizer.clip_grad_by_norm(1.0)
    optimizer.step()
    scheduler.step()

    # save the model as sharded checkpoints of at most 10 MB per shard
    save_path = "./model"
    booster.save_model(model, save_path, shard=True, size_per_shard=10, use_safetensors=True)

    new_model = resnet18()
    booster.load_model(new_model, save_path)
```

If you want to run a working example, [click here](../../../../examples/tutorial/new_api/cifar_resnet/README.md).

[More design details](https://github.com/hpcaitech/ColossalAI/discussions/3046)

From c3d44adfdf84936ca6f4b82fc0846891eba222d5 Mon Sep 17 00:00:00 2001
From: Mingyan Jiang <1829166702@qq.com>
Date: Wed, 17 May 2023 13:38:58 +0800
Subject: [PATCH 07/30] [booster] update booster tutorials#3717, update setup doc

---
 docs/source/zh-Hans/basics/booster_api.md | 92 +++++++----------
 1 file changed, 27 insertions(+), 65 deletions(-)

diff --git a/docs/source/zh-Hans/basics/booster_api.md b/docs/source/zh-Hans/basics/booster_api.md
index 703fb484e3be..47903426f679 100644
--- a/docs/source/zh-Hans/basics/booster_api.md
+++ b/docs/source/zh-Hans/basics/booster_api.md
@@ -9,81 +9,41 @@
In our new design, `colossalai.booster` replaces `colossalai.initialize` to seamlessly inject features (e.g. model, optimizer, dataloader) into your training components. With the booster API, you can integrate our parallel strategies into your model more easily. Calling `colossalai.booster` is the basic step before you enter the training loop.
In the sections below, we will introduce how `colossalai.booster` works and the details to take note of when using it.

### Plugin
Plugin is an important component that manages the parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows:

***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.

***TorchDDPPlugin:*** This plugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

***LowLevelZeroPlugin:*** This plugin wraps stages 1 and 2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.

### API of booster

{{ autodoc:colossalai.booster.Booster.__init__ }}

{{ autodoc:colossalai.booster.Booster.boost }}

{{ autodoc:colossalai.booster.Booster.backward }}

{{ autodoc:colossalai.booster.Booster.no_sync }}

{{ autodoc:colossalai.booster.Booster.save_model }}

{{ autodoc:colossalai.booster.Booster.load_model }}

{{ autodoc:colossalai.booster.Booster.save_optimizer }}

{{ autodoc:colossalai.booster.Booster.load_optimizer }}

{{ autodoc:colossalai.booster.Booster.save_lr_scheduler }}

{{ autodoc:colossalai.booster.Booster.load_lr_scheduler }}

## Usage

When training with Colossal-AI, you first need to launch the distributed environment at the beginning of the training script and create the objects you need, such as models, optimizers, loss functions and data loaders. After that, call `colossalai.booster` to inject features into these objects, and you can then use the booster API for the rest of your training flow.
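
Mixed precision composes with any of these plugins through the `Booster` constructor documented above. A small sketch, assuming the string shortcuts ('fp16', 'bf16', ...) behave as documented:

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

# 'fp16' selects PyTorch AMP; 'fp16_apex' would select Nvidia Apex instead
booster = Booster(mixed_precision='fp16', plugin=TorchDDPPlugin())
```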

The following pseudo-code example shows how to train a model with the booster API:

```python
import torch
from torch.optim import SGD
from torchvision.models import resnet18

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

def train():
    # rank, world_size and port are assumed to be provided by the launcher
    colossalai.launch(config=dict(), rank=rank, world_size=world_size, port=port, host='localhost')
    plugin = TorchDDPPlugin()
    booster = Booster(plugin=plugin)
    model = resnet18()
    criterion = lambda x: x.mean()
    optimizer = SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
    # inject the plugin's features into the training components
    model, optimizer, criterion, _, scheduler = booster.boost(model, optimizer, criterion, lr_scheduler=scheduler)

    x = torch.randn(4, 3, 224, 224)
    x = x.to('cuda')
    output = model(x)
    loss = criterion(output)
    booster.backward(loss, optimizer)
    optimizer.clip_grad_by_norm(1.0)
    optimizer.step()
    scheduler.step()

    # save the model as sharded checkpoints of at most 10 MB per shard
    save_path = "./model"
    booster.save_model(model, save_path, shard=True, size_per_shard=10, use_safetensors=True)

    new_model = resnet18()
    booster.load_model(new_model, save_path)
```

If you want to run a working example, [click here](../../../../examples/tutorial/new_api/cifar_resnet/README.md).

[More design details](https://github.com/hpcaitech/ColossalAI/discussions/3046)

From e8d7b9468006b1ebacd1593c64fcf47351587f2f Mon Sep 17 00:00:00 2001
From: Mingyan Jiang <1829166702@qq.com>
Date: Wed, 17 May 2023 13:40:12 +0800
Subject: [PATCH 08/30] [booster] update booster tutorials#3717, update setup doc

---
 docs/sidebars.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/sidebars.json b/docs/sidebars.json
index 44287c17eadf..2732704a5cab 100644
--- a/docs/sidebars.json
+++ b/docs/sidebars.json
@@ -32,7 +32,8 @@
         "basics/engine_trainer",
         "basics/configure_parallelization",
         "basics/model_checkpoint",
-        "basics/colotensor_concept"
+        "basics/colotensor_concept",
+        "basics/booster_api"
       ]
     },
     {

From 68e84be98342ec03c9c5d74318b512604e805489 Mon Sep 17 00:00:00 2001
From: Mingyan Jiang <1829166702@qq.com>
Date: Wed, 17 May 2023 13:43:53 +0800
Subject: [PATCH 09/30] [booster] update booster tutorials#3717, rename colossalai booster.md

---
 docs/source/en/basics/colossalai_booster.md   | 124 -----------------
 .../zh-Hans/basics/colossalai_booster.md      | 125 ------------------
 2 files changed, 249 deletions(-)
 delete mode 100644 docs/source/en/basics/colossalai_booster.md
 delete mode 100644 docs/source/zh-Hans/basics/colossalai_booster.md

diff --git a/docs/source/en/basics/colossalai_booster.md b/docs/source/en/basics/colossalai_booster.md
deleted file mode 100644
index fc33e8cbe039..000000000000
--- a/docs/source/en/basics/colossalai_booster.md
+++ /dev/null
@@ -1,124 +0,0 @@
# Colossal-AI Booster

**Prerequisite:**
- [Distributed Training](../concepts/distributed_training.md)
- [Colossal-AI Overview](../concepts/colossalai_overview.md)

## Introduction
In our new design, `colossalai.booster` replaces the role of `colossalai.initialize` to inject features into your training components (e.g. model, optimizer, dataloader) seamlessly. With these new APIs, you can integrate your model with our parallelism features more easily. Calling `colossalai.booster` is also the standard procedure before you enter your training loop. In the sections below, we will cover how `colossalai.booster` works and what to take note of.

### Plugin
Plugin is an important component that manages the parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows:

***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.

***TorchDDPPlugin:*** This plugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

***LowLevelZeroPlugin:*** This plugin wraps stages 1 and 2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.

### API of booster

Booster.__init__(...):
* Args:
  * device (str or torch.device): The device to run the training. Default: 'cuda'.
  * mixed_precision (str or MixedPrecision): The mixed precision to run the training. Default: None. If the argument is a string, it can be 'fp16', 'fp16_apex', 'bf16', or 'fp8'. 'fp16' uses PyTorch AMP, while 'fp16_apex' uses Nvidia Apex.
  * plugin (Plugin): The plugin to run the training. Default: None.
* Return:
  * booster (Booster)

booster.boost(...): This function is called to boost objects (e.g. model, optimizer, criterion).
* Args:
  * model (nn.Module): The model to be boosted.
  * optimizer (Optimizer): The optimizer to be boosted.
  * criterion (Callable): The criterion to be boosted.
  * dataloader (DataLoader): The dataloader to be boosted.
  * lr_scheduler (LRScheduler): The lr_scheduler to be boosted.
* Return:
  * model, optimizer, criterion, dataloader, lr_scheduler

booster.backward(loss, optimizer): This function runs the backward pass.
* Args:
  * loss (torch.Tensor)
  * optimizer (Optimizer)

booster.no_sync(model): A context manager to disable gradient synchronization across processes.

booster.save_model(...): This function is called to save model checkpoints.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * prefix: str = None,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_model(...): This function is called to load model checkpoints.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * strict: bool = True

booster.save_optimizer(...): This function is called to save optimizer checkpoints.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_optimizer(...): This function is called to load optimizer checkpoints.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str

booster.save_lr_scheduler(...): This function is called to save lr scheduler checkpoints.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

booster.load_lr_scheduler(...): This function is called to load lr scheduler checkpoints.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

## Usage
In a typical workflow, you first launch the distributed environment at the beginning of the training script and create the objects you need (such as models, optimizers, loss functions and data loaders). Then call `colossalai.booster` to inject features into these objects. After that, you can use the booster API together with the returned objects to run the rest of your training process.

A pseudo-code example is shown below:

```python
import torch
from torch.optim import SGD
from torchvision.models import resnet18

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

def train():
    # rank, world_size and port are assumed to be provided by the launcher
    colossalai.launch(config=dict(), rank=rank, world_size=world_size, port=port, host='localhost')
    plugin = TorchDDPPlugin()
    booster = Booster(plugin=plugin)
    model = resnet18()
    criterion = lambda x: x.mean()
    optimizer = SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
    # inject the plugin's features into the training components
    model, optimizer, criterion, _, scheduler = booster.boost(model, optimizer, criterion, lr_scheduler=scheduler)

    x = torch.randn(4, 3, 224, 224)
    x = x.to('cuda')
    output = model(x)
    loss = criterion(output)
    booster.backward(loss, optimizer)
    optimizer.clip_grad_by_norm(1.0)
    optimizer.step()
    scheduler.step()

    # save the model as sharded checkpoints of at most 10 MB per shard
    save_path = "./model"
    booster.save_model(model, save_path, shard=True, size_per_shard=10, use_safetensors=True)

    new_model = resnet18()
    booster.load_model(new_model, save_path)
```

If you want to run this example, [click here](../../../../examples/tutorial/new_api/cifar_resnet/README.md).

[More design details](https://github.com/hpcaitech/ColossalAI/discussions/3046)

diff --git a/docs/source/zh-Hans/basics/colossalai_booster.md b/docs/source/zh-Hans/basics/colossalai_booster.md
deleted file mode 100644
index 703fb484e3be..000000000000
--- a/docs/source/zh-Hans/basics/colossalai_booster.md
+++ /dev/null
@@ -1,125 +0,0 @@
# Using the Booster

**Prerequisites:**
- [Distributed Training](../concepts/distributed_training.md)
- [Colossal-AI Overview](../concepts/colossalai_overview.md)

## Introduction
In our new design, `colossalai.booster` replaces `colossalai.initialize` to seamlessly inject features (e.g. model, optimizer, dataloader) into your training components. With the booster API, you can integrate our parallel strategies into your model more easily. Calling `colossalai.booster` is the basic step before you enter the training loop.
In the sections below, we will introduce how `colossalai.booster` works and the details to take note of when using it.

### Plugin
Plugin is an important component that manages the parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows:

***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.

***TorchDDPPlugin:*** This plugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

***LowLevelZeroPlugin:*** This plugin wraps stages 1 and 2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.

### API of booster

Booster.__init__(...):
* Args:
  * device (str or torch.device): The device to run the training. Default: 'cuda'.
  * mixed_precision (str or MixedPrecision): The mixed precision to run the training. Default: None. If the argument is a string, it can be 'fp16', 'fp16_apex', 'bf16', or 'fp8'. 'fp16' uses PyTorch AMP, while 'fp16_apex' uses Nvidia Apex.
  * plugin (Plugin): The plugin to run the training. Default: None.
* Return:
  * booster (Booster)

booster.boost(...): This function is called to inject features into objects (e.g. model, optimizer, criterion).
* Args:
  * model (nn.Module): The model to be boosted.
  * optimizer (Optimizer): The optimizer to be boosted.
  * criterion (Callable): The criterion to be boosted.
  * dataloader (DataLoader): The dataloader to be boosted.
  * lr_scheduler (LRScheduler): The lr_scheduler to be boosted.
* Return:
  * model, optimizer, criterion, dataloader, lr_scheduler

booster.backward(loss, optimizer): This function runs the backward pass.
* Args:
  * loss (torch.Tensor)
  * optimizer (Optimizer)

booster.no_sync(model): Returns a context manager that disables gradient synchronization across processes.

booster.save_model(...): This function is called to save the model.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * prefix: str = None,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_model(...): This function is called to load the model.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * strict: bool = True

booster.save_optimizer(...): This function is called to save the optimizer.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_optimizer(...): This function is called to load the optimizer.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str

booster.save_lr_scheduler(...): This function is called to save the learning rate scheduler.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

booster.load_lr_scheduler(...): This function is called to load the learning rate scheduler.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

## Usage

When training with Colossal-AI, you first need to launch the distributed environment at the beginning of the training script and create the objects you need, such as models, optimizers, loss functions and data loaders. After that, call `colossalai.booster` to inject features into these objects, and you can then use the booster API for the rest of your training flow.

The following pseudo-code example shows how to train a model with the booster API:

```python
import torch
from torch.optim import SGD
from torchvision.models import resnet18

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

def train():
    # rank, world_size and port are assumed to be provided by the launcher
    colossalai.launch(config=dict(), rank=rank, world_size=world_size, port=port, host='localhost')
    plugin = TorchDDPPlugin()
    booster = Booster(plugin=plugin)
    model = resnet18()
    criterion = lambda x: x.mean()
    optimizer = SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
    # inject the plugin's features into the training components
    model, optimizer, criterion, _, scheduler = booster.boost(model, optimizer, criterion, lr_scheduler=scheduler)

    x = torch.randn(4, 3, 224, 224)
    x = x.to('cuda')
    output = model(x)
    loss = criterion(output)
    booster.backward(loss, optimizer)
    optimizer.clip_grad_by_norm(1.0)
    optimizer.step()
    scheduler.step()

    # save the model as sharded checkpoints of at most 10 MB per shard
    save_path = "./model"
    booster.save_model(model, save_path, shard=True, size_per_shard=10, use_safetensors=True)

    new_model = resnet18()
    booster.load_model(new_model, save_path)
```

If you want to run a working example, [click here](../../../../examples/tutorial/new_api/cifar_resnet/README.md).

[More design details](https://github.com/hpcaitech/ColossalAI/discussions/3046)

From 6052a5d1cd28cf61541592383ef4e82cf5b739a2 Mon Sep 17 00:00:00 2001
From: Mingyan Jiang <1829166702@qq.com>
Date: Wed, 17 May 2023 13:45:37 +0800
Subject: [PATCH 10/30] [booster] update booster tutorials#3717, rename colossalai booster.md

---
 docs/source/zh-Hans/basics/launch_colossalai.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/zh-Hans/basics/launch_colossalai.md b/docs/source/zh-Hans/basics/launch_colossalai.md
index 54fe7221dc7a..39b09deae085 100644
--- a/docs/source/zh-Hans/basics/launch_colossalai.md
+++ b/docs/source/zh-Hans/basics/launch_colossalai.md
@@ -74,7 +74,7 @@
 import colossalai

 args = colossalai.get_default_parser().parse_args()

 # launch distributed environment
 colossalai.launch(config=