From ed5d525d812e4cbf2f811c18b7d0cb0765d88921 Mon Sep 17 00:00:00 2001
From: Mingyan Jiang <1829166702@qq.com>
Date: Tue, 9 May 2023 17:09:01 +0800
Subject: [PATCH 01/30] [booster] update booster tutorials#3717

---
 docs/source/en/basics/colossalai_booster.md   | 124 +++++++++++++++++
 .../zh-Hans/basics/colossalai_booster.md      | 125 ++++++++++++++++++
 2 files changed, 249 insertions(+)
 create mode 100644 docs/source/en/basics/colossalai_booster.md
 create mode 100644 docs/source/zh-Hans/basics/colossalai_booster.md

diff --git a/docs/source/en/basics/colossalai_booster.md b/docs/source/en/basics/colossalai_booster.md
new file mode 100644
index 000000000000..fc33e8cbe039
--- /dev/null
+++ b/docs/source/en/basics/colossalai_booster.md
@@ -0,0 +1,124 @@
# Colossal-AI Booster

**Prerequisite:**
- [Distributed Training](../concepts/distributed_training.md)
- [Colossal-AI Overview](../concepts/colossalai_overview.md)

## Introduction
In our new design, `colossalai.booster` replaces the role of `colossalai.initialize` to inject features into your training components (e.g. model, optimizer, dataloader) seamlessly. With these new APIs, you can integrate your model with our parallelism features more easily. Calling `colossalai.booster` is also the standard procedure before you enter your training loop. In the sections below, we will cover how `colossalai.booster` works and what to take note of.

### Plugin
Plugin is an important component that manages the parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows, with a short plugin-selection sketch after the list:

***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.

***TorchDDPPlugin:*** This plugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

***LowLevelZeroPlugin:*** This plugin wraps stages 1 and 2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.
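
Switching between these plugins only changes how the `Booster` is constructed; the rest of the training loop stays the same. Below is a minimal selection sketch — the `stage` argument of `LowLevelZeroPlugin` is an assumption based on recent releases and may differ in your version:

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin, LowLevelZeroPlugin, TorchDDPPlugin

# Pick exactly one plugin; each encapsulates a different acceleration solution.
plugin = TorchDDPPlugin()               # module-level data parallelism (DDP)
# plugin = GeminiPlugin()               # ZeRO with chunk-based memory management
# plugin = LowLevelZeroPlugin(stage=2)  # shard optimizer states + gradients

booster = Booster(plugin=plugin)
```

Whichever plugin you choose, the objects returned by `booster.boost` are used in the same way afterwards.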

### API of booster

Booster.__init__(...):
* Args:
  * device (str or torch.device): The device to run the training. Default: 'cuda'.
  * mixed_precision (str or MixedPrecision): The mixed precision to run the training. Default: None. If the argument is a string, it can be 'fp16', 'fp16_apex', 'bf16', or 'fp8'. 'fp16' uses PyTorch AMP, while 'fp16_apex' uses Nvidia Apex.
  * plugin (Plugin): The plugin to run the training. Default: None.
* Return:
  * booster (Booster)

booster.boost(...): This function is called to boost objects (e.g. model, optimizer, criterion).
* Args:
  * model (nn.Module): The model to be boosted.
  * optimizer (Optimizer): The optimizer to be boosted.
  * criterion (Callable): The criterion to be boosted.
  * dataloader (DataLoader): The dataloader to be boosted.
  * lr_scheduler (LRScheduler): The lr_scheduler to be boosted.
* Return:
  * model, optimizer, criterion, dataloader, lr_scheduler

booster.backward(loss, optimizer): This function runs the backward pass.
* Args:
  * loss (torch.Tensor)
  * optimizer (Optimizer)

booster.no_sync(model): A context manager to disable gradient synchronization across processes.

booster.save_model(...): This function is called to save model checkpoints.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * prefix: str = None,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_model(...): This function is called to load model checkpoints.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * strict: bool = True

booster.save_optimizer(...): This function is called to save optimizer checkpoints.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_optimizer(...): This function is called to load optimizer checkpoints.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str

booster.save_lr_scheduler(...): This function is called to save lr scheduler checkpoints.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

booster.load_lr_scheduler(...): This function is called to load lr scheduler checkpoints.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

## Usage
In a typical workflow, you first launch the distributed environment at the beginning of the training script and create the objects you need (such as models, optimizers, loss functions and data loaders). Then call `colossalai.booster` to inject features into these objects. After that, you can use the booster API together with the returned objects to run the rest of your training process.

A pseudo-code example is shown below:

```python
import torch
from torch.optim import SGD
from torchvision.models import resnet18

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

def train():
    # rank, world_size and port are assumed to be provided by the launcher
    colossalai.launch(config=dict(), rank=rank, world_size=world_size, port=port, host='localhost')
    plugin = TorchDDPPlugin()
    booster = Booster(plugin=plugin)
    model = resnet18()
    criterion = lambda x: x.mean()
    optimizer = SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
    # inject the plugin's features into the training components
    model, optimizer, criterion, _, scheduler = booster.boost(model, optimizer, criterion, lr_scheduler=scheduler)

    x = torch.randn(4, 3, 224, 224)
    x = x.to('cuda')
    output = model(x)
    loss = criterion(output)
    booster.backward(loss, optimizer)
    optimizer.clip_grad_by_norm(1.0)
    optimizer.step()
    scheduler.step()

    # save the model as sharded checkpoints of at most 10 MB per shard
    save_path = "./model"
    booster.save_model(model, save_path, shard=True, size_per_shard=10, use_safetensors=True)

    new_model = resnet18()
    booster.load_model(new_model, save_path)
```

If you want to run this example, [click here](../../../../examples/tutorial/new_api/cifar_resnet/README.md).

[More design details](https://github.com/hpcaitech/ColossalAI/discussions/3046)

diff --git a/docs/source/zh-Hans/basics/colossalai_booster.md b/docs/source/zh-Hans/basics/colossalai_booster.md
new file mode 100644
index 000000000000..703fb484e3be
--- /dev/null
+++ b/docs/source/zh-Hans/basics/colossalai_booster.md
@@ -0,0 +1,125 @@
# Using the Booster

**Prerequisites:**
- [Distributed Training](../concepts/distributed_training.md)
- [Colossal-AI Overview](../concepts/colossalai_overview.md)

## Introduction
In our new design, `colossalai.booster` replaces `colossalai.initialize` to seamlessly inject features (e.g. model, optimizer, dataloader) into your training components. With the booster API, you can integrate our parallel strategies into your model more easily. Calling `colossalai.booster` is the basic step before you enter the training loop.
In the sections below, we will introduce how `colossalai.booster` works and the details to take note of when using it.

### Plugin
Plugin is an important component that manages the parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows:

***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.

***TorchDDPPlugin:*** This plugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

***LowLevelZeroPlugin:*** This plugin wraps stages 1 and 2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.
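
The `no_sync` context manager documented in the API section below is what makes gradient accumulation cheap under data parallelism. A hypothetical accumulation loop, assuming `model`, `optimizer`, `criterion` and `dataloader` have already been boosted and the active plugin supports `no_sync`:

```python
accum_steps = 4  # assumed accumulation window

for step, (inputs, labels) in enumerate(dataloader):
    if (step + 1) % accum_steps != 0:
        # intermediate micro-step: accumulate gradients locally, skip the all-reduce
        with booster.no_sync(model):
            loss = criterion(model(inputs), labels) / accum_steps
            booster.backward(loss, optimizer)
    else:
        # boundary micro-step: accumulated gradients are synchronized during backward
        loss = criterion(model(inputs), labels) / accum_steps
        booster.backward(loss, optimizer)
        optimizer.step()
        optimizer.zero_grad()
```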

### API of booster

Booster.__init__(...):
* Args:
  * device (str or torch.device): The device to run the training. Default: 'cuda'.
  * mixed_precision (str or MixedPrecision): The mixed precision to run the training. Default: None. If the argument is a string, it can be 'fp16', 'fp16_apex', 'bf16', or 'fp8'. 'fp16' uses PyTorch AMP, while 'fp16_apex' uses Nvidia Apex.
  * plugin (Plugin): The plugin to run the training. Default: None.
* Return:
  * booster (Booster)

booster.boost(...): This function is called to inject features into objects (e.g. model, optimizer, criterion).
* Args:
  * model (nn.Module): The model to be boosted.
  * optimizer (Optimizer): The optimizer to be boosted.
  * criterion (Callable): The criterion to be boosted.
  * dataloader (DataLoader): The dataloader to be boosted.
  * lr_scheduler (LRScheduler): The lr_scheduler to be boosted.
* Return:
  * model, optimizer, criterion, dataloader, lr_scheduler

booster.backward(loss, optimizer): This function runs the backward pass.
* Args:
  * loss (torch.Tensor)
  * optimizer (Optimizer)

booster.no_sync(model): Returns a context manager that disables gradient synchronization across processes.

booster.save_model(...): This function is called to save the model.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * prefix: str = None,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_model(...): This function is called to load the model.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * strict: bool = True

booster.save_optimizer(...): This function is called to save the optimizer.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_optimizer(...): This function is called to load the optimizer.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str

booster.save_lr_scheduler(...): This function is called to save the learning rate scheduler.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

booster.load_lr_scheduler(...): This function is called to load the learning rate scheduler.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

## Usage

When training with Colossal-AI, you first need to launch the distributed environment at the beginning of the training script and create the objects you need, such as models, optimizers, loss functions and data loaders. After that, call `colossalai.booster` to inject features into these objects, and you can then use the booster API for the rest of your training flow.

The following pseudo-code example shows how to train a model with the booster API:

```python
import torch
from torch.optim import SGD
from torchvision.models import resnet18

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

def train():
    # rank, world_size and port are assumed to be provided by the launcher
    colossalai.launch(config=dict(), rank=rank, world_size=world_size, port=port, host='localhost')
    plugin = TorchDDPPlugin()
    booster = Booster(plugin=plugin)
    model = resnet18()
    criterion = lambda x: x.mean()
    optimizer = SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
    # inject the plugin's features into the training components
    model, optimizer, criterion, _, scheduler = booster.boost(model, optimizer, criterion, lr_scheduler=scheduler)

    x = torch.randn(4, 3, 224, 224)
    x = x.to('cuda')
    output = model(x)
    loss = criterion(output)
    booster.backward(loss, optimizer)
    optimizer.clip_grad_by_norm(1.0)
    optimizer.step()
    scheduler.step()

    # save the model as sharded checkpoints of at most 10 MB per shard
    save_path = "./model"
    booster.save_model(model, save_path, shard=True, size_per_shard=10, use_safetensors=True)

    new_model = resnet18()
    booster.load_model(new_model, save_path)
```

If you want to run a working example, [click here](../../../../examples/tutorial/new_api/cifar_resnet/README.md).

[More design details](https://github.com/hpcaitech/ColossalAI/discussions/3046)

From 2a2e889a6f169f5bc73e0c6bd4a34f8e7818186d Mon Sep 17 00:00:00 2001
From: Mingyan Jiang <1829166702@qq.com>
Date: Tue, 9 May 2023 18:05:42 +0800
Subject: [PATCH 02/30] [booster] update booster tutorials#3717, fix

---
 docs/source/zh-Hans/features/1D_tensor_parallel.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/zh-Hans/features/1D_tensor_parallel.md b/docs/source/zh-Hans/features/1D_tensor_parallel.md
index 2ddc27c7b50f..74954dac8f48 100644
--- a/docs/source/zh-Hans/features/1D_tensor_parallel.md
+++ b/docs/source/zh-Hans/features/1D_tensor_parallel.md
@@ -23,7 +23,7 @@
 ```math
 \left[\begin{matrix} B_1 \\ B_2 \end{matrix} \right]
 ```
-This is called the row-parallel fashion.

Plugin is an important component that manages the parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows:

***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.

***TorchDDPPlugin:*** This plugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

***LowLevelZeroPlugin:*** This plugin wraps stages 1 and 2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.
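
The checkpoint methods described in the API section below accept a `shard` flag for splitting large checkpoints. A minimal sketch of the sharded save/load flow — the signatures follow the listing below, while the paths and the 512 MB shard size are illustrative assumptions:

```python
# save the boosted model and optimizer as shards (size_per_shard is in MB)
booster.save_model(model, "./ckpt/model", shard=True, size_per_shard=512)
booster.save_optimizer(optimizer, "./ckpt/optim", shard=True, size_per_shard=512)

# later: rebuild and boost the objects, then load the shards back
booster.load_model(model, "./ckpt/model")
booster.load_optimizer(optimizer, "./ckpt/optim")
```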

### API of booster

Booster.__init__(...):
* Args:
  * device (str or torch.device): The device to run the training. Default: 'cuda'.
  * mixed_precision (str or MixedPrecision): The mixed precision to run the training. Default: None. If the argument is a string, it can be 'fp16', 'fp16_apex', 'bf16', or 'fp8'. 'fp16' uses PyTorch AMP, while 'fp16_apex' uses Nvidia Apex.
  * plugin (Plugin): The plugin to run the training. Default: None.
* Return:
  * booster (Booster)

booster.boost(...): This function is called to inject features into objects (e.g. model, optimizer, criterion).
* Args:
  * model (nn.Module): The model to be boosted.
  * optimizer (Optimizer): The optimizer to be boosted.
  * criterion (Callable): The criterion to be boosted.
  * dataloader (DataLoader): The dataloader to be boosted.
  * lr_scheduler (LRScheduler): The lr_scheduler to be boosted.
* Return:
  * model, optimizer, criterion, dataloader, lr_scheduler

booster.backward(loss, optimizer): This function runs the backward pass.
* Args:
  * loss (torch.Tensor)
  * optimizer (Optimizer)

booster.no_sync(model): Returns a context manager that disables gradient synchronization across processes.

booster.save_model(...): This function is called to save the model.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * prefix: str = None,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_model(...): This function is called to load the model.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * strict: bool = True

booster.save_optimizer(...): This function is called to save the optimizer.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_optimizer(...): This function is called to load the optimizer.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str

booster.save_lr_scheduler(...): This function is called to save the learning rate scheduler.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

booster.load_lr_scheduler(...): This function is called to load the learning rate scheduler.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

## Usage

When training with Colossal-AI, you first need to launch the distributed environment at the beginning of the training script and create the objects you need, such as models, optimizers, loss functions and data loaders. After that, call `colossalai.booster` to inject features into these objects, and you can then use the booster API for the rest of your training flow.

The following pseudo-code example shows how to train a model with the booster API:

```python
import torch
from torch.optim import SGD
from torchvision.models import resnet18

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

def train():
    # rank, world_size and port are assumed to be provided by the launcher
    colossalai.launch(config=dict(), rank=rank, world_size=world_size, port=port, host='localhost')
    plugin = TorchDDPPlugin()
    booster = Booster(plugin=plugin)
    model = resnet18()
    criterion = lambda x: x.mean()
    optimizer = SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
    # inject the plugin's features into the training components
    model, optimizer, criterion, _, scheduler = booster.boost(model, optimizer, criterion, lr_scheduler=scheduler)

    x = torch.randn(4, 3, 224, 224)
    x = x.to('cuda')
    output = model(x)
    loss = criterion(output)
    booster.backward(loss, optimizer)
    optimizer.clip_grad_by_norm(1.0)
    optimizer.step()
    scheduler.step()

    # save the model as sharded checkpoints of at most 10 MB per shard
    save_path = "./model"
    booster.save_model(model, save_path, shard=True, size_per_shard=10, use_safetensors=True)

    new_model = resnet18()
    booster.load_model(new_model, save_path)
```

If you want to run a working example, [click here](../../../../examples/tutorial/new_api/cifar_resnet/README.md).

[More design details](https://github.com/hpcaitech/ColossalAI/discussions/3046)

From c3d44adfdf84936ca6f4b82fc0846891eba222d5 Mon Sep 17 00:00:00 2001
From: Mingyan Jiang <1829166702@qq.com>
Date: Wed, 17 May 2023 13:38:58 +0800
Subject: [PATCH 07/30] [booster] update booster tutorials#3717, update setup doc

---
 docs/source/zh-Hans/basics/booster_api.md | 92 +++++++----------
 1 file changed, 27 insertions(+), 65 deletions(-)

diff --git a/docs/source/zh-Hans/basics/booster_api.md b/docs/source/zh-Hans/basics/booster_api.md
index 703fb484e3be..47903426f679 100644
--- a/docs/source/zh-Hans/basics/booster_api.md
+++ b/docs/source/zh-Hans/basics/booster_api.md
@@ -9,81 +9,41 @@
In our new design, `colossalai.booster` replaces `colossalai.initialize` to seamlessly inject features (e.g. model, optimizer, dataloader) into your training components. With the booster API, you can integrate our parallel strategies into your model more easily. Calling `colossalai.booster` is the basic step before you enter the training loop.
In the sections below, we will introduce how `colossalai.booster` works and the details to take note of when using it.

### Plugin
Plugin is an important component that manages the parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows:

***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.

***TorchDDPPlugin:*** This plugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

***LowLevelZeroPlugin:*** This plugin wraps stages 1 and 2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.

### API of booster

{{ autodoc:colossalai.booster.Booster.__init__ }}

{{ autodoc:colossalai.booster.Booster.boost }}

{{ autodoc:colossalai.booster.Booster.backward }}

{{ autodoc:colossalai.booster.Booster.no_sync }}

{{ autodoc:colossalai.booster.Booster.save_model }}

{{ autodoc:colossalai.booster.Booster.load_model }}

{{ autodoc:colossalai.booster.Booster.save_optimizer }}

{{ autodoc:colossalai.booster.Booster.load_optimizer }}

{{ autodoc:colossalai.booster.Booster.save_lr_scheduler }}

{{ autodoc:colossalai.booster.Booster.load_lr_scheduler }}

## Usage

When training with Colossal-AI, you first need to launch the distributed environment at the beginning of the training script and create the objects you need, such as models, optimizers, loss functions and data loaders. After that, call `colossalai.booster` to inject features into these objects, and you can then use the booster API for the rest of your training flow.
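
Mixed precision composes with any of these plugins through the `Booster` constructor documented above. A small sketch, assuming the string shortcuts ('fp16', 'bf16', ...) behave as documented:

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

# 'fp16' selects PyTorch AMP; 'fp16_apex' would select Nvidia Apex instead
booster = Booster(mixed_precision='fp16', plugin=TorchDDPPlugin())
```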

The following pseudo-code example shows how to train a model with the booster API:

```python
import torch
from torch.optim import SGD
from torchvision.models import resnet18

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

def train():
    # rank, world_size and port are assumed to be provided by the launcher
    colossalai.launch(config=dict(), rank=rank, world_size=world_size, port=port, host='localhost')
    plugin = TorchDDPPlugin()
    booster = Booster(plugin=plugin)
    model = resnet18()
    criterion = lambda x: x.mean()
    optimizer = SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
    # inject the plugin's features into the training components
    model, optimizer, criterion, _, scheduler = booster.boost(model, optimizer, criterion, lr_scheduler=scheduler)

    x = torch.randn(4, 3, 224, 224)
    x = x.to('cuda')
    output = model(x)
    loss = criterion(output)
    booster.backward(loss, optimizer)
    optimizer.clip_grad_by_norm(1.0)
    optimizer.step()
    scheduler.step()

    # save the model as sharded checkpoints of at most 10 MB per shard
    save_path = "./model"
    booster.save_model(model, save_path, shard=True, size_per_shard=10, use_safetensors=True)

    new_model = resnet18()
    booster.load_model(new_model, save_path)
```

If you want to run a working example, [click here](../../../../examples/tutorial/new_api/cifar_resnet/README.md).

[More design details](https://github.com/hpcaitech/ColossalAI/discussions/3046)

From e8d7b9468006b1ebacd1593c64fcf47351587f2f Mon Sep 17 00:00:00 2001
From: Mingyan Jiang <1829166702@qq.com>
Date: Wed, 17 May 2023 13:40:12 +0800
Subject: [PATCH 08/30] [booster] update booster tutorials#3717, update setup doc

---
 docs/sidebars.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/sidebars.json b/docs/sidebars.json
index 44287c17eadf..2732704a5cab 100644
--- a/docs/sidebars.json
+++ b/docs/sidebars.json
@@ -32,7 +32,8 @@
         "basics/engine_trainer",
         "basics/configure_parallelization",
         "basics/model_checkpoint",
-        "basics/colotensor_concept"
+        "basics/colotensor_concept",
+        "basics/booster_api"
       ]
     },
     {

From 68e84be98342ec03c9c5d74318b512604e805489 Mon Sep 17 00:00:00 2001
From: Mingyan Jiang <1829166702@qq.com>
Date: Wed, 17 May 2023 13:43:53 +0800
Subject: [PATCH 09/30] [booster] update booster tutorials#3717, rename colossalai booster.md

---
 docs/source/en/basics/colossalai_booster.md   | 124 -----------------
 .../zh-Hans/basics/colossalai_booster.md      | 125 ------------------
 2 files changed, 249 deletions(-)
 delete mode 100644 docs/source/en/basics/colossalai_booster.md
 delete mode 100644 docs/source/zh-Hans/basics/colossalai_booster.md

diff --git a/docs/source/en/basics/colossalai_booster.md b/docs/source/en/basics/colossalai_booster.md
deleted file mode 100644
index fc33e8cbe039..000000000000
--- a/docs/source/en/basics/colossalai_booster.md
+++ /dev/null
@@ -1,124 +0,0 @@
# Colossal-AI Booster

**Prerequisite:**
- [Distributed Training](../concepts/distributed_training.md)
- [Colossal-AI Overview](../concepts/colossalai_overview.md)

## Introduction
In our new design, `colossalai.booster` replaces the role of `colossalai.initialize` to inject features into your training components (e.g. model, optimizer, dataloader) seamlessly. With these new APIs, you can integrate your model with our parallelism features more easily. Calling `colossalai.booster` is also the standard procedure before you enter your training loop. In the sections below, we will cover how `colossalai.booster` works and what to take note of.

### Plugin
Plugin is an important component that manages the parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows:

***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.

***TorchDDPPlugin:*** This plugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

***LowLevelZeroPlugin:*** This plugin wraps stages 1 and 2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.

### API of booster

Booster.__init__(...):
* Args:
  * device (str or torch.device): The device to run the training. Default: 'cuda'.
  * mixed_precision (str or MixedPrecision): The mixed precision to run the training. Default: None. If the argument is a string, it can be 'fp16', 'fp16_apex', 'bf16', or 'fp8'. 'fp16' uses PyTorch AMP, while 'fp16_apex' uses Nvidia Apex.
  * plugin (Plugin): The plugin to run the training. Default: None.
* Return:
  * booster (Booster)

booster.boost(...): This function is called to boost objects (e.g. model, optimizer, criterion).
* Args:
  * model (nn.Module): The model to be boosted.
  * optimizer (Optimizer): The optimizer to be boosted.
  * criterion (Callable): The criterion to be boosted.
  * dataloader (DataLoader): The dataloader to be boosted.
  * lr_scheduler (LRScheduler): The lr_scheduler to be boosted.
* Return:
  * model, optimizer, criterion, dataloader, lr_scheduler

booster.backward(loss, optimizer): This function runs the backward pass.
* Args:
  * loss (torch.Tensor)
  * optimizer (Optimizer)

booster.no_sync(model): A context manager to disable gradient synchronization across processes.

booster.save_model(...): This function is called to save model checkpoints.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * prefix: str = None,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_model(...): This function is called to load model checkpoints.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * strict: bool = True

booster.save_optimizer(...): This function is called to save optimizer checkpoints.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_optimizer(...): This function is called to load optimizer checkpoints.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str

booster.save_lr_scheduler(...): This function is called to save lr scheduler checkpoints.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

booster.load_lr_scheduler(...): This function is called to load lr scheduler checkpoints.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

## Usage
In a typical workflow, you first launch the distributed environment at the beginning of the training script and create the objects you need (such as models, optimizers, loss functions and data loaders). Then call `colossalai.booster` to inject features into these objects. After that, you can use the booster API together with the returned objects to run the rest of your training process.

A pseudo-code example is shown below:

```python
import torch
from torch.optim import SGD
from torchvision.models import resnet18

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

def train():
    # rank, world_size and port are assumed to be provided by the launcher
    colossalai.launch(config=dict(), rank=rank, world_size=world_size, port=port, host='localhost')
    plugin = TorchDDPPlugin()
    booster = Booster(plugin=plugin)
    model = resnet18()
    criterion = lambda x: x.mean()
    optimizer = SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
    # inject the plugin's features into the training components
    model, optimizer, criterion, _, scheduler = booster.boost(model, optimizer, criterion, lr_scheduler=scheduler)

    x = torch.randn(4, 3, 224, 224)
    x = x.to('cuda')
    output = model(x)
    loss = criterion(output)
    booster.backward(loss, optimizer)
    optimizer.clip_grad_by_norm(1.0)
    optimizer.step()
    scheduler.step()

    # save the model as sharded checkpoints of at most 10 MB per shard
    save_path = "./model"
    booster.save_model(model, save_path, shard=True, size_per_shard=10, use_safetensors=True)

    new_model = resnet18()
    booster.load_model(new_model, save_path)
```

If you want to run this example, [click here](../../../../examples/tutorial/new_api/cifar_resnet/README.md).

[More design details](https://github.com/hpcaitech/ColossalAI/discussions/3046)

diff --git a/docs/source/zh-Hans/basics/colossalai_booster.md b/docs/source/zh-Hans/basics/colossalai_booster.md
deleted file mode 100644
index 703fb484e3be..000000000000
--- a/docs/source/zh-Hans/basics/colossalai_booster.md
+++ /dev/null
@@ -1,125 +0,0 @@
# Using the Booster

**Prerequisites:**
- [Distributed Training](../concepts/distributed_training.md)
- [Colossal-AI Overview](../concepts/colossalai_overview.md)

## Introduction
In our new design, `colossalai.booster` replaces `colossalai.initialize` to seamlessly inject features (e.g. model, optimizer, dataloader) into your training components. With the booster API, you can integrate our parallel strategies into your model more easily. Calling `colossalai.booster` is the basic step before you enter the training loop.
In the sections below, we will introduce how `colossalai.booster` works and the details to take note of when using it.

### Plugin
Plugin is an important component that manages the parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows:

***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.

***TorchDDPPlugin:*** This plugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

***LowLevelZeroPlugin:*** This plugin wraps stages 1 and 2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.

### API of booster

Booster.__init__(...):
* Args:
  * device (str or torch.device): The device to run the training. Default: 'cuda'.
  * mixed_precision (str or MixedPrecision): The mixed precision to run the training. Default: None. If the argument is a string, it can be 'fp16', 'fp16_apex', 'bf16', or 'fp8'. 'fp16' uses PyTorch AMP, while 'fp16_apex' uses Nvidia Apex.
  * plugin (Plugin): The plugin to run the training. Default: None.
* Return:
  * booster (Booster)

booster.boost(...): This function is called to inject features into objects (e.g. model, optimizer, criterion).
* Args:
  * model (nn.Module): The model to be boosted.
  * optimizer (Optimizer): The optimizer to be boosted.
  * criterion (Callable): The criterion to be boosted.
  * dataloader (DataLoader): The dataloader to be boosted.
  * lr_scheduler (LRScheduler): The lr_scheduler to be boosted.
* Return:
  * model, optimizer, criterion, dataloader, lr_scheduler

booster.backward(loss, optimizer): This function runs the backward pass.
* Args:
  * loss (torch.Tensor)
  * optimizer (Optimizer)

booster.no_sync(model): Returns a context manager that disables gradient synchronization across processes.

booster.save_model(...): This function is called to save the model.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * prefix: str = None,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_model(...): This function is called to load the model.
* Args:
  * model: nn.Module,
  * checkpoint: str,
  * strict: bool = True

booster.save_optimizer(...): This function is called to save the optimizer.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str,
  * shard: bool = False,  # whether to save the checkpoint as shards
  * size_per_shard: int = 1024  # the maximum size of each shard (in MB)

booster.load_optimizer(...): This function is called to load the optimizer.
* Args:
  * optimizer: Optimizer,
  * checkpoint: str

booster.save_lr_scheduler(...): This function is called to save the learning rate scheduler.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

booster.load_lr_scheduler(...): This function is called to load the learning rate scheduler.
* Args:
  * lr_scheduler: LRScheduler,
  * checkpoint: str

## Usage

When training with Colossal-AI, you first need to launch the distributed environment at the beginning of the training script and create the objects you need, such as models, optimizers, loss functions and data loaders. After that, call `colossalai.booster` to inject features into these objects, and you can then use the booster API for the rest of your training flow.

The following pseudo-code example shows how to train a model with the booster API:

```python
import torch
from torch.optim import SGD
from torchvision.models import resnet18

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

def train():
    # rank, world_size and port are assumed to be provided by the launcher
    colossalai.launch(config=dict(), rank=rank, world_size=world_size, port=port, host='localhost')
    plugin = TorchDDPPlugin()
    booster = Booster(plugin=plugin)
    model = resnet18()
    criterion = lambda x: x.mean()
    optimizer = SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
    # inject the plugin's features into the training components
    model, optimizer, criterion, _, scheduler = booster.boost(model, optimizer, criterion, lr_scheduler=scheduler)

    x = torch.randn(4, 3, 224, 224)
    x = x.to('cuda')
    output = model(x)
    loss = criterion(output)
    booster.backward(loss, optimizer)
    optimizer.clip_grad_by_norm(1.0)
    optimizer.step()
    scheduler.step()

    # save the model as sharded checkpoints of at most 10 MB per shard
    save_path = "./model"
    booster.save_model(model, save_path, shard=True, size_per_shard=10, use_safetensors=True)

    new_model = resnet18()
    booster.load_model(new_model, save_path)
```

If you want to run a working example, [click here](../../../../examples/tutorial/new_api/cifar_resnet/README.md).

[More design details](https://github.com/hpcaitech/ColossalAI/discussions/3046)

From 6052a5d1cd28cf61541592383ef4e82cf5b739a2 Mon Sep 17 00:00:00 2001
From: Mingyan Jiang <1829166702@qq.com>
Date: Wed, 17 May 2023 13:45:37 +0800
Subject: [PATCH 10/30] [booster] update booster tutorials#3717, rename colossalai booster.md

---
 docs/source/zh-Hans/basics/launch_colossalai.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/zh-Hans/basics/launch_colossalai.md b/docs/source/zh-Hans/basics/launch_colossalai.md
index 54fe7221dc7a..39b09deae085 100644
--- a/docs/source/zh-Hans/basics/launch_colossalai.md
+++ b/docs/source/zh-Hans/basics/launch_colossalai.md
@@ -74,7 +74,7 @@
 import colossalai

 args = colossalai.get_default_parser().parse_args()

 # launch distributed environment
 colossalai.launch(config=