diff --git a/README.md b/README.md
index 8342a9fa0c9e..79f733122cb3 100644
--- a/README.md
+++ b/README.md
@@ -38,6 +38,14 @@
- Why Colossal-AI
- Features
+ - Colossal-AI for Real World Applications
- Parallel Training Demo
- - Colossal-AI for Real World Applications
- Installation
@@ -120,6 +120,88 @@ distributed training and inference in a few lines.
- Inference
- [Energon-AI](https://github.com/hpcaitech/EnergonAI)
+
(back to top)
+
+## Colossal-AI in the Real World
+
+### ColossalChat
+
+
+
+[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): An open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline. [[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) [[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b) [[demo]](https://chat.colossalai.org)
+
+
+
+
+
+- Up to 7.73x faster single-server training and 1.42x faster single-GPU inference
+
+
+
+
+
+- Up to 10.3x growth in model capacity on one GPU
+- A mini demo training process requires only 1.62GB of GPU memory (any consumer-grade GPU)
+
+
+
+
+
+- Up to 3.7x larger model capacity for fine-tuning on a single GPU
+- while maintaining a sufficiently high running speed
+
+(back to top)
+
+
+### AIGC
+Acceleration of AIGC (AI-Generated Content) models such as [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion) and [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion).
+
+
+
+
+- [Training](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce Stable Diffusion memory consumption by up to 5.6x and hardware cost by up to 46x (from A100 to RTX3060).
+
+
+
+
+
+- [DreamBooth Fine-tuning](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/dreambooth): Personalize your model using just 3-5 images of the desired subject.
+
+
+
+
+
+- [Inference](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce inference GPU memory consumption by 2.5x.
+
+
+(back to top)
+
+### Biomedicine
+Acceleration of [AlphaFold](https://alphafold.ebi.ac.uk/) protein structure prediction
+
+
+
+
+
+- [FastFold](https://github.com/hpcaitech/FastFold): Accelerates training and inference on GPU clusters, speeds up data processing, and supports inference for sequences of more than 10,000 residues.
+
+
+
+
+
+- [FastFold with Intel](https://github.com/hpcaitech/FastFold): 3x inference acceleration and a 39% cost reduction.
+
+
+
+
+
+- [xTrimoMultimer](https://github.com/biomap-research/xTrimoMultimer): Accelerates structure prediction of protein monomers and multimers by 11x.
+
+
(back to top)
## Parallel Training Demo
@@ -213,88 +295,6 @@ Please visit our [documentation](https://www.colossalai.org/) and [examples](htt
- [BLOOM](https://github.com/hpcaitech/EnergonAI/tree/main/examples/bloom): Reduce hardware deployment costs of 176-billion-parameter BLOOM by more than 10 times.
-(back to top)
-
-## Colossal-AI in the Real World
-
-### ColossalChat
-
-
-
-[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): An open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline. [[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) [[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b) [[demo]](https://chat.colossalai.org)
-
-
-
-
-
-- Up to 7.73 times faster for single server training and 1.42 times faster for single-GPU inference
-
-
-
-
-
-- Up to 10.3x growth in model capacity on one GPU
-- A mini demo training process requires only 1.62GB of GPU memory (any consumer-grade GPU)
-
-
-
-
-
-- Increase the capacity of the fine-tuning model by up to 3.7 times on a single GPU
-- Keep at a sufficiently high running speed
-
-(back to top)
-
-
-### AIGC
-Acceleration of AIGC (AI-Generated Content) models such as [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion) and [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion).
-
-
-
-
-- [Training](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce Stable Diffusion memory consumption by up to 5.6x and hardware cost by up to 46x (from A100 to RTX3060).
-
-
-
-
-
-- [DreamBooth Fine-tuning](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/dreambooth): Personalize your model using just 3-5 images of the desired subject.
-
-
-
-
-
-- [Inference](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce inference GPU memory consumption by 2.5x.
-
-
-(back to top)
-
-### Biomedicine
-Acceleration of [AlphaFold Protein Structure](https://alphafold.ebi.ac.uk/)
-
-
-
-
-
-- [FastFold](https://github.com/hpcaitech/FastFold): Accelerating training and inference on GPU Clusters, faster data processing, inference sequence containing more than 10000 residues.
-
-
-
-
-
-- [FastFold with Intel](https://github.com/hpcaitech/FastFold): 3x inference acceleration and 39% cost reduce.
-
-
-
-
-
-- [xTrimoMultimer](https://github.com/biomap-research/xTrimoMultimer): accelerating structure prediction of protein monomers and multimer by 11x.
-
-
(back to top)
## Installation
@@ -303,6 +303,8 @@ Requirements:
- PyTorch >= 1.11 (PyTorch 2.x in progress)
- Python >= 3.7
- CUDA >= 11.0
+- [NVIDIA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus) >= 7.0 (V100/RTX20 and higher)
+- Linux OS
If you encounter any problem with installation, you may want to raise an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) in this repository.
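The minimums added above can be sanity-checked with a simple version-tuple comparison. The helper below is an illustrative sketch, not part of Colossal-AI: the requirement names and thresholds mirror the list above, and in a real environment the tuples would come from `sys.version_info`, `torch.__version__`, and `torch.cuda.get_device_capability()`.

```python
def meets_minimum(installed, required):
    """Compare dotted versions as tuples, e.g. (1, 13) >= (1, 11)."""
    return tuple(installed) >= tuple(required)

# Minimum versions taken from the requirements list above.
REQUIRED = {
    "python": (3, 7),
    "pytorch": (1, 11),
    "cuda": (11, 0),
    "compute_capability": (7, 0),  # V100 / RTX 20-series and newer
}

def unmet_requirements(versions):
    """Return the (sorted) names of any requirements below the minimum."""
    return sorted(name for name, minimum in REQUIRED.items()
                  if not meets_minimum(versions.get(name, (0, 0)), minimum))

# A V100 box (compute capability 7.0) with CUDA 11.3 passes:
ok = unmet_requirements({"python": (3, 9), "pytorch": (1, 12),
                         "cuda": (11, 3), "compute_capability": (7, 0)})
# A GTX 1080 (compute capability 6.1) fails the capability check:
bad = unmet_requirements({"python": (3, 9), "pytorch": (1, 12),
                          "cuda": (11, 3), "compute_capability": (6, 1)})
print(ok, bad)  # [] ['compute_capability']
```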
diff --git a/applications/Chat/examples/community/README.md b/applications/Chat/examples/community/README.md
index 5e3f47db37b3..905418892611 100644
--- a/applications/Chat/examples/community/README.md
+++ b/applications/Chat/examples/community/README.md
@@ -1 +1,23 @@
# Community Examples
+---
+We are thrilled to announce the latest updates to ColossalChat, an open-source solution for cloning ChatGPT with a complete RLHF (Reinforcement Learning from Human Feedback) pipeline.
+
+As Colossal-AI undergoes major updates, we are actively maintaining ColossalChat to stay aligned with the project's progress. With the introduction of community-driven examples, we aim to create a collaborative platform where developers can contribute new features built on top of ColossalChat.
+
+## Community Example
+
+Community-driven Examples is an initiative that allows users to contribute their own examples to the ColossalChat package, fostering a sense of community and making it easy for others to access and benefit from shared work. The primary goal of community-driven examples is to build a community-maintained collection of diverse and exotic functionalities on top of the ColossalChat package, which is powered by the Colossal-AI project and its Coati module (ColossalAI Talking Intelligence).
+
+For more information about community pipelines, please have a look at this [issue](https://github.com/hpcaitech/ColossalAI/issues/3487).
+
+## Community Examples
+
+Community examples consist of both inference and training examples that have been added by the community. Please have a look at the following table for an overview of all community examples. Click on the Code Example link to get a copy-and-paste-ready example that you can try out. If a community example doesn't work as expected, please open an issue and ping the author on it.
+
+| Example | Description | Code Example | Colab | Author |
+|:---------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------:|
+| Peft | Adding Peft support for SFT and Prompts model training | [Huggingface Peft](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples/community/peft) | - | [YY Lin](https://github.com/yynil) |
+|...|...|...|...|...|
+
+### How to get involved
+To join our community-driven initiative, please visit the [ColossalChat GitHub repository](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples), review the provided information, and explore the codebase. To contribute, create a new issue outlining your proposed feature or enhancement, and our team will review and provide feedback. We look forward to collaborating with you on this exciting project!
diff --git a/applications/Chat/examples/train_sft.py b/applications/Chat/examples/train_sft.py
index c0ac7b177694..22f70e485843 100644
--- a/applications/Chat/examples/train_sft.py
+++ b/applications/Chat/examples/train_sft.py
@@ -35,6 +35,8 @@ def train(args):
strategy = ColossalAIStrategy(stage=3, placement_policy='cuda')
elif args.strategy == 'colossalai_zero2':
strategy = ColossalAIStrategy(stage=2, placement_policy='cuda')
+ elif args.strategy == 'colossalai_zero2_cpu':
+ strategy = ColossalAIStrategy(stage=2, placement_policy='cpu')
else:
raise ValueError(f'Unsupported strategy "{args.strategy}"')
@@ -168,7 +170,7 @@ def train(args):
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--strategy',
- choices=['naive', 'ddp', 'colossalai_gemini', 'colossalai_zero2'],
+ choices=['naive', 'ddp', 'colossalai_gemini', 'colossalai_zero2', 'colossalai_zero2_cpu'],
default='naive')
parser.add_argument('--model', choices=['gpt2', 'bloom', 'opt', 'llama'], default='bloom')
parser.add_argument('--pretrain', type=str, default=None)
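For context, the strategy dispatch that this hunk extends can be sketched in isolation. `Strategy` below is a stand-in dataclass for coati's `ColossalAIStrategy` (illustrative only), so the new `colossalai_zero2_cpu` branch, i.e. ZeRO stage 2 with parameters placed on CPU, can be exercised without a GPU:

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    """Stand-in for ColossalAIStrategy / the other coati strategies,
    used only to illustrate the dispatch logic in train_sft.py."""
    name: str
    stage: int = 0
    placement_policy: str = "cuda"


def build_strategy(name: str) -> Strategy:
    """Mirror the strategy selection in train_sft.py, including the
    new colossalai_zero2_cpu option (ZeRO stage 2, CPU placement)."""
    if name == "naive":
        return Strategy("naive")
    if name == "ddp":
        return Strategy("ddp")
    if name == "colossalai_gemini":
        return Strategy("colossalai_gemini", stage=3, placement_policy="cuda")
    if name == "colossalai_zero2":
        return Strategy("colossalai_zero2", stage=2, placement_policy="cuda")
    if name == "colossalai_zero2_cpu":
        return Strategy("colossalai_zero2_cpu", stage=2, placement_policy="cpu")
    raise ValueError(f'Unsupported strategy "{name}"')


s = build_strategy("colossalai_zero2_cpu")
print(s.stage, s.placement_policy)  # 2 cpu
```

Offloading parameter placement to CPU trades step time for GPU memory, which is the point of the new option: it lets ZeRO-2 SFT runs fit on smaller cards.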
diff --git a/applications/README.md b/applications/README.md
index b6bde313e54d..cd0435aae199 100644
--- a/applications/README.md
+++ b/applications/README.md
@@ -5,8 +5,8 @@ This directory contains the applications that are powered by Colossal-AI.
The list of applications include:
- [X] [Chatbot](./Chat/README.md)
-- [ ] Stable Diffusion
-- [ ] Dreambooth
-
+- [X] [FastFold](https://github.com/hpcaitech/FastFold): Optimizing AlphaFold (Biomedicine) Training and Inference on GPU Clusters
> Please note that the `Chatbot` application is migrated from the original `ChatGPT` folder.
+
+You can find more example code for base models and functions in the [Examples](https://github.com/hpcaitech/ColossalAI/tree/main/examples) directory.
diff --git a/colossalai/zero/gemini/gemini_optimizer.py b/colossalai/zero/gemini/gemini_optimizer.py
index 8e0237ddc7bc..8940ab9a3251 100644
--- a/colossalai/zero/gemini/gemini_optimizer.py
+++ b/colossalai/zero/gemini/gemini_optimizer.py
@@ -46,12 +46,15 @@ class ZeroOptimizer(ColossalaiOptimizer):
Defaults to 0.0.
initial_scale (float, optional): Initial scale used by DynamicGradScaler. Defaults to 2**32.
min_scale (float, optional): Min scale used by DynamicGradScaler. Defaults to 1.
- growth_factor (float, optional): growth_factor used by DynamicGradScaler. Defaults to 2.
- backoff_factor (float, optional): backoff_factor used by DynamicGradScaler. Defaults to 0.5.
- growth_interval (float, optional): growth_interval used by DynamicGradScaler. Defaults to 1000.
- hysteresis (float, optional): hysteresis used by DynamicGradScaler. Defaults to 2.
- max_scale (int, optional): max_scale used by DynamicGradScaler. Defaults to 2**32.
- """
+ growth_factor (float, optional): Growth factor used by DynamicGradScaler. Defaults to 2.
+ backoff_factor (float, optional): Backoff factor used by DynamicGradScaler. Defaults to 0.5.
+ growth_interval (int, optional): Growth interval (overflow-free steps before the scale grows) used by DynamicGradScaler. Defaults to 1000.
+ hysteresis (int, optional): Hysteresis (overflows tolerated before the scale backs off) used by DynamicGradScaler. Defaults to 2.
+ max_scale (int, optional): Max scale used by DynamicGradScaler. Defaults to 2**32.
+ clipping_norm (float, optional): The norm value used to clip gradients. Defaults to 0.0.
+ norm_type (float, optional): The type of norm used for gradient clipping. Currently, only the L2 norm (norm_type=2.0)
+ is supported in ZeroOptimizer. Defaults to 2.0.
+ """
def __init__(self,
optim: Optimizer,
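The scaler parameters documented in this hunk follow the standard dynamic loss-scaling scheme. The sketch below is an illustrative reimplementation of that scheme, not Colossal-AI's actual `DynamicGradScaler`: the scale backs off by `backoff_factor` once `hysteresis` consecutive overflows occur, grows by `growth_factor` after `growth_interval` overflow-free steps, and is clamped to `[min_scale, max_scale]`.

```python
class DynamicScaler:
    """Illustrative dynamic loss scaler following the parameter
    semantics documented for DynamicGradScaler (not the real class)."""

    def __init__(self, initial_scale=2**32, min_scale=1, growth_factor=2,
                 backoff_factor=0.5, growth_interval=1000, hysteresis=2,
                 max_scale=2**32):
        self.scale = initial_scale
        self.min_scale = min_scale
        self.max_scale = max_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self.hysteresis = hysteresis
        self._good_steps = 0
        self._overflows = 0

    def update(self, found_overflow: bool) -> None:
        if found_overflow:
            self._good_steps = 0
            self._overflows += 1
            # Back off only once `hysteresis` consecutive overflows occur.
            if self._overflows >= self.hysteresis:
                self.scale = max(self.scale * self.backoff_factor, self.min_scale)
                self._overflows = 0
        else:
            self._overflows = 0
            self._good_steps += 1
            # Grow after `growth_interval` overflow-free steps.
            if self._good_steps % self.growth_interval == 0:
                self.scale = min(self.scale * self.growth_factor, self.max_scale)


scaler = DynamicScaler(initial_scale=2**16, growth_interval=2)
scaler.update(True); scaler.update(True)    # two overflows -> one back-off
print(scaler.scale)                         # 32768.0
scaler.update(False); scaler.update(False)  # growth_interval clean steps -> grow
print(scaler.scale)                         # 65536.0
```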
diff --git a/docs/README-zh-Hans.md b/docs/README-zh-Hans.md
index f43a5953022d..daa42412cc3a 100644
--- a/docs/README-zh-Hans.md
+++ b/docs/README-zh-Hans.md
@@ -38,6 +38,14 @@
- Why Colossal-AI
- Features
+ - Colossal-AI Success Stories
- Parallel Training Demo
- - Colossal-AI Success Stories
- Installation
@@ -117,8 +117,88 @@ Colossal-AI provides a collection of parallel components. Our goal is to make your
(back to top)
-## Parallel Training Demo
+## Colossal-AI Success Stories
+### ColossalChat
+
+
+
+[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): A low-cost, open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline. [[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) [[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b) [[demo]](https://chat.colossalai.org)
+
+
+
+
+
+- Up to 7.73x faster single-server training and 1.42x faster single-GPU inference
+
+
+
+
+
+- Up to 10.3x growth in model capacity on a single GPU
+- A mini demo training process requires only 1.62GB of GPU memory (any consumer-grade GPU)
+
+
+
+
+
+- Up to 3.7x larger model capacity for fine-tuning on a single GPU
+- while maintaining a sufficiently high running speed
+
+(back to top)
+
+### AIGC
+Acceleration of AIGC (AI-Generated Content) models such as [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion) and [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion)
+
+
+
+
+- [Training](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce Stable Diffusion memory consumption by up to 5.6x and hardware cost by up to 46x (from A100 to RTX3060)
+
+
+
+
+
+- [DreamBooth Fine-tuning](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/dreambooth): Personalize your model using just 3-5 images of the desired subject
+
+
+
+
+
+- [Inference](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce inference GPU memory consumption by 2.5x
+
+
+(back to top)
+
+### Biomedicine
+
+Acceleration of [AlphaFold](https://alphafold.ebi.ac.uk/) protein structure prediction
+
+
+
+
+
+- [FastFold](https://github.com/hpcaitech/FastFold): Accelerates AlphaFold training and inference, speeds up data preprocessing, and supports inference for sequences of more than 10,000 residues
+
+
+
+
+
+- [FastFold with Intel](https://github.com/hpcaitech/FastFold): 3x inference acceleration and a 39% cost reduction
+
+
+
+
+
+- [xTrimoMultimer](https://github.com/biomap-research/xTrimoMultimer): Accelerates structure prediction of protein monomers and multimers by 11x
+
+(back to top)
+
+## Parallel Training Demo
### GPT-3
@@ -213,87 +293,6 @@ Colossal-AI provides a collection of parallel components. Our goal is to make your
(back to top)
-## Colossal-AI Success Stories
-### ColossalChat
-
-
-
-[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): A low-cost, open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline. [[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) [[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b) [[demo]](https://chat.colossalai.org)
-
-
-
-
-
-- Up to 7.73x faster single-server training and 1.42x faster single-GPU inference
-
-
-
-
-
-- Up to 10.3x growth in model capacity on a single GPU
-- A mini demo training process requires only 1.62GB of GPU memory (any consumer-grade GPU)
-
-
-
-
-
-- Up to 3.7x larger model capacity for fine-tuning on a single GPU
-- while maintaining a sufficiently high running speed
-
-(back to top)
-
-### AIGC
-Acceleration of AIGC (AI-Generated Content) models such as [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion) and [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion)
-
-
-
-
-
-- [Training](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce Stable Diffusion memory consumption by up to 5.6x and hardware cost by up to 46x (from A100 to RTX3060)
-
-
-
-
-
-- [DreamBooth Fine-tuning](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/dreambooth): Personalize your model using just 3-5 images of the desired subject
-
-
-
-
-
-- [Inference](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce inference GPU memory consumption by 2.5x
-
-
-(back to top)
-
-### Biomedicine
-
-Acceleration of [AlphaFold](https://alphafold.ebi.ac.uk/) protein structure prediction
-
-
-
-
-
-- [FastFold](https://github.com/hpcaitech/FastFold): Accelerates AlphaFold training and inference, speeds up data preprocessing, and supports inference for sequences of more than 10,000 residues
-
-
-
-
-
-- [FastFold with Intel](https://github.com/hpcaitech/FastFold): 3x inference acceleration and a 39% cost reduction
-
-
-
-
-
-- [xTrimoMultimer](https://github.com/biomap-research/xTrimoMultimer): Accelerates structure prediction of protein monomers and multimers by 11x
-
-(back to top)
-
## Installation
Requirements:
@@ -301,6 +300,8 @@ Colossal-AI provides a collection of parallel components. Our goal is to make your
- PyTorch >= 1.11 (PyTorch 2.x in progress)
- Python >= 3.7
- CUDA >= 11.0
+- [NVIDIA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus) >= 7.0 (V100/RTX20 and higher)
+- Linux OS
If you encounter any problem with installation, you may want to raise an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) in this repository.
diff --git a/docs/source/en/get_started/installation.md b/docs/source/en/get_started/installation.md
index 672fd8ae03a4..290879219074 100644
--- a/docs/source/en/get_started/installation.md
+++ b/docs/source/en/get_started/installation.md
@@ -4,6 +4,8 @@ Requirements:
- PyTorch >= 1.11 (PyTorch 2.x in progress)
- Python >= 3.7
- CUDA >= 11.0
+- [NVIDIA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus) >= 7.0 (V100/RTX20 and higher)
+- Linux OS
If you encounter any problem with installation, you may want to raise an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) in this repository.
diff --git a/docs/source/zh-Hans/get_started/installation.md b/docs/source/zh-Hans/get_started/installation.md
index 7a9b20255e77..72f85393814f 100755
--- a/docs/source/zh-Hans/get_started/installation.md
+++ b/docs/source/zh-Hans/get_started/installation.md
@@ -5,6 +5,8 @@
- PyTorch >= 1.11 (PyTorch 2.x in progress)
- Python >= 3.7
- CUDA >= 11.0
+- [NVIDIA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus) >= 7.0 (V100/RTX20 and higher)
+- Linux OS
If you encounter any problem with installation, you may want to raise an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) in this repository.
diff --git a/examples/README.md b/examples/README.md
index 710ced101768..dd5e7b10ae66 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -12,6 +12,8 @@
This folder provides several examples accelerated by Colossal-AI. The `tutorial` folder is for everyone to quickly try out the different features in Colossal-AI. Other folders such as `images` and `language` include a wide range of deep learning tasks and applications.
+You can find applications such as Chatbot, Stable Diffusion and Biomedicine in the [Applications](https://github.com/hpcaitech/ColossalAI/tree/main/applications) directory.
+
## Folder Structure
```text