Merged #136

Changes from all commits (84 commits)
6ccecc0
[gemini] fix tensor storage cleaning in state dict collection (#4396)
Aug 10, 2023
d86ddd9
[hotfix] fix unsafe async comm in zero (#4404)
Gy-Lu Aug 11, 2023
6d41c3f
[doc] update Coati README (#4405)
cwher Aug 14, 2023
ff83679
[doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430)
tiansiyuan Aug 14, 2023
5e1a9d4
[cluster] add process group mesh (#4039)
ver217 Jun 20, 2023
4225442
[pipeline] add stage manager (#4093)
ver217 Jun 27, 2023
45fdc9b
[pipeline] implement p2p communication (#4100)
ver217 Jun 28, 2023
f51ce1b
[pipeline] refactor 1f1b schedule (#4115)
ver217 Jun 29, 2023
e8e7e49
[pipeline]add pipeline policy and bert forward (#4130)
CjhHa1 Jul 4, 2023
5c897dd
[pipeline] add stage manager (#4093)
ver217 Jun 27, 2023
c552cef
[pipeline]add pipeline policy and bert forward (#4130)
CjhHa1 Jul 4, 2023
90a65ea
[pipeline] build bloom model and policy , revise the base class of po…
CjhHa1 Jul 5, 2023
59f6f57
[pipeline] update shardformer policy
ver217 Jul 5, 2023
b0b8ad2
[pipeline] update shardformer docstring
ver217 Jul 5, 2023
2d6cc07
[test] update shardformer tests
ver217 Jul 5, 2023
5fc60a3
[test] add shard util tests
ver217 Jul 5, 2023
1ed3f8a
[shardformer] rename policy file name
ver217 Jul 5, 2023
d35bd7d
[shardformer] fix type hint
ver217 Jul 5, 2023
c5ea728
[pipeline] add bert_for_pretraining bert_lmhead forward and policy (#…
CjhHa1 Jul 6, 2023
f3bcc29
[pipeline] move bert related pipeline components to shardformer (#4187)
CjhHa1 Jul 7, 2023
890774b
[shardformer] support lazy init (#4202)
ver217 Jul 10, 2023
1094e0f
[pipeline] Bert pipeline for shardformer and its tests (#4197)
CjhHa1 Jul 10, 2023
1622031
[pipeline] Llama pipeline (#4205)
CjhHa1 Jul 11, 2023
31bcf86
[pipeline] Llama causal lm and llama for sequence classification pipe…
CjhHa1 Jul 11, 2023
37d22f6
[pipeline] add bloom model pipeline (#4210)
CjhHa1 Jul 13, 2023
208ac8f
[pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224)
Jul 13, 2023
7e4de52
[shardformer] fix base policy (#4229)
ver217 Jul 14, 2023
a14d352
[pipeline] add pipeline forward for variants of gpt2 (#4238)
Jul 17, 2023
e7cc62d
[pipeline] All bert models (#4233)
CjhHa1 Jul 17, 2023
34f0e34
[pipeline] finish bloom models pipeline and tests (#4223)
CjhHa1 Jul 17, 2023
d9be047
[bugs] hot fix some testing bugs for new models (#4268)
CjhHa1 Jul 18, 2023
2a2eacf
[pipeline] support shardformer for GPT2ForQuestionAnswering & complet…
Jul 19, 2023
d921ce8
[shardformer] support inplace sharding (#4251)
ver217 Jul 20, 2023
b774d5e
[pipeline] refactor gpt2 pipeline forwards (#4287)
Jul 20, 2023
d8408d1
[pipeline] OPT model pipeline (#4258)
CjhHa1 Jul 20, 2023
0a8f3c8
[hotfix] fix opt pipeline (#4293)
CjhHa1 Jul 20, 2023
18ebcf4
[pipeline] reformat for unified design (#4283)
CjhHa1 Jul 21, 2023
36e546b
[pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300)
Jul 21, 2023
d080712
[pipeline] test pure pipeline process using llama (#4218)
CjhHa1 Jul 25, 2023
083d7da
[pipeline] add pipeline support for all T5 models (#4310)
Jul 25, 2023
b3f5d7a
[shardformer] support pipeline base vit model (#4284)
FoolPlayer Jul 25, 2023
261eab0
[plugin] add 3d parallel plugin (#4295)
ver217 Jul 25, 2023
411cf1d
[hotfix] fix gemini and zero test (#4333)
ver217 Jul 27, 2023
da3cef2
[pipeline] fix return_dict/fix pure_pipeline_test (#4331)
Jul 27, 2023
d3c6cd6
[pipeline] add unit test for 1f1b (#4303)
Gy-Lu Jul 31, 2023
f13954c
[pipeline] refactor test pipeline and remove useless utils in pipelin…
CjhHa1 Aug 1, 2023
0ceec8f
[pipeline] support fp32 for HybridPlugin/merge shardformer test and p…
Aug 1, 2023
c59d7ac
Feature/vit support (#4182)
klhhhhh Jul 7, 2023
dd2bf02
[shardformer] support SAM (#4231)
FoolPlayer Jul 14, 2023
9ee4ebe
[shardformer] support whisper (#4212)
FoolPlayer Jul 17, 2023
ed34bb1
Feature/chatglm (#4240)
klhhhhh Jul 20, 2023
f60162b
[shardformer] added tests
klhhhhh Jul 4, 2023
c492869
[shardformer] vit test finish and support
klhhhhh Jul 6, 2023
7377be7
import chatglm
klhhhhh Jul 7, 2023
6ee4c9e
[shardformer] add test kit in model zoo for chatglm
klhhhhh Jul 7, 2023
8620009
[sharformer] add first version of policy of chatglm
klhhhhh Jul 10, 2023
1a29e8f
[shardformer] polish chatglm code
klhhhhh Jul 12, 2023
cbb54d3
[shardformer] polish code
klhhhhh Jul 13, 2023
dad00c4
[shardformer] support chatglm without layernorm
klhhhhh Jul 14, 2023
00f6ef1
[shardformer] delete some file
klhhhhh Jul 17, 2023
f155ae8
[shardformer] ChatGLM support layernorm sharding
klhhhhh Jul 17, 2023
91850fe
[shardformer] register without auto policy
klhhhhh Jul 18, 2023
4da0505
[shardformer] pre-commit check files
klhhhhh Jul 19, 2023
8120eca
[shardformer] support ChatGLMForConditionalGeneration & add fusedlaye…
klhhhhh Jul 20, 2023
879301d
[shardformer] support Blip2 (#4243)
FoolPlayer Jul 25, 2023
726541a
update some module with new api version
FoolPlayer Aug 1, 2023
c3ca53c
[test] skip some not compatible models
FoolPlayer Aug 2, 2023
5c6f183
[test] Hotfix/fix some model test and refactor check util api (#4369)
FoolPlayer Aug 3, 2023
b1feece
[shardformer] add util functions for shardformer tests/fix sync_share…
Aug 3, 2023
a88e922
[pipeline] add chatglm (#4363)
CjhHa1 Aug 4, 2023
906426c
[Shardformer] Merge flash attention branch to pipeline branch (#4362)
flybird11111 Aug 7, 2023
ed4c448
[pipeline] rewrite t5 tests & support multi-tensor transmitting in pi…
Aug 8, 2023
7a3dfd0
[shardformer] update shardformer to use flash attention 2 (#4392)
flybird11111 Aug 9, 2023
d2cd48e
[shardformer] test all optimizations (#4399)
flybird11111 Aug 10, 2023
7596e9a
[pipeline] rewrite bert tests and fix some bugs (#4409)
CjhHa1 Aug 11, 2023
21e0a42
[shardformer]fix, test gpt2 for AMP+TP (#4403)
flybird11111 Aug 11, 2023
7711bd5
[shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395)
Aug 11, 2023
1edc9b5
[shardformer] update tests for all optimization (#4413)
flybird11111 Aug 11, 2023
108e54a
[shardformer]update t5 tests for using all optimizations. (#4407)
flybird11111 Aug 14, 2023
328a791
[shardformer] update bloom/llama/vit/chatglm tests (#4420)
flybird11111 Aug 14, 2023
172f7fa
[misc] resolve code factor issues (#4433)
ver217 Aug 14, 2023
9223022
[misc] update requirements
ver217 Aug 15, 2023
73a4144
[shardformer] fix embedding
ver217 Aug 15, 2023
5d4efdf
[shardformer] fix import
ver217 Aug 15, 2023
126 changes: 94 additions & 32 deletions applications/Chat/README.md
@@ -4,7 +4,6 @@
<span>ColossalChat</span>
</h1>


## Table of Contents

- [Table of Contents](#table-of-contents)
@@ -34,14 +33,17 @@
- [Authors](#authors)
- [Citations](#citations)
- [Licenses](#licenses)

---

## What are ColossalChat and Coati?

[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) is a project to implement LLMs with RLHF, powered by the [Colossal-AI](https://github.com/hpcaitech/ColossalAI) project.

Coati stands for `ColossalAI Talking Intelligence`. It is the name for the module implemented in this project and is also the name of the large language model developed by the ColossalChat project.

The Coati package provides a unified large language model framework that implements the following functions:

- Supports comprehensive large-model training acceleration capabilities for ColossalAI, without requiring knowledge of complex distributed training algorithms
- Supervised datasets collection
- Supervised instructions fine-tuning
@@ -56,17 +58,19 @@ The Coati package provides a unified large language model framework that has imp
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/chatgpt.png" width=700/>
</p>

Image source: https://openai.com/blog/chatgpt

</div>

**As Colossal-AI is undergoing some major updates, this project will be actively maintained to stay in line with the Colossal-AI project.**


More details can be found in the latest news.

- [2023/03] [ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
- [2023/02] [Open Source Solution Replicates ChatGPT Training Process! Ready to go with only 1.6GB GPU Memory](https://www.hpc-ai.tech/blog/colossal-ai-chatgpt)

## Online demo

<div align="center">
<a href="https://www.youtube.com/watch?v=HcTiHzApHm0">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20YouTube.png" width="700" />
@@ -83,13 +87,13 @@ More details can be found in the latest news.
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20Speed.jpg" width=450/>
</p>

> DeepSpeedChat performance comes from its blog of April 12, 2023; ColossalChat performance can be reproduced on an AWS p4d.24xlarge node with 8 A100-40G GPUs with the following command: `torchrun --standalone --nproc_per_node 8 benchmark_opt_lora_dummy.py --num_collect_steps 1 --use_kernels --strategy colossalai_zero2 --experience_batch_size 64 --train_batch_size 32`

## Install

### Install the environment

```bash
conda create -n coati
conda activate coati
git clone https://github.com/hpcaitech/ColossalAI.git
@@ -99,18 +103,19 @@ pip install .

### Install Transformers

```bash
pip install transformers==4.30.2
```

## How to use?

### Supervised datasets collection

We collected 104K bilingual datasets (Chinese and English); you can find them in the [InstructionWild](https://github.com/XueFuzhao/InstructionWild) repo and in this [file](https://github.com/XueFuzhao/InstructionWild/blob/main/data/README.md).

Here is how we collected the data

<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/data-collect.png" width=500/>
</p>
@@ -122,6 +127,20 @@ Stage1 is supervised instructs fine-tuning, which uses the datasets mentioned ea
You can run `examples/train_sft.sh` to start supervised instruction fine-tuning.
[[Stage1 tutorial video]](https://www.youtube.com/watch?v=-qFBZFmOJfg)
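
Conceptually, this stage is standard causal language modeling on the instruction data, usually with the loss restricted to the response tokens. Below is a minimal sketch of that objective; the names `logits`, `labels`, and `prompt_mask` are illustrative assumptions, not Coati's trainer internals.

```python
import torch.nn.functional as F

def sft_loss(logits, labels, prompt_mask):
    # Shift so that each position predicts the next token (standard causal LM setup).
    logits = logits[:, :-1, :]
    labels = labels[:, 1:].clone()
    # Mask out prompt positions so only the response tokens contribute to the loss.
    labels[prompt_mask[:, 1:]] = -100
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1), ignore_index=-100)
```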

**Note**: the supervised dataset uses the following format:

```json
[
{
"instruction": "Provide a list of the top 10 most popular mobile games in Asia",
"input": "",
"output": "The top 10 most popular mobile games in Asia are:\n1) PUBG Mobile\n2) Pokemon Go\n3) Candy Crush Saga\n4) Free Fire\n5) Clash of Clans\n6) Mario Kart Tour\n7) Arena of Valor\n8) Fantasy Westward Journey\n9) Subway Surfers\n10) ARK Survival Evolved",
"id": 0
},
...
]
```

### RLHF Training Stage2 - Training reward model

Stage2 trains a reward model: human annotators rank different outputs for the same prompt, and the resulting scores supervise the training of the reward model.
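
The reward model is typically optimized with a pairwise ranking loss over such preference pairs. Below is a minimal sketch of that objective; `reward_model`, `chosen_ids`, and `rejected_ids` are illustrative names, not Coati's actual API.

```python
import torch.nn.functional as F

def pairwise_ranking_loss(reward_model, chosen_ids, rejected_ids):
    # Score the human-preferred and the rejected response for the same prompt.
    r_chosen = reward_model(chosen_ids)      # shape: (batch,)
    r_rejected = reward_model(rejected_ids)  # shape: (batch,)
    # Push the preferred score above the rejected one (Bradley-Terry style objective).
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```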
@@ -140,13 +159,46 @@ Stage3 uses reinforcement learning algorithm, which is the most complex part of
You can run `examples/train_prompts.sh` to start PPO training with human feedback.
[[Stage3 tutorial video]](https://www.youtube.com/watch?v=Z8wwSHxPL9g)
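
At the core of this stage is PPO's clipped surrogate objective for the actor. The sketch below assumes per-token log-probabilities and advantages have already been computed; the names are illustrative rather than Coati's actual trainer API.

```python
import torch

def ppo_policy_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the current policy and the behaviour (old) policy.
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipped surrogate objective: take the pessimistic (minimum) of the two estimates.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```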

**Note**: the required datasets use the following formats:

- `pretrain dataset`

```json
[
{
"instruction": "Provide a list of the top 10 most popular mobile games in Asia",
"input": "",
"output": "The top 10 most popular mobile games in Asia are:\n1) PUBG Mobile\n2) Pokemon Go\n3) Candy Crush Saga\n4) Free Fire\n5) Clash of Clans\n6) Mario Kart Tour\n7) Arena of Valor\n8) Fantasy Westward Journey\n9) Subway Surfers\n10) ARK Survival Evolved",
"id": 0
},
...
]
```

- `prompt dataset`

```json
[
{
"instruction": "Edit this paragraph to make it more concise: \"Yesterday, I went to the store and bought some things. Then, I came home and put them away. After that, I went for a walk and met some friends.\"",
"id": 0
},
{
"instruction": "Write a descriptive paragraph about a memorable vacation you went on",
"id": 1
},
...
]
```

For more details, see [`examples/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples).

### Inference Quantization and Serving - After Training

We provide an online inference server and a benchmark. We aim to run inference on a single GPU, so quantization is essential when using large models.

We support 8-bit quantization (RTN), 4-bit quantization (GPTQ), and FP16 inference.

Online inference server scripts can help you deploy your own services.
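
For intuition, 8-bit RTN ("round-to-nearest") quantization simply rescales each weight tensor into the int8 range and rounds. The per-tensor sketch below only illustrates the idea (real deployments usually quantize per-channel and keep activations in FP16); it is not the code in `inference/`.

```python
import torch

def rtn_quantize_int8(weight: torch.Tensor):
    # Symmetric per-tensor quantization: map [-max|w|, +max|w|] onto [-127, 127].
    scale = weight.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q, scale

def rtn_dequantize(q: torch.Tensor, scale: torch.Tensor):
    # Recover an FP16 approximation of the original weights for inference.
    return q.to(torch.float16) * scale
```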

For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/inference).
@@ -158,6 +210,7 @@ For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tre
<details><summary><b>E-mail</b></summary>

![phd](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/Phd.png)

</details>

<details><summary><b>coding</b></summary>
@@ -191,6 +244,7 @@ For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tre
</details>

### Open QA

<details><summary><b>Game</b></summary>

![Game](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/game.png)
@@ -224,6 +278,7 @@ For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tre
You can find more examples in this [repo](https://github.com/XueFuzhao/InstructionWild/blob/main/comparison.md).

### Limitation

<details><summary><b>Limitation for LLaMA-finetuned models</b></summary>
- Both Alpaca and ColossalChat are based on LLaMA. It is hard to compensate for the missing knowledge in the pre-training stage.
- Lack of counting ability: Cannot count the number of items in a list.
@@ -247,7 +302,7 @@ You can find more examples in this [repo](https://github.com/XueFuzhao/Instructi

We have integrated the Transformers save and load pipeline, allowing users to freely call Hugging Face's language models and save them in the HF format.

```python
from coati.models.llama import LlamaLM
from coati.trainer import SFTTrainer
from transformers import AutoTokenizer

@@ -256,20 +311,20 @@ tokenizer = AutoTokenizer.from_pretrained(args.pretrain)

(model, optim) = strategy.prepare((model, optim))
trainer = SFTTrainer(model=model,
                     strategy=strategy,
                     optim=optim,
                     train_dataloader=train_dataloader,
                     eval_dataloader=eval_dataloader,
                     batch_size=args.batch_size,
                     max_epochs=args.max_epochs,
                     accumulation_steps=args.accumulation_steps)

trainer.fit()
# this saves in PyTorch format
strategy.save_model(model, args.save_path, only_rank0=True)

# this saves in HF format
strategy.save_pretrained(model, args.save_path, only_rank0=True, tokenizer=tokenizer)
```

Expand All @@ -280,12 +335,13 @@ strategy.save_pretrained(model, args.save_path, only_rank0=True, tokenizer=token
Here are some examples that allow you to train a 7B model on a single or multiple consumer-grade GPUs.

If you only have a single 24G GPU, you can use the following script. `batch_size`, `lora_rank` and `grad_checkpoint` are the most important parameters to successfully train the model.

```bash
# [INFO]: MAX GPU MEMORY ALLOCATED: 19148.9345703125 MB
torchrun --standalone --nproc_per_node=1 train_sft.py \
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy ddp \
--log_interval 10 \
--save_path /path/to/Coati-7B \
--dataset /path/to/data.json \
--batch_size 1 \
Expand All @@ -298,12 +354,12 @@ torchrun --standalone --nproc_per_node=1 train_sft.py \
```

The `colossalai_gemini` strategy can enable a single 24G GPU to train the whole model without using LoRA if you have sufficient CPU memory. You can use the following script.

```bash
torchrun --standalone --nproc_per_node=1 train_sft.py \
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy colossalai_gemini \
--log_interval 10 \
--save_path /path/to/Coati-7B \
--dataset /path/to/data.json \
--batch_size 1 \
Expand All @@ -315,12 +371,12 @@ torchrun --standalone --nproc_per_node=1 train_sft.py \
```

If you have 4x32 GB GPUs, you can even train the whole 7B model using our `colossalai_zero2_cpu` strategy! The script is given as follows.

```bash
torchrun --standalone --nproc_per_node=4 train_sft.py \
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy colossalai_zero2_cpu \
--log_interval 10 \
--save_path /path/to/Coati-7B \
--dataset /path/to/data.json \
--batch_size 1 \
Expand All @@ -330,8 +386,8 @@ torchrun --standalone --nproc_per_node=4 train_sft.py \
--max_epochs 1 \
--grad_checkpoint
```

</details>

## The Plan

Expand All @@ -346,24 +402,26 @@ torchrun --standalone --nproc_per_node=4 train_sft.py \
- [ ] support chain-of-thought by [langchain](https://github.com/hwchase17/langchain)

### Real-time progress
You will find our progress on the GitHub [project board](https://github.com/orgs/hpcaitech/projects/17/views/1).

## Invitation to open-source contribution

Referring to the successful attempts of [BLOOM](https://bigscience.huggingface.co/) and [Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion), any and all developers and partners with computing power, datasets, or models are welcome to join and build the Colossal-AI community, making efforts towards the era of big AI models from the starting point of replicating ChatGPT!

You may contact us or participate in the following ways:

1. [Leaving a Star ⭐](https://github.com/hpcaitech/ColossalAI/stargazers) to show your support. Thanks!
2. Posting an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) or submitting a PR on GitHub, following the guidelines in [Contributing](https://github.com/hpcaitech/ColossalAI/blob/main/CONTRIBUTING.md).
3. Join the Colossal-AI community on
   [Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
   and [WeChat(微信)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your ideas.
4. Send your official proposal to contact@hpcaitech.com.

Thanks so much to all of our amazing contributors!

## Quick Preview

<div align="center">
<a href="https://chat.colossalai.org/">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Chat-demo.png" width="700" />
@@ -397,18 +455,22 @@ Thanks so much to all of our amazing contributors!
| Better Cases | 38 ⚔ **41** | **45** ⚔ 33 |
| Win Rate | 48% ⚔ **52%** | **58%** ⚔ 42% |
| Average Score | 7.06 ⚔ **7.13** | **7.31** ⚔ 6.82 |

- Our Coati-7B model performs better than Alpaca-7B when using GPT-4 to evaluate model performance. The Coati-7B model we evaluated is an older version trained a few weeks ago; a new version is around the corner.

## Authors

Coati is developed by the ColossalAI Team:

- [Fazzie](https://fazzie-key.cool/about/index.html)
- [FrankLeeeee](https://github.com/FrankLeeeee)
- [BlueRum](https://github.com/ht-zhou)
- [ver217](https://github.com/ver217)
- [ofey404](https://github.com/ofey404)
- [Wenhao Chen](https://github.com/CWHer)

PhD students from the [HPC-AI Lab](https://ai.comp.nus.edu.sg/) also contributed a lot to this project.

- [Zangwei Zheng](https://github.com/zhengzangw)
- [Xue Fuzhao](https://github.com/XueFuzhao)

9 changes: 6 additions & 3 deletions applications/Chat/benchmarks/README.md
@@ -27,9 +27,12 @@ We also provide various training strategies:

For now, we only support launching with `torchrun`. E.g.

```bash
# run OPT-125M with no lora (lora_rank=0) on single-node single-GPU with min batch size
torchrun --standalone --nproc_per_node 1 benchmark_opt_lora_dummy.py \
    --model 125m --critic_model 125m --strategy ddp \
    --experience_batch_size 1 --train_batch_size 1 --lora_rank 0
# run Actor (OPT-1.3B) and Critic (OPT-350M) with lora_rank=4 on single-node 4-GPU
torchrun --standalone --nproc_per_node 4 benchmark_opt_lora_dummy.py \
    --model 1.3b --critic_model 350m --strategy colossalai_zero2 --lora_rank 4
```