Merged

,m #22

34 commits
7182ac2
[chat]add examples of training with limited resources in chat readme …
chengeharrison Apr 12, 2023
366a035
[checkpoint] Shard saved checkpoint need to be compatible with the n…
flybird11111 Apr 12, 2023
152239b
[gemini] gemini supports lazy init (#3379)
ver217 Apr 12, 2023
de84c03
Polish Code
NatalieC323 Apr 11, 2023
a3ac48e
[doc] Update README-zh-Hans.md (#3541)
digger-yu Apr 12, 2023
3f760da
Update README.md (#3548)
digger-yu Apr 13, 2023
77efdfe
[doc] Update README.md (#3549)
digger-yu Apr 13, 2023
535b896
[chat] polish tutorial doc (#3551)
binmakeswell Apr 13, 2023
1a809ed
[chat] ChatGPT train prompts on ray example (#3309)
MisterLin1995 Apr 13, 2023
f1b3d60
[example] reorganize for community examples (#3557)
binmakeswell Apr 14, 2023
1c7734b
[doc] Update 1D_tensor_parallel.md (#3563)
digger-yu Apr 14, 2023
4341f5e
[lazyinit] fix clone and deepcopy (#3553)
ver217 Apr 17, 2023
173dad0
[misc] add verbose arg for zero and op builder (#3552)
ver217 Apr 17, 2023
9edeadf
[doc] Update 1D_tensor_parallel.md (#3573)
digger-yu Apr 17, 2023
d329c29
Add docstr for zero3 chunk search utils (#3572)
yhna940 Apr 17, 2023
e355144
[chatgpt] Detached PPO Training (#3195)
CsRic Apr 17, 2023
cc1eec2
[chat] update reward model sh (#3578)
binmakeswell Apr 17, 2023
6b1a39b
[coati] add costom model suppor tguide (#3579)
Fazziekey Apr 17, 2023
6e7e43c
[doc] Update .github/workflows/README.md (#3577)
digger-yu Apr 17, 2023
7788e0b
fix: fix sft (#3568)
NicholasCao Apr 17, 2023
f313bab
[gemini] support save state dict in shards (#3581)
ver217 Apr 17, 2023
d0fbd4b
[example] fix community doc (#3586)
digger-yu Apr 18, 2023
36a519b
Update test_ci.sh
Camille7777 Mar 22, 2023
dac127d
[fx] fix meta tensor registration (#3589)
ver217 Apr 18, 2023
1ec0d38
reconstruct chat trainer and fix training script (#3588)
chengeharrison Apr 18, 2023
5a79cff
[coati] fix install cmd (#3592)
binmakeswell Apr 18, 2023
d96567b
[misc] op_builder/builder.py (#3593)
digger-yu Apr 18, 2023
d544ed4
[bot] Automated submodule synchronization (#3596)
github-actions[bot] Apr 19, 2023
12eff9e
[gemini] state dict supports fp16 (#3590)
ver217 Apr 19, 2023
7570d9a
[doc] fix op_builder/README.md (#3597)
digger-yu Apr 19, 2023
becd3b0
[doc] fix setup.py typo (#3603)
digger-yu Apr 19, 2023
633bac2
[doc] .github/workflows/README.md (#3605)
digger-yu Apr 20, 2023
c4709d3
Chat evaluate (#3608)
chengeharrison Apr 20, 2023
d7bf284
[chat] polish code note typo (#3612)
digger-yu Apr 20, 2023
16 changes: 8 additions & 8 deletions .github/workflows/README.md
@@ -30,7 +30,7 @@ In the section below, we will dive into the details of different workflows avail
Refer to this [documentation](https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow) on how to manually trigger a workflow.
I will provide the details of each workflow below.

**A PR which changes the `version.txt` is considered as a release PR in the following coontext.**
**A PR which changes the `version.txt` is considered as a release PR in the following context.**


### Code Style Check
@@ -58,23 +58,23 @@ I will provide the details of each workflow below.
#### Example Test on Dispatch

This workflow is triggered by manually dispatching the workflow. It has the following input parameters:
- `example_directory`: the example directory to test. Multiple directories are supported and must be separated b$$y comma. For example, language/gpt, images/vit. Simply input language or simply gpt does not work.
- `example_directory`: the example directory to test. Multiple directories are supported and must be separated by comma. For example, language/gpt, images/vit. Simply input language or simply gpt does not work.

### Compatibility Test

| Workflow Name | File name | Description |
| -------------------------------- | ------------------------------------ | -------------------------------------------------------------------------------------------------------------------- |
| `Compatibility Test on PR` | `compatibility_test_on_pr.yml` | Check Colossal-AI's compatiblity when `version.txt` is changed in a PR. |
| `Compatibility Test on Schedule` | `compatibility_test_on_schedule.yml` | This workflow will check the compatiblity of Colossal-AI against PyTorch specified in `.compatibility` every Sunday. |
| `Compatiblity Test on Dispatch` | `compatibility_test_on_dispatch.yml` | Test PyTorch Compatibility manually. |
| `Compatibility Test on PR` | `compatibility_test_on_pr.yml` | Check Colossal-AI's compatibility when `version.txt` is changed in a PR. |
| `Compatibility Test on Schedule` | `compatibility_test_on_schedule.yml` | This workflow will check the compatibility of Colossal-AI against PyTorch specified in `.compatibility` every Sunday. |
| `Compatibility Test on Dispatch` | `compatibility_test_on_dispatch.yml` | Test PyTorch Compatibility manually. |


#### Compatibility Test on Dispatch
This workflow is triggered by manually dispatching the workflow. It has the following input parameters:
- `torch version`:torch version to test against, multiple versions are supported but must be separated by comma. The default is value is all, which will test all available torch versions listed in this [repository](https://github.com/hpcaitech/public_assets/tree/main/colossalai/torch_build/torch_wheels).
- `cuda version`: cuda versions to test against, multiple versions are supported but must be separated by comma. The CUDA versions must be present in our [DockerHub repository](https://hub.docker.com/r/hpcaitech/cuda-conda).

> It only test the compatiblity of the main branch
> It only test the compatibility of the main branch


### Release
@@ -113,7 +113,7 @@ This `.compatibility` file is to tell GitHub Actions which PyTorch and CUDA vers

2. `.cuda_ext.json`

This file controls which CUDA versions will be checked against CUDA extenson built. You can add a new entry according to the json schema below to check the AOT build of PyTorch extensions before release.
This file controls which CUDA versions will be checked against CUDA extension built. You can add a new entry according to the json schema below to check the AOT build of PyTorch extensions before release.

```json
{
@@ -144,7 +144,7 @@ This file controls which CUDA versions will be checked against CUDA extenson bui
- [x] check on PR
- [x] regular check
- [x] manual dispatch
- [x] compatiblity check
- [x] compatibility check
- [x] check on PR
- [x] manual dispatch
- [x] auto test when release
27 changes: 18 additions & 9 deletions .github/workflows/run_chatgpt_examples.yml
@@ -4,10 +4,10 @@ on:
pull_request:
types: [synchronize, opened, reopened]
paths:
- 'applications/ChatGPT/chatgpt/**'
- 'applications/ChatGPT/requirements.txt'
- 'applications/ChatGPT/setup.py'
- 'applications/ChatGPT/examples/**'
- 'applications/Chat/coati/**'
- 'applications/Chat/requirements.txt'
- 'applications/Chat/setup.py'
- 'applications/Chat/examples/**'


jobs:
@@ -16,7 +16,7 @@ jobs:
runs-on: [self-hosted, gpu]
container:
image: hpcaitech/pytorch-cuda:1.12.0-11.3.0
options: --gpus all --rm -v /data/scratch/chatgpt:/data/scratch/chatgpt
options: --gpus all --rm -v /data/scratch/github_actions/chat:/data/scratch/github_actions/chat
timeout-minutes: 30
defaults:
run:
@@ -27,17 +27,26 @@

- name: Install ColossalAI and ChatGPT
run: |
pip install -v .
cd applications/ChatGPT
pip install -e .
cd applications/Chat
pip install -v .
pip install -r examples/requirements.txt

- name: Install Transformers
run: |
cd applications/Chat
git clone https://github.com/hpcaitech/transformers
cd transformers
pip install -v .

- name: Execute Examples
run: |
cd applications/ChatGPT
cd applications/Chat
rm -rf ~/.cache/colossalai
./examples/test_ci.sh
env:
NCCL_SHM_DISABLE: 1
MAX_JOBS: 8
PROMPT_PATH: /data/scratch/chatgpt/prompts.csv
SFT_DATASET: /data/scratch/github_actions/chat/data.json
PROMPT_PATH: /data/scratch/github_actions/chat/prompts_en.jsonl
PRETRAIN_DATASET: /data/scratch/github_actions/chat/alpaca_data.json
2 changes: 2 additions & 0 deletions applications/Chat/.gitignore
@@ -144,3 +144,5 @@ docs/.build

# wandb log
example/wandb/

examples/awesome-chatgpt-prompts/
193 changes: 99 additions & 94 deletions applications/Chat/README.md
@@ -15,19 +15,18 @@
- [Install the Transformers](#install-the-transformers)
- [How to use?](#how-to-use)
- [Supervised datasets collection](#supervised-datasets-collection)
- [Stage1 - Supervised instructs tuning](#stage1---supervised-instructs-tuning)
- [Stage2 - Training reward model](#stage2---training-reward-model)
- [Stage3 - Training model with reinforcement learning by human feedback](#stage3---training-model-with-reinforcement-learning-by-human-feedback)
- [Inference - After Training](#inference---after-training)
- [8-bit setup](#8-bit-setup)
- [4-bit setup](#4-bit-setup)
- [RLHF Training Stage1 - Supervised instructs tuning](#RLHF-training-stage1---supervised-instructs-tuning)
- [RLHF Training Stage2 - Training reward model](#RLHF-training-stage2---training-reward-model)
- [RLHF Training Stage3 - Training model with reinforcement learning by human feedback](#RLHF-training-stage3---training-model-with-reinforcement-learning-by-human-feedback)
- [Inference Quantization and Serving - After Training](#inference-quantization-and-serving---after-training)
- [Coati7B examples](#coati7b-examples)
- [Generation](#generation)
- [Open QA](#open-qa)
- [Limitation for LLaMA-finetuned models](#limitation-for-llama-finetuned-models)
- [Limitation of dataset](#limitation-of-dataset)
- [Limitation for LLaMA-finetuned models](#limitation)
- [Limitation of dataset](#limitation)
- [FAQ](#faq)
- [How to save/load checkpoint](#how-to-saveload-checkpoint)
- [How to save/load checkpoint](#faq)
- [How to train with limited resources](#faq)
- [The Plan](#the-plan)
- [Real-time progress](#real-time-progress)
- [Invitation to open-source contribution](#invitation-to-open-source-contribution)
@@ -82,6 +81,8 @@ Due to resource constraints, we will only provide this service from 29th Mar 202
```shell
conda create -n coati
conda activate coati
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI/applications/Chat
pip install .
```

@@ -106,107 +107,36 @@ Here is how we collected the data
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/data-collect.png" width=500/>
</p>

### Stage1 - Supervised instructs tuning
### RLHF Training Stage1 - Supervised instructs tuning

Stage1 is supervised instructs fine-tuning, which uses the datasets mentioned earlier to fine-tune the model
Stage1 is supervised instructs fine-tuning, which uses the datasets mentioned earlier to fine-tune the model.

you can run the `examples/train_sft.sh` to start a supervised instructs fine-tuning
You can run the `examples/train_sft.sh` to start a supervised instructs fine-tuning.

```
torchrun --standalone --nproc_per_node=4 train_sft.py \
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy colossalai_zero2 \
--log_interval 10 \
--save_path /path/to/Coati-7B \
--dataset /path/to/data.json \
--batch_size 4 \
--accimulation_steps 8 \
--lr 2e-5 \
--max_datasets_size 512 \
--max_epochs 1 \
```
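
The file passed via `--dataset` is a plain JSON file of instruction records. As a rough sketch of what such a file can look like (the alpaca-style field names below are an assumption, not confirmed by this README — check coati's SFT dataset loader for the authoritative schema):

```python
import json

# Hypothetical alpaca-style SFT records; the field names are an
# assumption for illustration only.
records = [
    {
        "instruction": "Explain gradient accumulation in one sentence.",
        "input": "",
        "output": "Gradient accumulation sums gradients over several "
                  "micro-batches before each optimizer step, simulating "
                  "a larger batch size.",
    },
]

with open("data.json", "w") as f:
    json.dump(records, f, indent=2)
```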

### Stage2 - Training reward model
### RLHF Training Stage2 - Training reward model

Stage2 trains a reward model, which assigns scores to model outputs; manual rankings of different outputs for the same prompt supervise its training.

you can run the `examples/train_rm.sh` to start a reward model training
You can run the `examples/train_rm.sh` to start a reward model training.

```
torchrun --standalone --nproc_per_node=4 train_reward_model.py
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy colossalai_zero2 \
--loss_fn 'log_exp'\
--save_path 'rmstatic.pt' \
```
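
The `log_exp` loss named above is, in most RLHF implementations, the pairwise ranking loss log(1 + exp(r_rejected − r_chosen)); a minimal sketch, assuming coati follows this standard formulation (verify against the source):

```python
import torch

def log_exp_loss(chosen_reward: torch.Tensor,
                 reject_reward: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss: penalizes the reward model when the
    rejected response scores higher than the chosen one.
    (softplus(x) = log(1 + exp(x)) would be the numerically stabler form.)"""
    return torch.log(1 + torch.exp(reject_reward - chosen_reward)).mean()

# Toy usage: rewards for two chosen/rejected response pairs.
chosen = torch.tensor([1.2, 0.7])
rejected = torch.tensor([0.3, 0.9])
print(log_exp_loss(chosen, rejected))
```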

### Stage3 - Training model with reinforcement learning by human feedback
### RLHF Training Stage3 - Training model with reinforcement learning by human feedback

Stage3 uses a reinforcement learning algorithm, which is the most complex part of the training process:

<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/stage-3.jpeg" width=800/>
</p>

you can run the `examples/train_prompts.sh` to start training PPO with human feedback

```
torchrun --standalone --nproc_per_node=4 train_prompts.py \
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy colossalai_zero2 \
--prompt_path /path/to/your/prompt_dataset \
--pretrain_dataset /path/to/your/pretrain_dataset \
--rm_pretrain /your/pretrain/rm/defination \
--rm_path /your/rm/model/path
```
You can run the `examples/train_prompts.sh` to start training PPO with human feedback.

For more details, see [`examples/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples).
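
At the heart of this stage is PPO's clipped surrogate objective; a schematic sketch of the policy loss (not coati's exact implementation):

```python
import torch

def ppo_policy_loss(log_probs: torch.Tensor,
                    old_log_probs: torch.Tensor,
                    advantages: torch.Tensor,
                    clip_eps: float = 0.2) -> torch.Tensor:
    # Probability ratio between current and behavior policy.
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipped surrogate objective: take the pessimistic (minimum) branch
    # so large policy updates are discouraged.
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(surr1, surr2).mean()
```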

### Inference - After Training
#### 8-bit setup

8-bit quantization is originally supported by the latest [transformers](https://github.com/huggingface/transformers). Please install it from source.
### Inference Quantization and Serving - After Training

Please ensure you have downloaded HF-format model weights of LLaMA models.
We provide an online inference server and a benchmark. We aim to run inference on single GPU, so quantization is essential when using large models.

Usage:

```python
import torch
from transformers import LlamaForCausalLM

USE_8BIT = True  # use 8-bit quantization; otherwise, use fp16

model = LlamaForCausalLM.from_pretrained(
    "pretrained/path",
    load_in_8bit=USE_8BIT,
    torch_dtype=torch.float16,
    device_map="auto",
)
if not USE_8BIT:
    model.half()  # use fp16
model.eval()
```

**Troubleshooting**: if you get errors indicating your CUDA-related libraries are not found when loading the 8-bit model, you can check whether your `LD_LIBRARY_PATH` is correct.

E.g. you can set `export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH`.

#### 4-bit setup

Please ensure you have downloaded the HF-format model weights of LLaMA models first.

Then you can follow [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa). This lib provides efficient CUDA kernels and weight conversion scripts.

After installing this lib, we may convert the original HF-format LLaMA model weights to a 4-bit version.

```shell
CUDA_VISIBLE_DEVICES=0 python llama.py /path/to/pretrained/llama-7b c4 --wbits 4 --groupsize 128 --save llama7b-4bit.pt
```

Run this command in your cloned `GPTQ-for-LLaMa` directory to produce the 4-bit weight file `llama7b-4bit.pt` specified by `--save`.

**Troubleshooting**: if you get errors about `position_ids`, you can check out commit `50287c3b9ae4a3b66f6b5127c643ec39b769b155` of the `GPTQ-for-LLaMa` repo.
We support 8-bit quantization (RTN), 4-bit quantization (GPTQ), and FP16 inference. Online inference server scripts can help you deploy your own services.

For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/inference).
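
RTN (round-to-nearest) is the simplest of these schemes: weights are scaled into the int8 range and rounded, keeping one scale per output channel for dequantization. A minimal illustration of the idea (not the kernel the inference server actually uses):

```python
import torch

def quantize_rtn_int8(w: torch.Tensor):
    # Symmetric per-row (per-output-channel) scale mapping weights to [-127, 127].
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float16) * scale.to(torch.float16)

w = torch.randn(4, 8)
q, s = quantize_rtn_int8(w)
print((w - dequantize_int8(q, s).float()).abs().max())  # small rounding error
```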

@@ -282,24 +212,27 @@ For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tre

You can find more examples in this [repo](https://github.com/XueFuzhao/InstructionWild/blob/main/comparison.md).

### Limitation for LLaMA-finetuned models
### Limitation
<details><summary><b>Limitation for LLaMA-finetuned models</b></summary>
- Both Alpaca and ColossalChat are based on LLaMA. It is hard to compensate for the missing knowledge in the pre-training stage.
- Lack of counting ability: Cannot count the number of items in a list.
- Lack of Logics (reasoning and calculation)
- Tend to repeat the last sentence (fail to produce the end token).
- Poor multilingual results: LLaMA is mainly trained on English datasets (Generation performs better than QA).
</details>

### Limitation of dataset
<details><summary><b>Limitation of dataset</b></summary>
- Lack of summarization ability: No such instructions in finetune datasets.
- Lack of multi-turn chat: No such instructions in finetune datasets
- Lack of self-recognition: No such instructions in finetune datasets
- Lack of Safety:
- When the input contains fake facts, the model makes up false facts and explanations.
- Cannot abide by OpenAI's policy: When generating prompts from OpenAI API, it always abides by its policy. So no violation case is in the datasets.
</details>

## FAQ

### How to save/load checkpoint
<details><summary><b>How to save/load checkpoint</b></summary>

We have integrated the Transformers save and load pipeline, allowing users to freely call Hugging Face's language models and save them in the HF format.

@@ -324,6 +257,63 @@
```
trainer.fit()
trainer.save_model(path=args.save_path, only_rank0=True, tokenizer=tokenizer)
```
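
Because the checkpoint is saved in HF format, loading it back needs nothing coati-specific; a sketch, assuming a LLaMA-based model and that the path is whatever `--save_path` you trained with:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

# Hypothetical path; reuse your own --save_path directory.
model = LlamaForCausalLM.from_pretrained("/path/to/Coati-7B")
tokenizer = LlamaTokenizer.from_pretrained("/path/to/Coati-7B")
```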

</details>

<details><summary><b>How to train with limited resources</b></summary>

Here are some examples that can allow you to train a 7B model on a single or multiple consumer-grade GPUs.

If you only have a single 24G GPU, you can use the following script. `batch_size` and `lora_rank` are the most important parameters to successfully train the model.
```
torchrun --standalone --nproc_per_node=1 train_sft.py \
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy naive \
--log_interval 10 \
--save_path /path/to/Coati-7B \
--dataset /path/to/data.json \
--batch_size 1 \
--accimulation_steps 8 \
--lr 2e-5 \
--max_datasets_size 512 \
--max_epochs 1 \
--lora_rank 16 \
```
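
For reference, the effective global batch size here is `nproc_per_node × batch_size × accimulation_steps` = 1 × 1 × 8 = 8, so gradient accumulation recovers most of the statistical benefit of a larger batch while keeping only one micro-batch in GPU memory at a time; `--lora_rank 16` further shrinks the optimizer state by training only low-rank adapter weights.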

The `colossalai_gemini` strategy can enable a single 24G GPU to train the whole model without using LoRA if you have sufficient CPU memory. You can use the following script.
```
torchrun --standalone --nproc_per_node=1 train_sft.py \
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy colossalai_gemini \
--log_interval 10 \
--save_path /path/to/Coati-7B \
--dataset /path/to/data.json \
--batch_size 1 \
--accimulation_steps 8 \
--lr 2e-5 \
--max_datasets_size 512 \
--max_epochs 1 \
```

If you have 4x32 GB GPUs, you can even train the whole 7B model using our `colossalai_zero2_cpu` strategy! The script is given as follows.
```
torchrun --standalone --nproc_per_node=4 train_sft.py \
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy colossalai_zero2_cpu \
--log_interval 10 \
--save_path /path/to/Coati-7B \
--dataset /path/to/data.json \
--batch_size 1 \
--accimulation_steps 8 \
--lr 2e-5 \
--max_datasets_size 512 \
--max_epochs 1 \
```
</details>


## The Plan

- [x] implement PPO fine-tuning
@@ -355,6 +345,14 @@ and [WeChat(微信)](https://raw.githubusercontent.com/hpcaitech/public_assets/m
Thanks so much to all of our amazing contributors!

## Quick Preview
<div align="center">
<a href="https://chat.colossalai.org/">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Chat-demo.png" width="700" />
</a>
</div>

- An open-source low-cost solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline. [[demo]](https://chat.colossalai.org)

<p id="ChatGPT_scaling" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT%20scaling.png" width=800/>
</p>
Expand All @@ -375,6 +373,13 @@ Thanks so much to all of our amazing contributors!
- Increase the capacity of the fine-tuning model by up to 3.7 times on a single GPU
- Keep in a sufficiently high running speed

| Model Pair | Alpaca-7B ⚔ Coati-7B | Coati-7B ⚔ Alpaca-7B |
| :-----------: | :------------------: | :------------------: |
| Better Cases | 38 ⚔ **41** | **45** ⚔ 33 |
| Win Rate | 48% ⚔ **52%** | **58%** ⚔ 42% |
| Average Score | 7.06 ⚔ **7.13** | **7.31** ⚔ 6.82 |
- Our Coati-7B model performs better than Alpaca-7B when using GPT-4 to evaluate model performance. The Coati-7B model we evaluate is an old version we trained a few weeks ago and the new version is around the corner.

## Authors

Coati is developed by ColossalAI Team: