Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
136 changes: 39 additions & 97 deletions applications/Chat/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,20 +15,18 @@
- [Install the Transformers](#install-the-transformers)
- [How to use?](#how-to-use)
- [Supervised datasets collection](#supervised-datasets-collection)
- [Stage1 - Supervised instructs tuning](#stage1---supervised-instructs-tuning)
- [Stage2 - Training reward model](#stage2---training-reward-model)
- [Stage3 - Training model with reinforcement learning by human feedback](#stage3---training-model-with-reinforcement-learning-by-human-feedback)
- [Inference - After Training](#inference---after-training)
- [8-bit setup](#8-bit-setup)
- [4-bit setup](#4-bit-setup)
- [RLHF Training Stage1 - Supervised instructs tuning](#RLHF-training-stage1---supervised-instructs-tuning)
- [RLHF Training Stage2 - Training reward model](#RLHF-training-stage2---training-reward-model)
- [RLHF Training Stage3 - Training model with reinforcement learning by human feedback](#RLHF-training-stage3---training-model-with-reinforcement-learning-by-human-feedback)
- [Inference Quantization and Serving - After Training](#inference-quantization-and-serving---after-training)
- [Coati7B examples](#coati7b-examples)
- [Generation](#generation)
- [Open QA](#open-qa)
- [Limitation for LLaMA-finetuned models](#limitation-for-llama-finetuned-models)
- [Limitation of dataset](#limitation-of-dataset)
- [Limitation for LLaMA-finetuned models](#limitation)
- [Limitation of dataset](#limitation)
- [FAQ](#faq)
- [How to save/load checkpoint](#how-to-saveload-checkpoint)
- [How to train with limited resources](#how-to-train-with-limited-resources)
- [How to save/load checkpoint](#faq)
- [How to train with limited resources](#faq)
- [The Plan](#the-plan)
- [Real-time progress](#real-time-progress)
- [Invitation to open-source contribution](#invitation-to-open-source-contribution)
Expand Down Expand Up @@ -107,107 +105,36 @@ Here is how we collected the data
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/data-collect.png" width=500/>
</p>

### Stage1 - Supervised instructs tuning
### RLHF Training Stage1 - Supervised instructs tuning

Stage1 is supervised instructs fine-tuning, which uses the datasets mentioned earlier to fine-tune the model
Stage1 is supervised instructs fine-tuning, which uses the datasets mentioned earlier to fine-tune the model.

you can run the `examples/train_sft.sh` to start a supervised instructs fine-tuning
You can run the `examples/train_sft.sh` to start a supervised instructs fine-tuning.

```
torchrun --standalone --nproc_per_node=4 train_sft.py \
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy colossalai_zero2 \
--log_interval 10 \
--save_path /path/to/Coati-7B \
--dataset /path/to/data.json \
--batch_size 4 \
--accimulation_steps 8 \
--lr 2e-5 \
--max_datasets_size 512 \
--max_epochs 1 \
```

### Stage2 - Training reward model
### RLHF Training Stage2 - Training reward model

Stage2 trains a reward model, which obtains corresponding scores by manually ranking different outputs for the same prompt and supervises the training of the reward model

you can run the `examples/train_rm.sh` to start a reward model training

```
torchrun --standalone --nproc_per_node=4 train_reward_model.py
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy colossalai_zero2 \
--loss_fn 'log_exp'\
--save_path 'rmstatic.pt' \
```
You can run the `examples/train_rm.sh` to start a reward model training.

### Stage3 - Training model with reinforcement learning by human feedback
### RLHF Training Stage3 - Training model with reinforcement learning by human feedback

Stage3 uses reinforcement learning algorithm, which is the most complex part of the training process:

<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/stage-3.jpeg" width=800/>
</p>

you can run the `examples/train_prompts.sh` to start training PPO with human feedback

```
torchrun --standalone --nproc_per_node=4 train_prompts.py \
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy colossalai_zero2 \
--prompt_path /path/to/your/prompt_dataset \
--pretrain_dataset /path/to/your/pretrain_dataset \
--rm_pretrain /your/pretrain/rm/defination \
--rm_path /your/rm/model/path
```
You can run the `examples/train_prompts.sh` to start training PPO with human feedback.

For more details, see [`examples/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples).

### Inference - After Training
#### 8-bit setup

8-bit quantization is originally supported by the latest [transformers](https://github.com/huggingface/transformers). Please install it from source.

Please ensure you have downloaded HF-format model weights of LLaMA models.
### Inference Quantization and Serving - After Training

Usage:
We provide an online inference server and a benchmark. We aim to run inference on single GPU, so quantization is essential when using large models.

```python
from transformers import LlamaForCausalLM
USE_8BIT = True # use 8-bit quantization; otherwise, use fp16
model = LlamaForCausalLM.from_pretrained(
"pretrained/path",
load_in_8bit=USE_8BIT,
torch_dtype=torch.float16,
device_map="auto",
)
if not USE_8BIT:
model.half() # use fp16
model.eval()
```

**Troubleshooting**: if you get errors indicating your CUDA-related libraries are not found when loading the 8-bit model, you can check whether your `LD_LIBRARY_PATH` is correct.

E.g. you can set `export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH`.

#### 4-bit setup

Please ensure you have downloaded the HF-format model weights of LLaMA models first.

Then you can follow [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa). This lib provides efficient CUDA kernels and weight conversion scripts.

After installing this lib, we may convert the original HF-format LLaMA model weights to a 4-bit version.

```shell
CUDA_VISIBLE_DEVICES=0 python llama.py /path/to/pretrained/llama-7b c4 --wbits 4 --groupsize 128 --save llama7b-4bit.pt
```

Run this command in your cloned `GPTQ-for-LLaMa` directory, then you will get a 4-bit weight file `llama7b-4bit-128g.pt`.

**Troubleshooting**: if you get errors about `position_ids`, you can checkout to commit `50287c3b9ae4a3b66f6b5127c643ec39b769b155`(`GPTQ-for-LLaMa` repo).
We support 8-bit quantization (RTN), 4-bit quantization (GPTQ), and FP16 inference. You can
Online inference server scripts can help you deploy your own services.

For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/inference).

Expand Down Expand Up @@ -283,24 +210,27 @@ For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tre

You can find more examples in this [repo](https://github.com/XueFuzhao/InstructionWild/blob/main/comparison.md).

### Limitation for LLaMA-finetuned models
### Limitation
<details><summary><b>Limitation for LLaMA-finetuned models</b></summary>
- Both Alpaca and ColossalChat are based on LLaMA. It is hard to compensate for the missing knowledge in the pre-training stage.
- Lack of counting ability: Cannot count the number of items in a list.
- Lack of Logics (reasoning and calculation)
- Tend to repeat the last sentence (fail to produce the end token).
- Poor multilingual results: LLaMA is mainly trained on English datasets (Generation performs better than QA).
</details>

### Limitation of dataset
<details><summary><b>Limitation of dataset</b></summary>
- Lack of summarization ability: No such instructions in finetune datasets.
- Lack of multi-turn chat: No such instructions in finetune datasets
- Lack of self-recognition: No such instructions in finetune datasets
- Lack of Safety:
- When the input contains fake facts, the model makes up false facts and explanations.
- Cannot abide by OpenAI's policy: When generating prompts from OpenAI API, it always abides by its policy. So no violation case is in the datasets.
</details>

## FAQ

### How to save/load checkpoint
<details><summary><b>How to save/load checkpoint</b></summary>

We have integrated the Transformers save and load pipeline, allowing users to freely call Hugging Face's language models and save them in the HF format.

Expand All @@ -325,7 +255,9 @@ trainer.fit()
trainer.save_model(path=args.save_path, only_rank0=True, tokenizer=tokenizer)
```

### How to train with limited resources
</details>

<details><summary><b>How to train with limited resources</b></summary>

Here are some examples that can allow you to train a 7B model on a single or multiple consumer-grade GPUs.

Expand Down Expand Up @@ -360,7 +292,7 @@ torchrun --standalone --nproc_per_node=1 train_sft.py \
--lr 2e-5 \
--max_datasets_size 512 \
--max_epochs 1 \
```
```

If you have 4x32 GB GPUs, you can even train the whole 7B model using our `colossalai_zero2_cpu` strategy! The script is given as follows.
```
Expand All @@ -377,6 +309,8 @@ torchrun --standalone --nproc_per_node=4 train_sft.py \
--max_datasets_size 512 \
--max_epochs 1 \
```
</details>


## The Plan

Expand Down Expand Up @@ -409,6 +343,14 @@ and [WeChat(微信)](https://raw.githubusercontent.com/hpcaitech/public_assets/m
Thanks so much to all of our amazing contributors!

## Quick Preview
<div align="center">
<a href="https://chat.colossalai.org/">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Chat-demo.png" width="700" />
</a>
</div>

- An open-source low cost solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline. [[demo]](https://chat.colossalai.org)

<p id="ChatGPT_scaling" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT%20scaling.png" width=800/>
</p>
Expand Down