Merged
54 commits
d37fb67
Merge pull request #118 from jamesthesnake/ra
jamesthesnake Aug 3, 2023
1038c2b
Merge pull request #119 from jamesthesnake/ra
jamesthesnake Aug 4, 2023
c299445
Merge pull request #120 from jamesthesnake/co
jamesthesnake Aug 4, 2023
d177105
Merge pull request #121 from jamesthesnake/best
jamesthesnake Aug 4, 2023
78d5362
Merge pull request #122 from hpcaitech/main
jamesthesnake Aug 5, 2023
16bcf46
Merge pull request #123 from jamesthesnake/better
jamesthesnake Aug 5, 2023
3ddd630
Merge pull request #126 from jamesthesnake/l
jamesthesnake Aug 5, 2023
a9e470b
Merge pull request #127 from jamesthesnake/co
jamesthesnake Aug 5, 2023
c6a7b89
Merge pull request #128 from jamesthesnake/better
jamesthesnake Aug 5, 2023
9fb0946
Merge pull request #129 from jamesthesnake/best
jamesthesnake Aug 5, 2023
8cce6f5
Merge pull request #131 from jamesthesnake/l
jamesthesnake Aug 10, 2023
b3945e1
Merge pull request #132 from jamesthesnake/co
jamesthesnake Aug 10, 2023
d06682e
Merge pull request #133 from jamesthesnake/better
jamesthesnake Aug 10, 2023
bebacae
Merge pull request #134 from jamesthesnake/ra
jamesthesnake Aug 10, 2023
b830b6a
Merge pull request #135 from jamesthesnake/best
jamesthesnake Aug 10, 2023
36f5835
Merge pull request #137 from hpcaitech/main
jamesthesnake Aug 17, 2023
ea707d2
Merge pull request #138 from jamesthesnake/better
jamesthesnake Aug 17, 2023
8df5ed9
Merge pull request #139 from jamesthesnake/co
jamesthesnake Aug 17, 2023
165d641
Merge pull request #140 from jamesthesnake/ra
jamesthesnake Aug 17, 2023
cb49f42
Merge pull request #141 from jamesthesnake/best
jamesthesnake Aug 17, 2023
5197fa4
Merge pull request #143 from jamesthesnake/l
jamesthesnake Aug 23, 2023
2706142
[gemini] improve compatibility and add static placement policy (#4479)
ver217 Aug 24, 2023
17e5edb
Merge pull request #144 from jamesthesnake/better
jamesthesnake Aug 24, 2023
b8d6a96
Merge pull request #145 from jamesthesnake/best
jamesthesnake Aug 24, 2023
3635068
Merge pull request #146 from jamesthesnake/ra
jamesthesnake Aug 24, 2023
152b0f4
Merge pull request #147 from jamesthesnake/better
jamesthesnake Aug 24, 2023
8d77dcf
Merge pull request #148 from jamesthesnake/best
jamesthesnake Aug 24, 2023
c0efc3e
[format] applied code formatting on changed files in pull request 447…
github-actions[bot] Aug 25, 2023
839847b
[zero]support zero2 with gradient accumulation (#4511)
Gy-Lu Aug 25, 2023
0b00def
[example] add llama2 example (#4527)
ver217 Aug 28, 2023
1467e3b
[coati] add chatglm model (#4539)
yingliu-hpc Aug 29, 2023
1c43bfd
[coati] update ci
ver217 Aug 30, 2023
661a1ef
Merge pull request #4541 from ver217/coati/chatglm
yingliu-hpc Aug 30, 2023
c648dc0
fix colossalai version in coati examples
yingliu-hpc Aug 30, 2023
9f852f2
keep requirements same with main branch
yingliu-hpc Aug 30, 2023
12c95a9
fix runtime prepare pass (#4502)
vincentccc Aug 30, 2023
8e2e199
[example] update streamlit 0.73.1 to 1.11.1 (#4386)
ChengDaqi2023 Aug 30, 2023
f1ae8c9
[example] change accelerate version (#4431)
tiansiyuan Aug 30, 2023
c7b60f7
[devops] cancel previous runs in the PR (#4546)
ver217 Aug 30, 2023
cbac782
[zero]fix zero ckptIO with offload (#4529)
Gy-Lu Sep 1, 2023
eb952ea
Update Dockerfile (#4499)
data-infra Sep 1, 2023
cfa6070
[Fix] Fix compile error (#4357)
HAOCHENYE Sep 1, 2023
7298842
Merge pull request #149 from hpcaitech/main
jamesthesnake Sep 3, 2023
8592807
Merge pull request #150 from jamesthesnake/jordan
jamesthesnake Sep 3, 2023
5a571c3
Merge pull request #151 from jamesthesnake/better
jamesthesnake Sep 3, 2023
131e54e
Merge pull request #152 from jamesthesnake/ra
jamesthesnake Sep 3, 2023
1197766
Merge pull request #153 from jamesthesnake/best
jamesthesnake Sep 3, 2023
f87802e
Merge pull request #154 from jamesthesnake/co
jamesthesnake Sep 3, 2023
63ecafb
[checkpointio] optimize zero optim checkpoint io (#4591)
ver217 Sep 4, 2023
7a978eb
[DOC] hotfix/llama2news (#4595)
binmakeswell Sep 4, 2023
8d7b022
[doc] add llama2 benchmark (#4604)
binmakeswell Sep 4, 2023
aaeb520
Merge pull request #4542 from hpcaitech/chatglm
yingliu-hpc Sep 4, 2023
30b1e1f
Merge pull request #155 from hpcaitech/main
jamesthesnake Sep 5, 2023
b259cf6
Merge pull request #156 from jamesthesnake/main
jamesthesnake Sep 5, 2023
12 changes: 6 additions & 6 deletions .github/workflows/build_on_pr.yml
@@ -61,8 +61,8 @@ jobs:
run:
shell: bash
concurrency:
group: ${{ github.head_ref }}
cancel-in-progress: false
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
steps:
- name: Copy testmon cache
run: | # branch name may contain slash, we need to replace it with space
@@ -87,8 +87,8 @@ jobs:
anyLibraryFileChanged: ${{ steps.find-lib-change.outputs.any_changed }}
runs-on: ubuntu-latest
concurrency:
group: ${{ github.head_ref }}
cancel-in-progress: false
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
steps:
- uses: actions/checkout@v2
with:
@@ -147,8 +147,8 @@ jobs:
run:
shell: bash
concurrency:
group: ${{ github.head_ref }}
cancel-in-progress: false
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
steps:
- name: Checkout TensorNVMe
uses: actions/checkout@v2
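The recurring workflow change above replaces the per-branch concurrency group with a per-workflow, per-PR key and turns cancellation on. A minimal sketch of how the new group expression resolves — in GitHub expressions, `a || b` yields `a` unless it is falsy (e.g. the `pull_request` event payload is absent on pushes). The helper function is ours, for illustration only, not part of the workflow:

```python
def concurrency_group(workflow: str, pr_number=None, ref: str = "refs/heads/main") -> str:
    """Mimic `${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}`.

    PR-triggered runs key on the PR number, so every new push to the PR lands in the
    same group and cancels the in-flight run (`cancel-in-progress: true`); runs
    triggered outside a PR fall back to the ref.
    """
    return f"{workflow}-{pr_number or ref}"

# All "Build on PR" runs for PR #150 share one group:
print(concurrency_group("Build on PR", pr_number=150))  # Build on PR-150
# A push event has no PR number, so the ref is used instead:
print(concurrency_group("Build on PR"))                 # Build on PR-refs/heads/main
```

Keying the group on the workflow name as well keeps different workflows for the same PR from cancelling each other.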
8 changes: 4 additions & 4 deletions .github/workflows/compatiblity_test_on_pr.yml
@@ -13,8 +13,8 @@ jobs:
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
concurrency:
group: ${{ github.head_ref }}
cancel-in-progress: false
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
steps:
- uses: actions/checkout@v3
- id: set-matrix
@@ -44,8 +44,8 @@ jobs:
options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
timeout-minutes: 120
concurrency:
group: ${{ github.head_ref }}
cancel-in-progress: false
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
steps:
- name: Install dependencies
run: |
8 changes: 4 additions & 4 deletions .github/workflows/doc_check_on_pr.yml
@@ -17,8 +17,8 @@ jobs:
github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
runs-on: ubuntu-latest
concurrency:
group: ${{ github.head_ref }}
cancel-in-progress: false
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
steps:
- uses: actions/checkout@v2

@@ -35,8 +35,8 @@ jobs:
github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
runs-on: ubuntu-latest
concurrency:
group: ${{ github.head_ref }}
cancel-in-progress: false
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
steps:
- uses: actions/checkout@v2
with:
8 changes: 4 additions & 4 deletions .github/workflows/doc_test_on_pr.yml
@@ -20,8 +20,8 @@ jobs:
any_changed: ${{ steps.changed-files.outputs.any_changed }}
changed_files: ${{ steps.changed-files.outputs.all_changed_files }}
concurrency:
group: ${{ github.head_ref }}
cancel-in-progress: false
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
name: Detect changed example files
steps:
- uses: actions/checkout@v3
@@ -63,8 +63,8 @@ jobs:
run:
shell: bash
concurrency:
group: ${{ github.head_ref }}
cancel-in-progress: false
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
steps:
- name: Checkout ColossalAI-Documentation
uses: actions/checkout@v2
8 changes: 4 additions & 4 deletions .github/workflows/example_check_on_pr.yml
@@ -21,8 +21,8 @@ jobs:
anyChanged: ${{ steps.setup-matrix.outputs.anyChanged }}
name: Detect changed example files
concurrency:
group: ${{ github.head_ref }}
cancel-in-progress: false
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
steps:
- uses: actions/checkout@v3
with:
@@ -81,8 +81,8 @@ jobs:
options: --gpus all --rm -v /data/scratch/examples-data:/data/
timeout-minutes: 10
concurrency:
group: ${{ github.head_ref }}
cancel-in-progress: false
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
steps:
- uses: actions/checkout@v3

3 changes: 1 addition & 2 deletions .github/workflows/run_chatgpt_examples.yml
@@ -28,9 +28,8 @@ jobs:
- name: Checkout ColossalAI
uses: actions/checkout@v2

- name: Install ColossalAI and ChatGPT
- name: Install ChatGPT
run: |
pip install -e .
cd applications/Chat
pip install -v .
pip install -r examples/requirements.txt
3 changes: 1 addition & 2 deletions .github/workflows/run_chatgpt_unit_tests.yml
@@ -30,9 +30,8 @@ jobs:
- name: Checkout ColossalAI
uses: actions/checkout@v2

- name: Install ColossalAI and ChatGPT
- name: Install ChatGPT
run: |
pip install -e .
cd applications/Chat
pip install -v .
pip install -r requirements-test.txt
13 changes: 11 additions & 2 deletions README.md
@@ -25,6 +25,7 @@
</div>

## Latest News
* [2023/09] [70 Billion Parameter LLaMA2 Model Training Accelerated by 195%](https://www.hpc-ai.tech/blog/70b-llama2-training)
* [2023/07] [HPC-AI Tech Raises 22 Million USD in Series A Funding](https://www.hpc-ai.tech/blog/hpc-ai-tech-raises-22-million-usd-in-series-a-funding-to-fuel-team-expansion-and-business-growth)
* [2023/07] [65B Model Pretraining Accelerated by 38%, Best Practices for Building LLaMA-Like Base Models Open-Source](https://www.hpc-ai.tech/blog/large-model-pretraining)
* [2023/03] [ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
@@ -50,7 +51,7 @@
<li>
<a href="#Parallel-Training-Demo">Parallel Training Demo</a>
<ul>
<li><a href="#LLaMA">LLaMA</a></li>
<li><a href="#LLaMA2">LLaMA 1/2</a></li>
<li><a href="#GPT-3">GPT-3</a></li>
<li><a href="#GPT-2">GPT-2</a></li>
<li><a href="#BERT">BERT</a></li>
@@ -217,8 +218,16 @@ Acceleration of [AlphaFold Protein Structure](https://alphafold.ebi.ac.uk/)
<p align="right">(<a href="#top">back to top</a>)</p>

## Parallel Training Demo
### LLaMA2
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/llama2_pretraining.png" width=600/>
</p>

- 70 billion parameter LLaMA2 model training accelerated by 195%
[[code]](https://github.com/hpcaitech/ColossalAI/tree/example/llama/examples/language/llama)
[[blog]](https://www.hpc-ai.tech/blog/70b-llama2-training)

### LLaMA
### LLaMA1
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/LLaMA_pretraining.png" width=600/>
</p>
75 changes: 63 additions & 12 deletions applications/Chat/coati/dataset/sft_dataset.py
@@ -19,7 +19,7 @@
from torch.utils.data import Dataset
from tqdm import tqdm
from transformers import PreTrainedTokenizer

from coati.models.chatglm.chatglm_tokenizer import ChatGLMTokenizer
from colossalai.logging import get_dist_logger

from .utils import is_rank_0, jload
@@ -71,6 +71,42 @@ def _preprocess(sources: Sequence[str],
return sequences_token["input_ids"], labels, sequences_token["attention_mask"]


def _preprocess_chatglm(sources: Sequence[str],
targets: Sequence[str],
tokenizer: PreTrainedTokenizer,
max_length: int,
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
"""
Preprocess the data by tokenizing.
None is returned for the attention mask; ChatGLM computes its attention mask from the input ids.
"""

labels = []
input_ids = []
for source, target in zip(sources, targets):
source_id = tokenizer.encode(text=source, add_special_tokens=False)
target_id = tokenizer.encode(text=target, add_special_tokens=False)
input_id = tokenizer.build_inputs_with_special_tokens(source_id, target_id)
# truncate
sp_token_list = [tokenizer.gmask_token_id, tokenizer.bos_token_id]
truncate_length = max(0, len(input_id) - max_length)
input_id = input_id[truncate_length: ]
if truncate_length == len(source_id) + 1:
input_id = sp_token_list + input_id[1: ]
elif truncate_length > len(source_id) + 1:
input_id = sp_token_list + input_id[2: ]

context_length = input_id.index(tokenizer.bos_token_id)
mask_position = context_length - 1
label = [IGNORE_INDEX] * context_length + input_id[mask_position+1:]

pad_len = max_length - len(input_id)
input_id = input_id + [tokenizer.pad_token_id] * pad_len
input_ids.append(input_id)
labels.append(label + [IGNORE_INDEX] * pad_len)
return torch.tensor(input_ids), torch.tensor(labels), None
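A standalone sketch of the left-truncation rule in `_preprocess_chatglm`, using made-up token ids (`GMASK=1`, `BOS=2`, `PAD=0`) and a simplified `source + [gMASK, <bos>] + target` layout instead of a real ChatGLM tokenizer — an illustration of the technique, not the coati API:

```python
IGNORE_INDEX = -100
GMASK, BOS, PAD = 1, 2, 0  # illustrative special-token ids

def truncate_and_label(source_id, target_id, max_length):
    # Assumed layout: source tokens, [gMASK], <bos>, target tokens.
    input_id = source_id + [GMASK, BOS] + target_id
    sp_token_list = [GMASK, BOS]
    # Truncate from the left so the target (completion) is kept.
    truncate_length = max(0, len(input_id) - max_length)
    input_id = input_id[truncate_length:]
    # If truncation ate into the special tokens, re-prepend them,
    # mirroring the two branches in the PR's code.
    if truncate_length == len(source_id) + 1:
        input_id = sp_token_list + input_id[1:]
    elif truncate_length > len(source_id) + 1:
        input_id = sp_token_list + input_id[2:]
    # Everything up to <bos> is context and is masked out of the loss.
    context_length = input_id.index(BOS)
    mask_position = context_length - 1
    label = [IGNORE_INDEX] * context_length + input_id[mask_position + 1:]
    pad_len = max_length - len(input_id)
    return input_id + [PAD] * pad_len, label + [IGNORE_INDEX] * pad_len

ids, labels = truncate_and_label([11, 12, 13], [21, 22], max_length=8)
print(ids)     # [11, 12, 13, 1, 2, 21, 22, 0]
print(labels)  # [-100, -100, -100, -100, 2, 21, 22, -100]
```

Loss is computed only on positions from `<bos>` onward; the padded tail is masked with `IGNORE_INDEX` as well.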


class SFTDataset(Dataset):
"""
Dataset for sft model
@@ -94,18 +130,25 @@ def __init__(self,
data["completion"] + tokenizer.eos_token
for data in tqdm(dataset, disable=not is_rank_0())
]

self.input_ids, self.labels, self.attention_mask = \
_preprocess(sources, targets, tokenizer, max_length)
if isinstance(tokenizer, ChatGLMTokenizer):
self.input_ids, self.labels, self.attention_mask = \
_preprocess_chatglm(sources, targets, tokenizer, max_length)
else:
self.input_ids, self.labels, self.attention_mask = \
_preprocess(sources, targets, tokenizer, max_length)

def __len__(self):
length = self.input_ids.shape[0]
return length

def __getitem__(self, idx):
return dict(input_ids=self.input_ids[idx],
labels=self.labels[idx],
attention_mask=self.attention_mask[idx])
if self.attention_mask is not None:
return dict(input_ids=self.input_ids[idx],
labels=self.labels[idx],
attention_mask=self.attention_mask[idx])
else:
return dict(input_ids=self.input_ids[idx],
labels=self.labels[idx])


class SupervisedDataset(Dataset):
@@ -137,14 +180,22 @@ def __init__(self,
]

logger.info("Tokenizing inputs... This may take some time...")
self.input_ids, self.labels, self.attention_mask = \
_preprocess(sources, targets, tokenizer, max_length)
if isinstance(tokenizer, ChatGLMTokenizer):
self.input_ids, self.labels, self.attention_mask = \
_preprocess_chatglm(sources, targets, tokenizer, max_length)
else:
self.input_ids, self.labels, self.attention_mask = \
_preprocess(sources, targets, tokenizer, max_length)

def __len__(self):
length = self.input_ids.shape[0]
return length

def __getitem__(self, idx):
return dict(input_ids=self.input_ids[idx],
labels=self.labels[idx],
attention_mask=self.attention_mask[idx])
if self.attention_mask is not None:
return dict(input_ids=self.input_ids[idx],
labels=self.labels[idx],
attention_mask=self.attention_mask[idx])
else:
return dict(input_ids=self.input_ids[idx],
labels=self.labels[idx])
3 changes: 3 additions & 0 deletions applications/Chat/coati/models/chatglm/__init__.py
@@ -0,0 +1,3 @@
from .chatglm_actor import ChatGLMActor

__all__ = ['ChatGLMActor']
34 changes: 34 additions & 0 deletions applications/Chat/coati/models/chatglm/chatglm_actor.py
@@ -0,0 +1,34 @@
from typing import Optional

import torch
from .configuration_chatglm import ChatGLMConfig
from .modeling_chatglm import ChatGLMForConditionalGeneration

from ..base import Actor


class ChatGLMActor(Actor):
"""
ChatGLM Actor model.

Args:
pretrained (str): Pretrained model name or path.
config (ChatGLMConfig): Model config.
checkpoint (bool): Enable gradient checkpointing.

LoRA is not supported for now.
"""

def __init__(self,
pretrained: Optional[str] = None,
config: Optional[ChatGLMConfig] = None,
checkpoint: bool = False) -> None:
if pretrained is not None:
model = ChatGLMForConditionalGeneration.from_pretrained(pretrained)
elif config is not None:
model = ChatGLMForConditionalGeneration(config)
else:
model = ChatGLMForConditionalGeneration(ChatGLMConfig())
if checkpoint:
model.gradient_checkpointing_enable()
super().__init__(model, lora_rank=0, lora_train_bias='none')
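`ChatGLMActor.__init__` follows a common construction-priority pattern: a pretrained path wins over an explicit config, which wins over defaults. A generic sketch of that pattern with stub classes (`StubModel`, `StubConfig`, and `build_model` are illustrative names, not part of coati):

```python
from typing import Optional

class StubConfig:
    def __init__(self, hidden: int = 8) -> None:
        self.hidden = hidden

class StubModel:
    @classmethod
    def from_pretrained(cls, path: str) -> "StubModel":
        # A real implementation would load weights from `path`.
        return cls(StubConfig())

    def __init__(self, config: StubConfig) -> None:
        self.config = config
        self.checkpointing = False

    def gradient_checkpointing_enable(self) -> None:
        self.checkpointing = True

def build_model(pretrained: Optional[str] = None,
                config: Optional[StubConfig] = None,
                checkpoint: bool = False) -> StubModel:
    # Priority mirrors ChatGLMActor: pretrained path > explicit config > defaults.
    if pretrained is not None:
        model = StubModel.from_pretrained(pretrained)
    elif config is not None:
        model = StubModel(config)
    else:
        model = StubModel(StubConfig())
    if checkpoint:
        model.gradient_checkpointing_enable()
    return model
```

Handling all three cases in the constructor keeps call sites uniform whether they fine-tune a checkpoint or train from a fresh config.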