Co #81 (Merged)

59 commits
7e46bc8
fix CheckpointIndexFile is not defined (#4109)
digger-yu Jul 3, 2023
8abc877
fix Tensor is not defined (#4129)
digger-yu Jul 3, 2023
1350ece
[hotfix] fix import bug in checkpoint_io (#4142)
Jul 3, 2023
3d8d5d0
[chat] use official transformers and fix some issues (#4117)
cwher Jul 4, 2023
8d68de7
[shardformer] init shardformer code structure (#3731)
FoolPlayer May 22, 2023
8cc1123
[shardformer]: Feature/shardformer, add some docstring and readme (#3…
FoolPlayer May 24, 2023
235792f
[shardformer] updated readme (#3827)
FrankLeeeee May 24, 2023
4972e1f
[shardformer] refactored the user api (#3828)
FrankLeeeee May 24, 2023
c594dc2
[shardformer] update readme with modules implement doc (#3834)
FoolPlayer May 24, 2023
ab8a47f
[shardformer] add Dropout layer support different dropout pattern (#3…
FoolPlayer Jun 1, 2023
70173e3
update README (#3909)
FoolPlayer Jun 6, 2023
79f8d5d
[shardformer] add gpt2 policy and modify shard and slicer to support …
FoolPlayer Jun 7, 2023
f1cb5ac
[shardformer] Align bert value (#3907)
FoolPlayer Jun 9, 2023
a731304
[shardformer] Unit test (#3928)
FoolPlayer Jun 12, 2023
45927d5
[shardformer] Add dropout layer in shard model and refactor policy ap…
FoolPlayer Jun 12, 2023
6b30dfb
[shardformer] support llama model using shardformer (#3969)
wukong1992 Jun 13, 2023
c1c672d
[shardformer] shardformer support t5 model (#3994)
wukong1992 Jun 15, 2023
f7774ec
[Shardformer] Downstream bert (#3979)
FoolPlayer Jun 15, 2023
a2f9af8
[shardformer] fix an error in readme (#3988)
FoolPlayer Jun 15, 2023
6119712
[device] support init device mesh from process group (#3990)
FrankLeeeee Jun 15, 2023
d3bc530
[shardformer] Refactor shardformer api (#4001)
FoolPlayer Jun 15, 2023
015af59
[shardformer] integrated linear 1D with dtensor (#3996)
FrankLeeeee Jun 15, 2023
dfca967
integrate with dist layer (#4011)
FoolPlayer Jun 16, 2023
3893fa1
[shardformer] refactored embedding and dropout to parallel module (#4…
FrankLeeeee Jun 16, 2023
45d9384
[shardformer] removed inplace tensor sharding (#4018)
FrankLeeeee Jun 16, 2023
507c0ad
add vocabembedding layer
FoolPlayer Jun 16, 2023
df018fc
support bert with new api
FoolPlayer Jun 16, 2023
e253a07
[shardformer] updated doc (#4016)
FrankLeeeee Jun 16, 2023
74d176c
[shardformer] fix bert and gpt downstream with new api (#4024)
FoolPlayer Jun 19, 2023
c1d5453
[shardformer] adapted llama to the new API (#4036)
FrankLeeeee Jun 19, 2023
d857f3d
[shardformer] supported T5 and its variants (#4045)
FrankLeeeee Jun 19, 2023
4021b9a
[shardformer] add gpt2 test and layer class refactor (#4041)
FoolPlayer Jun 20, 2023
58df720
[shardformer] adapted T5 and LLaMa test to use kit (#4049)
FrankLeeeee Jun 21, 2023
f22ddac
[shardformer] refactored the shardformer layer structure (#4053)
FrankLeeeee Jun 21, 2023
7740c55
support kit use for bert/gpt test (#4055)
FoolPlayer Jun 22, 2023
8eb09a4
[shardformer] support module saving and loading (#4062)
FrankLeeeee Jun 22, 2023
0803a61
[shardformer] add linearconv1d test (#4067)
FoolPlayer Jun 22, 2023
70c58cf
[shardformer] supported fused qkv checkpoint (#4073)
FrankLeeeee Jun 23, 2023
92f6791
[shardformer] Add layernorm (#4072)
FoolPlayer Jun 23, 2023
c4b1b65
[test] fixed tests failed due to dtensor change (#4082)
FrankLeeeee Jun 26, 2023
d33a44e
[shardformer] refactored layernorm (#4086)
FrankLeeeee Jun 26, 2023
ac80937
[shardformer] shardformer support opt models (#4091)
flybird11111 Jun 27, 2023
8af29ee
[shardformer] support vision transformer (#4096)
klhhhhh Jun 28, 2023
b1c2901
[shardformer] supported bloom model (#4098)
FrankLeeeee Jun 28, 2023
f3b6aaa
[shardformer] supported fused normalization (#4112)
FrankLeeeee Jun 30, 2023
6a88bae
[shardformer] integrate with data parallelism (#4103)
FrankLeeeee Jun 30, 2023
44a190e
[shardformer] import huggingface implicitly (#4101)
FrankLeeeee Jun 30, 2023
ae035d3
[shardformer] added embedding gradient check (#4124)
FrankLeeeee Jun 30, 2023
7f9b303
[shardformer] write an shardformer example with bert finetuning (#4126)
flybird11111 Jun 30, 2023
74257cb
[shardformer] refactored some doc and api (#4137)
FrankLeeeee Jul 3, 2023
1fb0d95
[shardformer] made tensor parallelism configurable (#4144)
FrankLeeeee Jul 4, 2023
89f45ed
[shardformer] added development protocol for standardization (#4149)
FrankLeeeee Jul 4, 2023
f447ca1
[chat] removed cache file (#4155)
FrankLeeeee Jul 4, 2023
c77b3b1
[format] applied code formatting on changed files in pull request 415…
github-actions[bot] Jul 4, 2023
2ac2404
fix some typo colossalai/shardformer (#4160)
digger-yu Jul 4, 2023
1908caa
[cli] hotfix launch command for multi-nodes (#4165)
ver217 Jul 4, 2023
cc3cbe9
[workflow] show test duration (#4159)
FrankLeeeee Jul 4, 2023
190a6ea
[dtensor] fixed readme file name and removed deprecated file (#4162)
FrankLeeeee Jul 4, 2023
6748e3d
Merge pull request #80 from hpcaitech/main
jamesthesnake Jul 5, 2023
2 changes: 1 addition & 1 deletion .github/workflows/build_on_pr.yml
@@ -208,7 +208,7 @@ jobs:

- name: Execute Unit Testing
run: |
- CURL_CA_BUNDLE="" PYTHONPATH=$PWD pytest --testmon --testmon-cov=. tests/
+ CURL_CA_BUNDLE="" PYTHONPATH=$PWD pytest --testmon --testmon-cov=. --durations=10 tests/
env:
DATA: /data/scratch/cifar-10
NCCL_SHM_DISABLE: 1
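Both workflow edits in this PR add pytest's `--durations` flag: `--durations=N` prints the N slowest test setup/call/teardown phases after a run, and `--durations=0` (used in the scheduled build below) prints them all. As a rough illustration, not part of the diff, the same flag can be passed through pytest's public Python entry point:

```python
# Illustrative sketch only, not part of this PR: run the test suite and
# report the 10 slowest test phases, mirroring the flag added above.
import pytest

exit_code = pytest.main(["--durations=10", "tests/"])
print("pytest exit code:", exit_code)
```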
4 changes: 2 additions & 2 deletions .github/workflows/build_on_schedule.yml
@@ -3,7 +3,7 @@ name: Build on Schedule
on:
schedule:
# run at 00:00 of every Sunday
- - cron: '0 0 * * *'
+ - cron: "0 0 * * *"
workflow_dispatch:

jobs:
@@ -60,7 +60,7 @@ jobs:
- name: Unit Testing
if: steps.check-avai.outputs.avai == 'true'
run: |
- PYTHONPATH=$PWD pytest tests
+ PYTHONPATH=$PWD pytest --durations=0 tests
env:
DATA: /data/scratch/cifar-10
LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
14 changes: 5 additions & 9 deletions .github/workflows/run_chatgpt_examples.yml
@@ -4,11 +4,10 @@ on:
pull_request:
types: [synchronize, opened, reopened]
paths:
- - 'applications/Chat/coati/**'
- - 'applications/Chat/requirements.txt'
- - 'applications/Chat/setup.py'
- - 'applications/Chat/examples/**'
-
+ - "applications/Chat/coati/**"
+ - "applications/Chat/requirements.txt"
+ - "applications/Chat/setup.py"
+ - "applications/Chat/examples/**"

jobs:
tests:
@@ -38,10 +37,7 @@ jobs:

- name: Install Transformers
run: |
- cd applications/Chat
- git clone https://github.com/hpcaitech/transformers
- cd transformers
- pip install -v .
+ pip install transformers==4.30.2

- name: Execute Examples
run: |
5 changes: 1 addition & 4 deletions applications/Chat/README.md
@@ -98,12 +98,9 @@ pip install .
```

### Install the Transformers
- Given Hugging Face hasn't officially supported the LLaMA models, We fork a branch of Transformers that can be compatible with our code

```shell
- git clone https://github.com/hpcaitech/transformers
- cd transformers
- pip install .
+ pip install transformers==4.30.2
```

## How to use?
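The hpcaitech fork is dropped here because upstream transformers has shipped native LLaMA support since v4.28, so pinning an official release suffices. A minimal sanity check, my sketch rather than part of the PR, that the pinned release imports its LLaMA classes:

```python
# Sketch only, not part of this PR: confirm the pinned transformers
# release provides native LLaMA classes, which is why the
# hpcaitech/transformers fork is no longer needed.
import transformers
from transformers import LlamaForCausalLM, LlamaTokenizer

print(transformers.__version__)  # expected: 4.30.2
print(LlamaForCausalLM.__name__, LlamaTokenizer.__name__)
```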
4 changes: 0 additions & 4 deletions applications/Chat/coati/models/deberta/__init__.py

This file was deleted.

36 changes: 0 additions & 36 deletions applications/Chat/coati/models/deberta/deberta_critic.py

This file was deleted.

37 changes: 0 additions & 37 deletions applications/Chat/coati/models/deberta/deberta_rm.py

This file was deleted.

5 changes: 0 additions & 5 deletions applications/Chat/coati/models/roberta/__init__.py

This file was deleted.

35 changes: 0 additions & 35 deletions applications/Chat/coati/models/roberta/roberta_actor.py

This file was deleted.

38 changes: 0 additions & 38 deletions applications/Chat/coati/models/roberta/roberta_critic.py

This file was deleted.

39 changes: 0 additions & 39 deletions applications/Chat/coati/models/roberta/roberta_rm.py

This file was deleted.

12 changes: 1 addition & 11 deletions applications/Chat/coati/ray/utils.py
@@ -9,10 +9,8 @@
from coati.models.gpt import GPTRM, GPTActor, GPTCritic
from coati.models.llama import LlamaActor, LlamaCritic, LlamaRM
from coati.models.opt import OPTRM, OPTActor, OPTCritic
- from coati.models.roberta import RoBERTaActor, RoBERTaCritic, RoBERTaRM
from coati.trainer.strategies import DDPStrategy, GeminiStrategy, LowLevelZeroStrategy
- from coati.utils import prepare_llama_tokenizer_and_embedding
- from transformers import AutoTokenizer, BloomTokenizerFast, GPT2Tokenizer, LlamaTokenizer, RobertaTokenizer
+ from transformers import AutoTokenizer, BloomTokenizerFast, GPT2Tokenizer, LlamaTokenizer


def is_rank_0() -> bool:
@@ -36,8 +34,6 @@ def get_actor_from_args(model: str, pretrained: str = None, config=None, lora_ra
actor = OPTActor(pretrained=pretrained, config=config, lora_rank=lora_rank)
elif model == 'llama':
actor = LlamaActor(pretrained=pretrained, config=config, lora_rank=lora_rank)
- elif model == 'roberta':
-     actor = RoBERTaActor(pretrained=pretrained, config=config, lora_rank=lora_rank)
else:
raise ValueError(f'Unsupported actor model "{model}"')
return actor
@@ -52,8 +48,6 @@ def get_critic_from_args(model: str, pretrained: str = None, config=None, lora_r
critic = OPTCritic(pretrained=pretrained, lora_rank=lora_rank, config=config, use_action_mask=True)
elif model == 'llama':
critic = LlamaCritic(pretrained=pretrained, lora_rank=lora_rank, config=config, use_action_mask=True)
- elif model == 'roberta':
-     critic = RoBERTaCritic(pretrained=pretrained, lora_rank=lora_rank, config=config, use_action_mask=True)
else:
raise ValueError(f'Unsupported reward model "{model}"')
return critic
@@ -68,8 +62,6 @@ def get_reward_model_from_args(model: str, pretrained: str = None, config=None):
reward_model = OPTRM(pretrained=pretrained, config=config)
elif model == 'llama':
reward_model = LlamaRM(pretrained=pretrained, config=config)
- elif model == 'roberta':
-     reward_model = RoBERTaRM(pretrained=pretrained, config=config)
else:
raise ValueError(f'Unsupported reward model "{model}"')
return reward_model
@@ -101,8 +93,6 @@ def get_tokenizer_from_args(model: str, **kwargs):
elif model == 'llama':
pretrain_path = kwargs["pretrain"]
tokenizer = AutoTokenizer.from_pretrained(pretrain_path)
- elif model == 'roberta':
-     tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
else:
raise ValueError(f'Unsupported model "{model}"')

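With the RoBERTa branches removed alongside the deleted model files above, any request for a 'roberta' model now falls through to the existing ValueError. A hypothetical call, not in the diff, illustrating the new behavior:

```python
# Hypothetical usage, not part of this PR: after this change the factory
# helpers in coati.ray.utils reject 'roberta' via the ValueError branch.
from coati.ray.utils import get_actor_from_args

try:
    get_actor_from_args('roberta')
except ValueError as err:
    print(err)  # Unsupported actor model "roberta"
```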
3 changes: 0 additions & 3 deletions applications/Chat/coati/utils/__init__.py

This file was deleted.

73 changes: 0 additions & 73 deletions applications/Chat/coati/utils/tokenizer_utils.py

This file was deleted.
