Merged #173

Changes from all commits (61 commits)
8844691
[shardformer] update shardformer readme (#4689)
flybird11111 Sep 12, 2023
d8ceeac
[hotfix] fix typo in hybrid parallel io (#4697)
Sep 12, 2023
9c2feb2
fix some typo with colossalai/device colossalai/tensor/ etc. (#4171)
digger-yu Sep 12, 2023
068372a
[doc] add potential solution for OOM in llama2 example (#4699)
Sep 13, 2023
c7d6975
[shardformer] fix GPT2DoubleHeadsModel (#4703)
flybird11111 Sep 13, 2023
e2c0e7f
[hotfix] Fix import error: colossal.kernel without triton installed (…
yuanheng-zhao Sep 14, 2023
20190b4
[shardformer] to fix whisper test failed due to significant accuracy …
flybird11111 Sep 14, 2023
ce97790
[doc] fix llama2 code link (#4726)
binmakeswell Sep 14, 2023
f911d5b
[doc] Add user document for Shardformer (#4702)
Sep 15, 2023
8c2dda7
[format] applied code formatting on changed files in pull request 472…
github-actions[bot] Sep 15, 2023
50e5602
[doc] add shardformer support matrix/update tensor parallel documents…
Sep 15, 2023
e4fc57c
Optimized some syntax errors in the documentation and code under appl…
digger-yu Sep 15, 2023
4616263
[shardformer] update pipeline parallel document (#4725)
flybird11111 Sep 15, 2023
cd4e61d
[legacy] remove deterministic data loader test
ppt0011 Sep 15, 2023
6a03c93
[shardformer] update seq parallel document (#4730)
FoolPlayer Sep 15, 2023
608cffa
[example] add gpt2 HybridParallelPlugin example (#4653)
FoolPlayer Sep 15, 2023
73eb3e8
Merge pull request #4738 from ppt0011/main
ppt0011 Sep 15, 2023
451c346
[doc] polish shardformer doc (#4735)
Sep 15, 2023
ac27979
[shardformer] add custom policy in hybrid parallel plugin (#4718)
oahzxl Sep 15, 2023
4c4482f
[example] llama2 add fine-tune example (#4673)
flybird11111 Sep 15, 2023
d151dca
[doc] explaination of loading large pretrained models (#4741)
Sep 15, 2023
32e7f99
[kernel] update triton init #4740 (#4740)
oahzxl Sep 18, 2023
b5f9e37
[legacy] clean up legacy code (#4743)
ver217 Sep 18, 2023
3c6b831
[format] applied code formatting on changed files in pull request 474…
github-actions[bot] Sep 18, 2023
079bf3c
[misc] update pre-commit and run all files (#4752)
ver217 Sep 19, 2023
10513f2
[doc] explain suitable use case for each plugin
ppt0011 Sep 19, 2023
a04337b
[doc] put individual plugin explanation in front
ppt0011 Sep 19, 2023
e10d9f0
[doc] add model examples for each plugin
ppt0011 Sep 19, 2023
4d7537b
[doc] put native colossalai plugins first in description section
ppt0011 Sep 20, 2023
07c2e3d
Merge pull request #4757 from ppt0011/main
ppt0011 Sep 20, 2023
7b9b864
[chat]: update rm, add wandb and fix bugs (#4471)
cwher Sep 20, 2023
c0a0337
[shardformer] fix master param sync for hybrid plugin/rewrite unwrapp…
Sep 20, 2023
df66741
[bug] fix get_default_parser in examples (#4764)
Sep 21, 2023
66f3926
[doc] clean up outdated docs (#4765)
ver217 Sep 21, 2023
493a5ef
[doc] add shardformer doc to sidebar (#4768)
Sep 21, 2023
901ab1e
[chat]: add lora merge weights config (#4766)
cwher Sep 21, 2023
3e05c07
[lazy] support torch 2.0 (#4763)
ver217 Sep 21, 2023
1e0e080
[bug] Fix the version check bug in colossalai run when generating the…
littsk Sep 22, 2023
946ab56
[feature] add gptq for inference (#4754)
Xu-Kai Sep 22, 2023
ce7ade3
[inference] chatglm2 infer demo (#4724)
CjhHa1 Sep 22, 2023
4146f1c
[release] update version (#4775)
ver217 Sep 22, 2023
74aa7d9
initial commit: add colossal llama 2 (#4784)
TongLi3701 Sep 24, 2023
ce77785
[feature] ColossalEval: Evaluation Pipeline for LLMs (#4786)
chengeharrison Sep 24, 2023
d512a4d
[doc] add llama2 domain-specific solution news (#4789)
binmakeswell Sep 25, 2023
26cd6d8
[fix] fix weekly runing example (#4787)
flybird11111 Sep 25, 2023
a2db755
[doc] polish shardformer doc (#4779)
Sep 26, 2023
64a08b2
[checkpointio] support unsharded checkpointIO for hybrid parallel (#4…
Sep 26, 2023
bd01467
update readme
TongLi3701 Sep 26, 2023
4965c0d
[lazy] support from_pretrained (#4801)
ver217 Sep 26, 2023
8cbce61
update
TongLi3701 Sep 26, 2023
62b6af1
Merge pull request #4805 from TongLi3701/docs/fix
Desperado-Jia Sep 26, 2023
b6cf0ac
[hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800)
Chandler-Bing Sep 26, 2023
a227063
[misc] add last_epoch in CosineAnnealingWarmupLR (#4778)
hova88 Sep 26, 2023
da15fdb
[doc] add lazy init docs (#4808)
ver217 Sep 27, 2023
54b3ad8
[hotfix] fix norm type error in zero optimizer (#4795)
littsk Sep 27, 2023
11f1e42
[hotfix] Correct several erroneous code comments (#4794)
littsk Sep 27, 2023
fb46d05
[format] applied code formatting on changed files in pull request 459…
github-actions[bot] Sep 27, 2023
bbbcac2
fix format (#4815)
TongLi3701 Sep 27, 2023
be400a0
[chat] fix gemini strategy (#4698)
flybird11111 Sep 27, 2023
1fa8c5e
Update Qwen-7B results (#4821)
chengeharrison Sep 27, 2023
822051d
[doc] update slack link (#4823)
binmakeswell Sep 27, 2023
22 changes: 0 additions & 22 deletions .flake8

This file was deleted.

2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/config.yml
@@ -1,7 +1,7 @@
 blank_issues_enabled: true
 contact_links:
   - name: ❓ Simple question - Slack Chat
-    url: https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w
+    url: https://github.com/hpcaitech/public_assets/tree/main/colossalai/contact/slack
     about: This issue tracker is not for technical support. Please use our Slack chat, and ask the community for help.
   - name: ❓ Simple question - WeChat
     url: https://github.com/hpcaitech/ColossalAI/blob/main/docs/images/WeChat.png
3 changes: 2 additions & 1 deletion .github/workflows/build_on_pr.yml
@@ -141,7 +141,7 @@ jobs:
     runs-on: [self-hosted, gpu]
     container:
       image: hpcaitech/pytorch-cuda:1.12.0-11.3.0
-      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
+      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10 -v /data/scratch/llama-tiny:/data/scratch/llama-tiny
     timeout-minutes: 60
     defaults:
       run:
@@ -214,6 +214,7 @@ jobs:
         NCCL_SHM_DISABLE: 1
         LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
         TESTMON_CORE_PKGS: /__w/ColossalAI/ColossalAI/requirements/requirements.txt,/__w/ColossalAI/ColossalAI/requirements/requirements-test.txt
+        LLAMA_PATH: /data/scratch/llama-tiny

       - name: Store Testmon Cache
         run: |
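The newly mounted llama-tiny volume is surfaced to the test suite through the LLAMA_PATH variable added above. As a sketch of how a test might consume it (the fixture below is hypothetical, not code from this PR):

# Hypothetical consumer of the LLAMA_PATH variable set in the workflow
# above; the fixture name and skip logic are illustrative only.
import os

import pytest


@pytest.fixture
def llama_path() -> str:
    path = os.environ.get("LLAMA_PATH")
    if not path or not os.path.isdir(path):
        pytest.skip("LLAMA_PATH not set or not a directory")
    return path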
3 changes: 2 additions & 1 deletion .github/workflows/build_on_schedule.yml
@@ -13,7 +13,7 @@ jobs:
     runs-on: [self-hosted, 8-gpu]
     container:
       image: hpcaitech/pytorch-cuda:1.12.0-11.3.0
-      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
+      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10 -v /data/scratch/llama-tiny:/data/scratch/llama-tiny
     timeout-minutes: 40
     steps:
       - name: Check GPU Availability # ensure all GPUs have enough memory
@@ -64,6 +64,7 @@
       env:
         DATA: /data/scratch/cifar-10
         LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+        LLAMA_PATH: /data/scratch/llama-tiny

       - name: Notify Lark
         id: message-preparation
3 changes: 2 additions & 1 deletion .github/workflows/compatiblity_test_on_dispatch.yml
@@ -50,7 +50,7 @@ jobs:
       matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
     container:
       image: ${{ matrix.container }}
-      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
+      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10 -v /data/scratch/llama-tiny:/data/scratch/llama-tiny
     timeout-minutes: 120
     steps:
       - name: Install dependencies
@@ -92,3 +92,4 @@ jobs:
         DATA: /data/scratch/cifar-10
         NCCL_SHM_DISABLE: 1
         LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+        LLAMA_PATH: /data/scratch/llama-tiny
3 changes: 2 additions & 1 deletion .github/workflows/compatiblity_test_on_pr.yml
@@ -41,7 +41,7 @@ jobs:
       matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
     container:
       image: ${{ matrix.container }}
-      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
+      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10 -v /data/scratch/llama-tiny:/data/scratch/llama-tiny
     timeout-minutes: 120
     concurrency:
       group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-test-${{ matrix.container }}
@@ -87,3 +87,4 @@ jobs:
         DATA: /data/scratch/cifar-10
         NCCL_SHM_DISABLE: 1
         LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+        LLAMA_PATH: /data/scratch/llama-tiny
3 changes: 2 additions & 1 deletion .github/workflows/compatiblity_test_on_schedule.yml
@@ -38,7 +38,7 @@ jobs:
       matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
     container:
       image: ${{ matrix.container }}
-      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
+      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10 -v /data/scratch/llama-tiny:/data/scratch/llama-tiny
     timeout-minutes: 120
     steps:
       - name: Install dependencies
@@ -85,6 +85,7 @@ jobs:
         DATA: /data/scratch/cifar-10
         NCCL_SHM_DISABLE: 1
         LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+        LLAMA_PATH: /data/scratch/llama-tiny

       - name: Notify Lark
         id: message-preparation
2 changes: 1 addition & 1 deletion .github/workflows/doc_test_on_pr.yml
@@ -89,7 +89,7 @@ jobs:
       - name: Install ColossalAI
         run: |
           source activate pytorch
-          pip install -v .
+          CUDA_EXT=1 pip install -v .

       - name: Test the Doc
         run: |
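This and the following workflow edits prepend CUDA_EXT=1 to the install so that ColossalAI compiles its CUDA kernels at install time rather than JIT-compiling them on first use. As a rough sketch of the pattern only (this is not ColossalAI's actual setup.py), a build script can gate optional extensions on such a flag:

# Rough sketch of gating optional CUDA extensions on an env flag;
# illustrative only, not ColossalAI's real setup.py.
import os

build_cuda_ext = os.environ.get("CUDA_EXT", "0") == "1"

ext_modules = []
if build_cuda_ext:
    # A real build would append torch.utils.cpp_extension.CUDAExtension
    # objects here so they get compiled during `pip install`.
    pass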
2 changes: 1 addition & 1 deletion .github/workflows/doc_test_on_schedule.yml
@@ -32,7 +32,7 @@ jobs:

       - name: Install ColossalAI
         run: |
-          pip install -v .
+          CUDA_EXT=1 pip install -v .

       - name: Install Doc Test Requirements
         run: |
2 changes: 1 addition & 1 deletion .github/workflows/example_check_on_dispatch.yml
@@ -53,7 +53,7 @@ jobs:
         uses: actions/checkout@v3
       - name: Install Colossal-AI
         run: |
-          pip install -v .
+          CUDA_EXT=1 pip install -v .
       - name: Test the example
         run: |
           dir=${{ matrix.directory }}
2 changes: 1 addition & 1 deletion .github/workflows/example_check_on_pr.yml
@@ -88,7 +88,7 @@ jobs:

       - name: Install Colossal-AI
         run: |
-          pip install -v .
+          CUDA_EXT=1 pip install -v .

       - name: Test the example
         run: |
2 changes: 1 addition & 1 deletion .github/workflows/example_check_on_schedule.yml
@@ -42,7 +42,7 @@ jobs:

       - name: Install Colossal-AI
         run: |
-          pip install -v .
+          CUDA_EXT=1 pip install -v .

       - name: Traverse all files
         run: |
2 changes: 1 addition & 1 deletion .github/workflows/run_chatgpt_examples.yml
@@ -49,5 +49,5 @@ jobs:
         NCCL_SHM_DISABLE: 1
         MAX_JOBS: 8
         SFT_DATASET: /data/scratch/github_actions/chat/data.json
-        PROMPT_PATH: /data/scratch/github_actions/chat/prompts_en.jsonl
+        PROMPT_DATASET: /data/scratch/github_actions/chat/prompts_en.jsonl
         PRETRAIN_DATASET: /data/scratch/github_actions/chat/alpaca_data.json
12 changes: 6 additions & 6 deletions .github/workflows/scripts/check_doc_i18n.py
@@ -22,13 +22,13 @@ def compare_dirs(dir1, dir2):

         # If the corresponding item doesn't exist in the second directory, the directories are different
         if not os.path.exists(item_path2):
-            print(f'Found mismatch: {item_path1}, {item_path2}')
+            print(f"Found mismatch: {item_path1}, {item_path2}")
             return False

         # If the corresponding item is a directory, we compare the two directories recursively
         if os.path.isdir(item_path1) and os.path.isdir(item_path2):
             if not compare_dirs(item_path1, item_path2):
-                print(f'Found mismatch: {item_path1}, {item_path2}')
+                print(f"Found mismatch: {item_path1}, {item_path2}")
                 return False

         # both are files
@@ -37,16 +37,16 @@ def compare_dirs(dir1, dir2):

         # If the corresponding item is not a file or a directory, the directories are different
         else:
-            print(f'Found mismatch: {item_path1}, {item_path2}')
+            print(f"Found mismatch: {item_path1}, {item_path2}")
             return False

     # If all items are the same, the directories are the same
     return True


-if __name__ == '__main__':
+if __name__ == "__main__":
     parser = argparse.ArgumentParser()
-    parser.add_argument('-d', '--directory', help="The directory where the multi-language source files are kept.")
+    parser.add_argument("-d", "--directory", help="The directory where the multi-language source files are kept.")
     args = parser.parse_args()

     i18n_folders = os.listdir(args.directory)
@@ -56,7 +56,7 @@ def compare_dirs(dir1, dir2):
     for i in range(1, len(i18n_folders)):
         dir1 = i18n_folders[0]
         dir2 = i18n_folders[i]
-        print(f'comparing {dir1} vs {dir2}')
+        print(f"comparing {dir1} vs {dir2}")
         match = compare_dirs(i18n_folders[0], i18n_folders[i])

         if not match:
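The quoting changes above don't alter behavior; for reference, compare_dirs enforces that every language folder mirrors the same file tree. A small self-contained illustration (the layout below is made up):

# Made-up layout illustrating the mirroring rule that
# check_doc_i18n.py enforces across language folders.
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "en"))
    os.makedirs(os.path.join(root, "zh-Hans"))
    open(os.path.join(root, "en", "intro.md"), "w").close()
    # "zh-Hans" lacks intro.md, so compare_dirs would return False
    # and the i18n check would fail.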
@@ -4,24 +4,24 @@

 def check_inputs(input_list):
     for path in input_list:
-        real_path = os.path.join('examples', path)
+        real_path = os.path.join("examples", path)
         if not os.path.exists(real_path):
             return False
     return True


 def main():
     parser = argparse.ArgumentParser()
-    parser.add_argument('-f', '--fileNameList', type=str, help="List of file names")
+    parser.add_argument("-f", "--fileNameList", type=str, help="List of file names")
     args = parser.parse_args()
     name_list = args.fileNameList.split(",")
     is_correct = check_inputs(name_list)

     if is_correct:
-        print('success')
+        print("success")
     else:
-        print('failure')
+        print("failure")


-if __name__ == '__main__':
+if __name__ == "__main__":
     main()
10 changes: 5 additions & 5 deletions .github/workflows/scripts/example_checks/check_example_weekly.py
@@ -17,21 +17,21 @@ def show_files(path, all_files):


 def join(input_list, sep=None):
-    return (sep or ' ').join(input_list)
+    return (sep or " ").join(input_list)


 def main():
-    contents = show_files('examples/', [])
+    contents = show_files("examples/", [])
     all_loc = []
     for file_loc in contents:
-        split_loc = file_loc.split('/')
+        split_loc = file_loc.split("/")
         # must have two sub-folder levels after examples folder, such as examples/images/vit is acceptable, examples/images/README.md is not, examples/requirements.txt is not.
         if len(split_loc) >= 4:
-            re_loc = '/'.join(split_loc[1:3])
+            re_loc = "/".join(split_loc[1:3])
             if re_loc not in all_loc:
                 all_loc.append(re_loc)
     print(all_loc)


-if __name__ == '__main__':
+if __name__ == "__main__":
     main()
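The long comment in this hunk is easiest to verify with concrete paths; a short illustration of the two-level rule (paths invented):

# Invented paths illustrating the filter in check_example_weekly.py:
# only files at least two folder levels below examples/ qualify.
for path in [
    "examples/images/vit/train.py",  # kept -> "images/vit"
    "examples/images/README.md",     # too shallow, skipped
    "examples/requirements.txt",     # too shallow, skipped
]:
    parts = path.split("/")
    if len(parts) >= 4:
        print("/".join(parts[1:3]))  # prints "images/vit" once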
@@ -3,7 +3,7 @@

 def main():
     parser = argparse.ArgumentParser()
-    parser.add_argument('-f', '--fileNameList', type=str, help="The list of changed files")
+    parser.add_argument("-f", "--fileNameList", type=str, help="The list of changed files")
     args = parser.parse_args()
     name_list = args.fileNameList.split(":")
     folder_need_check = set()
@@ -15,10 +15,10 @@ def main():
         # - application
         # - file
         if loc.split("/")[0] == "examples" and len(loc.split("/")) >= 4:
-            folder_need_check.add('/'.join(loc.split("/")[1:3]))
+            folder_need_check.add("/".join(loc.split("/")[1:3]))
     # Output the result using print. Then the shell can get the values.
     print(list(folder_need_check))


-if __name__ == '__main__':
+if __name__ == "__main__":
     main()
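This last script receives the changed-file list as a colon-separated string and deduplicates it into example folders; a self-contained run with an invented input:

# Invented input showing how the changed-file checker collapses
# paths into unique "area/application" folders under examples/.
changed = (
    "examples/language/gpt/train.py:"
    "examples/language/gpt/README.md:"
    "examples/images/vit/vit.py"
)
folders = set()
for loc in changed.split(":"):
    parts = loc.split("/")
    if parts[0] == "examples" and len(parts) >= 4:
        folders.add("/".join(parts[1:3]))
print(sorted(folders))  # ['images/vit', 'language/gpt']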