Mostly by jamesthesnake · Pull Request #175 · jamesthesnake/ColossalAI

jamesthesnake · 2023-09-28T00:19:32Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

…aitech#4450) * [shardformer/sequence parallel] Support sequence parallel for gpt2 (hpcaitech#4384) * [sequence parallel] add sequence parallel linear col/row support (hpcaitech#4336) * add sequence parallel linear col/row support * add annotation * add annotation * add support for gpt2 fused qkv linear layer * support sequence parallel in GPT2 * add docstring and note * add requirments * remove unused flash-attb * modify flash attn test * modify flash attn setting * modify flash attn code * add assert before divide, rename forward function * [shardformer/test] fix gpt2 test with seq-parallel * [shardformer/sequence parallel] Overlap input gather and grad computation during col backward (hpcaitech#4401) * overlap gather input / grad computing during col backward * modify test for overlap * simplify code * fix code and modify cuda stream synchronize * [shardformer/sequence parallel] polish code

* [shardformer] bert support sequence parallel

* [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel * fix fix fix fix

…ome fix. (hpcaitech#4498) * [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel * fix fix fix fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * activate checks

* [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel * fix fix fix fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * activate checks * [Test] test ci * test ci * test ci * test ci * test ci * test ci * test ci * fix

…lelPlugin (hpcaitech#4506) * add APIs * implement save_sharded_model * add test for hybrid checkpointio * implement naive loading for sharded model * implement efficient sharded model loading * open a new file for hybrid checkpoint_io * small fix * fix circular importing * fix docstring * arrange arguments and apis * small fix

* [shardformer] fix opt test hanging * fix * test * test * test * fix test * fix test * remove print * add fix * [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1

…lPlugin (hpcaitech#4540) * implement sharded optimizer saving * add more param info * finish implementation of sharded optimizer saving * fix bugs in optimizer sharded saving * add pp+zero test * param group loading * greedy loading of optimizer * fix bug when loading * implement optimizer sharded saving * add optimizer test & arrange checkpointIO utils * fix gemini sharding state_dict * add verbose option * add loading of master params * fix typehint * fix master/working mapping in fp16 amp

…pcaitech#4584) * [shardformer] fix opt test hanging * fix * test * test * test * fix test * fix test * remove print * add fix * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] fix epoch change * [shardformer] broadcast add pp group * [shardformer] fix opt test hanging * fix * test * test * [shardformer] zero1+pp and the corresponding tests (hpcaitech#4517) * pause * finish pp+zero1 * Update test_shard_vit.py * [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (hpcaitech#4516) * fix overlap bug and support bert, add overlap as an option in shardconfig * support overlap for chatglm and bloom * [shardformer] fix emerged bugs after updating transformers (hpcaitech#4526) * test * fix test * fix test * remove print * add fix * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] Add overlap support for gpt2 (hpcaitech#4535) * add overlap support for gpt2 * remove unused code * remove unused code * [shardformer] support pp+tp+zero1 tests (hpcaitech#4531) * [shardformer] fix opt test hanging * fix * test * test * test * fix test * fix test * remove print * add fix * [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] fix submodule replacement bug when enabling pp (hpcaitech#4544) * [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (hpcaitech#4540) * implement sharded optimizer saving * add more param info * finish implementation of sharded optimizer saving * fix bugs in optimizer sharded saving * add pp+zero test * param group 
loading * greedy loading of optimizer * fix bug when loading * implement optimizer sharded saving * add optimizer test & arrange checkpointIO utils * fix gemini sharding state_dict * add verbose option * add loading of master params * fix typehint * fix master/working mapping in fp16 amp * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] fix epoch change * [shardformer] broadcast add pp group * rebase feature/shardformer * update pipeline * [shardformer] fix * [shardformer] fix * [shardformer] bert finetune fix * [shardformer] add all_reduce operation to loss add all_reduce operation to loss * [shardformer] make compatible with pytree. make compatible with pytree. * [shardformer] disable tp disable tp * [shardformer] add 3d plugin to ci test * [shardformer] update num_microbatches to None * [shardformer] update microbatchsize * [shardformer] update assert * update scheduler * update scheduler --------- Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com> Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>

* [gptq] add gptq kernel (hpcaitech#4416) * add gptq * refactor code * fix tests * replace auto-gptq * rname inferance/quant * refactor test * add auto-gptq as an option * reset requirements * change assert and check auto-gptq * add import warnings * change test flash attn version * remove example * change requirements of flash_attn * modify tests * [skip ci] change requirements-test * [gptq] faster gptq cuda kernel (hpcaitech#4494) * [skip ci] add cuda kernels * add license * [skip ci] fix max_input_len * format files & change test size * [skip ci] * [gptq] add gptq tensor parallel (hpcaitech#4538) * add gptq tensor parallel * add gptq tp * delete print * add test gptq check * add test auto gptq check * [gptq] combine gptq and kv cache manager (hpcaitech#4706) * combine gptq and kv cache manager * add init bits * delete useless code * add model path * delete usless print and update test * delete usless import * move option gptq to shard config * change replace linear to shardformer * update bloom policy * delete useless code * fix import bug and delete uselss code * change colossalai/gptq to colossalai/quant/gptq * update import linear for tests * delete useless code and mv gptq_kernel to kernel directory * fix triton kernel * add triton import

* add chatglm2 * add * gather needed kernels * fix some bugs * finish context forward * finish context stage * fix * add * pause * add * fix bugs * finish chatglm * fix bug * change some logic * fix bugs * change some logics * add * add * add * fix * fix tests * fix

* [chat] fix gemini strategy * [chat] fix gemini strategy * [chat] fix gemini strategy * [chat] fix gemini strategy * g# This is a combination of 2 commits. [chat] fix gemini strategy fox * [chat] fix gemini strategy update llama2 example [chat] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * fix * fix * fix * fix * fix * Update train_prompts.py

FoolPlayer and others added 30 commits August 16, 2023 15:41

[shardformer] support DDP in HybridPlugin/add tp+dp tests (hpcaitech#…

6ef33f7

…4446) * support DDP for HybridPlugin/add tp+dp tests * add docstring for HybridParallelPlugin

[devops] add large-scale distributed test marker (hpcaitech#4452)

26e29d5

* [test] remove cpu marker * [test] remove gpu marker * [test] update pytest markers * [ci] update unit test ci

[shardformer] support interleaved pipeline (hpcaitech#4448)

a78daf6

* support interleaved pipeline * fix unit test * remove virtual stage test in stage mgr * add droped type hint and updated bwd

[shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/…

7c8be77

…tp (hpcaitech#4460) * support gpt2 seq parallel with pp/dp/tp * fix a bug when waiting for stream done * delete unused gpt2_seq file

[shardformer] bloom support sequence parallel (hpcaitech#4465)

0ecd71e

[shardformer] bloom support sequence parallel

[shardformer] Pipeline/whisper (hpcaitech#4456)

8739aa7

* add some base tests and policies * finish whisper base model * add conditional generation * finish basic tests * whisper * finish whisper * finish whisper * del useless whisper test * fix * add argmin to replace * finish revision

[shardformer] support tp+zero for shardformer (hpcaitech#4472)

1c7df56

* support tp+zero/input type cast for hybridplugin * add tp+zero tests * fix bucket arguments

rename chatglm to chatglm2 (hpcaitech#4484)

5545114

[shardformer/sequence parallel] not support opt of seq-parallel, add …

351351a

…warning and fix a bug in gpt2 pp (hpcaitech#4488)

[shardformer] tests for 3d parallel (hpcaitech#4493)

e04436a

[shardformer] zero1+pp and the corresponding tests (hpcaitech#4517)

376533a

* pause * finish pp+zero1 * Update test_shard_vit.py

[shardformer/fix overlap bug] fix overlap bug, add overlap as an opti…

c554b7f

…on in shardco… (hpcaitech#4516) * fix overlap bug and support bert, add overlap as an option in shardconfig * support overlap for chatglm and bloom

[shardformer] fix emerged bugs after updating transformers (hpcaitech…

0387a47

…#4526)

[shardformer] Add overlap support for gpt2 (hpcaitech#4535)

e241b74

* add overlap support for gpt2 * remove unused code * remove unused code

[shardformer] fix opt test hanging (hpcaitech#4521)

d367b88

* [shardformer] fix opt test hanging * fix * test * test * test * fix test * fix test * remove print * add fix

[shardformer] fix submodule replacement bug when enabling pp (hpcaite…

2c787d7

…ch#4544)

[shardformer] support from_pretrained when loading model with HybridP…

38ccb8b

…arallelPlugin (hpcaitech#4575) * hybrid plugin support huggingface from_pretrained * add huggingface compatibility tests * add folder cleaning * fix bugs

[pipeline] 1f1b schedule receive microbatch size (hpcaitech#4589)

508ca36

[shardformer] Pytree fix (hpcaitech#4533)

24c0768

* pytree test * test bert * test bert * test bert * revise * add register * add register

[checkpointio] support huggingface from_pretrained for all plugins (h…

e79b1e8

…pcaitech#4606)

Merge branch 'main' into feature/shardformer

a39a5c6

ver217 and others added 29 commits September 21, 2023 11:36

[doc] clean up outdated docs (hpcaitech#4765)

66f3926

* [doc] clean up outdated docs * [doc] fix linking * [doc] fix linking

[doc] add shardformer doc to sidebar (hpcaitech#4768)

493a5ef

[chat]: add lora merge weights config (hpcaitech#4766)

901ab1e

* feat: modify lora merge weights fn * feat: add lora merge weights config

[lazy] support torch 2.0 (hpcaitech#4763)

3e05c07

* [lazy] support _like methods and clamp * [lazy] pass transformers models * [lazy] fix device move and requires grad * [lazy] fix requires grad and refactor api * [lazy] fix requires grad

[bug] Fix the version check bug in colossalai run when generating the…

1e0e080

… cmd. (hpcaitech#4713) * Fix the version check bug in colossalai run when generating the cmd. * polish code

[release] update version (hpcaitech#4775)

4146f1c

* [release] update version * [doc] revert versions

initial commit: add colossal llama 2 (hpcaitech#4784)

74aa7d9

[feature] ColossalEval: Evaluation Pipeline for LLMs (hpcaitech#4786)

ce77785

* Add ColossalEval * Delete evaluate in Chat --------- Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com> Co-authored-by: Tong Li <tong.li352711588@gmail.com>

[doc] add llama2 domain-specific solution news (hpcaitech#4789)

d512a4d

* [doc] add llama2 domain-specific solution news

[fix] fix weekly runing example (hpcaitech#4787)

26cd6d8

* [fix] fix weekly runing example * [fix] fix weekly runing example

[doc] polish shardformer doc (hpcaitech#4779)

a2db755

* fix example format in docstring * polish shardformer doc

[checkpointio] support unsharded checkpointIO for hybrid parallel (hp…

64a08b2

…caitech#4774) * support unsharded saving/loading for model * support optimizer unsharded saving * update doc * support unsharded loading for optimizer * small fix

update readme

bd01467

[lazy] support from_pretrained (hpcaitech#4801)

4965c0d

* [lazy] patch from pretrained * [lazy] fix from pretrained and add tests * [devops] update ci

update

8cbce61

Merge pull request hpcaitech#4805 from TongLi3701/docs/fix

62b6af1

[doc] Update TODO in README of Colossal-LLaMA-2

[hotfix] change llama2 Colossal-LLaMA-2 script filename (hpcaitech#4800)

b6cf0ac

change filename: pretraining.py -> trainin.py there is no file named pretraing.py. wrong writing

[misc] add last_epoch in CosineAnnealingWarmupLR (hpcaitech#4778)

a227063

[doc] add lazy init docs (hpcaitech#4808)

da15fdb

[hotfix] fix norm type error in zero optimizer (hpcaitech#4795)

54b3ad8

[hotfix] Correct several erroneous code comments (hpcaitech#4794)

11f1e42

[format] applied code formatting on changed files in pull request 4595 (

fb46d05

hpcaitech#4602) Co-authored-by: github-actions <github-actions@github.com>

fix format (hpcaitech#4815)

bbbcac2

Update Qwen-7B results (hpcaitech#4821)

1fa8c5e

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>

[doc] update slack link (hpcaitech#4823)

822051d

Merge pull request #173 from hpcaitech/main

be3cdef

l

jamesthesnake merged commit c398232 into co Sep 28, 2023

jamesthesnake merged 122 commits into co from most

20 participants
