[shardformer] support all GPT-J auto sharding except lazy init #4721
Closed
ppt0011 wants to merge 72 commits into hpcaitech:backup/shardformer from
Conversation
…HybridParallelPlugin (hpcaitech#4624)

* Enable policy assignment in HybridPlugin and enable llama policy for llamav2
* Remove Policy from Plugin
* Revert changes of plugin HybridParallelModule
* Revert changes in plugin
* Upgrade transformers, then revert transformers version

Co-authored-by: flybird11111 <1829166702@qq.com>
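The policy-assignment mechanism this commit introduces would be used roughly as below. This is a minimal sketch, assuming the plugin exposes a `custom_policy` keyword and that the script runs under a distributed launcher; the policy import path is also an assumption.

```python
# Sketch: handing a shardformer policy to HybridParallelPlugin explicitly
# instead of relying on automatic policy lookup ("policy assignment").
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from colossalai.shardformer.policies.llama import LlamaForCausalLMPolicy  # assumed import path

colossalai.launch_from_torch(config={})      # requires torchrun or a similar launcher

plugin = HybridParallelPlugin(
    tp_size=2,                               # tensor-parallel degree
    pp_size=1,                               # pipeline-parallel degree
    custom_policy=LlamaForCausalLMPolicy(),  # assumed kwarg: overrides auto policy resolution
)
booster = Booster(plugin=plugin)
```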
* update vit example for hybrid plugin
* reset tp/pp size
* fix dataloader iteration bug
* update optimizer passing in evaluation / add grad_accum
* change criterion
* wrap tqdm
* change grad_accum to grad_checkpoint
* fix pbar
* [devops] fix concurrency group
* [devops] fix compatibility test
* [devops] fix tensornvme install
* [devops] fix colossalai install
hpcaitech#4645)

* [shardformer] update shardformer readme
* [shardformer] update llama2/opt finetune example and shardformer update to llama2
* [shardformer] change dataset
* [shardformer] fix CI
* [example] update opt example
* [example] resolve comments
…aitech#4671)

* [legacy] move communication to legacy (hpcaitech#4640)
* [legacy] refactor logger and clean up legacy codes (hpcaitech#4654): make logger independent of gpc, make optim independent of registry, move test engine to legacy
* [legacy] move nn to legacy (hpcaitech#4656)
* [checkpointio] fix save hf config
* [test] remove useless rpc pp test
* [legacy] fix nn init
* [example] skip tutorial hybrid parallel example
* [devops] test doc check
* [shardformer] fix gpt2 test
* fix
* [shardformer] add todo
…nd related kernels for our inference system (hpcaitech#4577)

* [infer] Infer/llama demo (hpcaitech#4503): add infer example
* [Kernels] add inference token attention kernel (hpcaitech#4505): add token forward, fix tests and comments, add try-import for triton, add adapted license, add test checks
* [Kernels] add necessary kernels (llama & bloom) for attention forward and kv-cache manager (hpcaitech#4485): added _vllm_rms_norm, added and updated kernels and tests, fixed comments and imports; combine codes (hpcaitech#4509)
* [feature] add KV cache manager for llama & bloom inference (hpcaitech#4495): add kv cache memory manager, add state info during inference, add kv cache test, revise BatchInferState, rename files
* [Bug Fix] import llama context ops fix (hpcaitech#4524): add ops into init.py
* [Infer] Add TPInferEngine and fix file path (hpcaitech#4532): add engine for TP inference, move/update file paths, fix TPInferEngine and add tests, remove unused file, add engine test demo
* Add inference test for llama (hpcaitech#4508): add new features for the llama engine, adapt the colossalai triton interface, change the parent class of the llama policy, add nvtx, move llama inference code to tensor_parallel (later colossalai/inference/tensor_parallel), fix bugs in auto_policy.py, fix engine hang, remove llama_infer_engine.py and unused code (Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com>, CjhHa1 <cjh18671720497@outlook.com>)
* [infer] Add Bloom inference policy and replaced methods (hpcaitech#4512): add bloom inference methods and policy, enable passing BatchInferState from model forward, revise bloom infer layers/policies, add draft engine and bloom infer test, fix bloom modeling/policy/file paths, clean up PR
* Revert "[infer] Add Bloom inference policy and replaced methods (hpcaitech#4512)" (hpcaitech#4552): reverts commit 17cfa57
* [Doc] Add colossal inference doc (hpcaitech#4549): create README, fix typos
* [infer] Add Bloom inference policy and replaced methods (hpcaitech#4553): re-land of hpcaitech#4512 with trivial fixes
* Fix bugs in llama model forward (hpcaitech#4550): fix bugs around infer_state.is_context_stage, delete unused code and policies, fix conflicts (Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com>, CjhHa1 <cjh18671720497@outlook.com>)
* [doc] add colossal inference fig (hpcaitech#4554): upload figure
* [NFC] fix docstrings for colossal inference (hpcaitech#4555): fix docstrings and comments in the kv cache manager and bloom modeling
* fix docstring in llama modeling (hpcaitech#4557)
* [Infer] check import vllm (hpcaitech#4559): change vllm import location, import apply_rotary_pos_emb
* [DOC] add installation requirements (hpcaitech#4561)
* [Feature] move rms-norm into inference llama.py (hpcaitech#4563): add rmsnorm policy, clean up code
* [infer] Fix TP inference engine (hpcaitech#4564): fix engine data preparation, add engine test using bloom
* reset shardformer llama (hpcaitech#4569)
* [infer] Fix engine - tensors on different devices (hpcaitech#4570)
* [codefactor] Feature/colossal inference (hpcaitech#4579): code factors; change coding (hpcaitech#4581)
* [doc] complete README of colossal inference (hpcaitech#4585); update readme (hpcaitech#4586)
* bug fix: fix bugs in llama and bloom (hpcaitech#4588)
* [BUG FIX] fix test engine in CI and non-vllm-kernel llama forward (hpcaitech#4592): fix tests and the llama non-vllm kernels bug, clean code
* [Kernel] rmsnorm fix (hpcaitech#4598): add triton rmsnorm, delete vllm kernel flag
* [Bug Fix] fix bugs in llama (hpcaitech#4601): remove rotary_positions_ids (Co-authored-by: cuiqing.li <lixx3527@gmail.com>)
* [kernel] add triton layer norm & replace norm for bloom (hpcaitech#4609): add layernorm for inference, add kernel test, add bloom layernorm replacement policy
* [Infer] bug fix: rotary embedding in llama (hpcaitech#4608): fix rotary embedding, delete print, fix init seq-len bug, rename pytest, add llama benchmark, refactor code
* [bench] Add bloom inference benchmark (hpcaitech#4621): add bloom benchmark, update benchmark results in README
* trivial: uncomment for testing (hpcaitech#4622)
* [Infer] add triton and cuda version checks for tests (hpcaitech#4627)
* Update sharder.py (hpcaitech#4629)
* [Inference] hot fix some bugs and typos (hpcaitech#4632)
* [typo] comment fixes (hpcaitech#4633)
* bug fix: fix some bugs in test_llama and test_bloom (hpcaitech#4635)
* [Infer] delete benchmarks in tests and fix bugs for llama and bloom (hpcaitech#4636): delete benchmark function in utils, delete useless code
* [Fix] revise TPInferEngine, inference tests and benchmarks (hpcaitech#4642): fix llama/bloom infer benchmarks and infer tests
* modify utils filename for infer ops test (hpcaitech#4657)
* [Infer] fix TPInferEngine init & inference tests, benchmarks (hpcaitech#4670): engine receives shard config in init, benchmarks revise engine init and drop the pytest decorator, use small models for tests
* [NFC] use args for infer benchmarks (hpcaitech#4674)
* revise infer defaults (hpcaitech#4683)
* [Fix] optimize/shard model in TPInferEngine init (hpcaitech#4684): stop using the original model in the engine, revise inference tests

Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>, Xu Kai <xukai16@foxmail.com>, Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>, yuehuayingxueluo <867460659@qq.com>, yuanheng-zhao <jonathan.zhaoyh@gmail.com>, CjhHa1 <cjh18671720497@outlook.com>
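Pulling the long thread above together, the resulting engine is driven roughly as follows. This is a sketch reconstructed only from the commit messages ("TPInferEngine: receive shard config in init", "optimize/shard model in TPInferEngine init"), so the argument and method names should be treated as assumptions.

```python
# Sketch: tensor-parallel inference with TPInferEngine. The constructor and
# generate() signatures below are assumptions inferred from the commits above.
from transformers import LlamaForCausalLM, LlamaTokenizer
from colossalai.inference import TPInferEngine
from colossalai.shardformer import ShardConfig

model = LlamaForCausalLM.from_pretrained("/path/to/llama")   # hypothetical path
tokenizer = LlamaTokenizer.from_pretrained("/path/to/llama")

shard_config = ShardConfig(enable_tensor_parallelism=True)   # flag name per shardformer docs
engine = TPInferEngine(model, shard_config,                  # engine shards the model itself
                       max_batch_size=4, max_input_len=128, max_output_len=128)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = engine.generate(inputs.input_ids, do_sample=False, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```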
* update booster_api.md
* update booster_checkpoint.md
* update booster_plugins.md
* move transformers importing inside function
* fix Dict typing
* fix autodoc bug
* small fix
* [shardformer] update shardformer readme
…ch#4171)

Co-authored-by: flybird11111 <1829166702@qq.com>
…pcaitech#4722)

* [hotfix] remove triton kernels from kernel init
* revise bloom/llama kernel imports for infer
…differences. (hpcaitech#4710)

* [shardformer] fix failing whisper test
* [doc] fix llama2 code link
* create shardformer doc files
* add docstring for seq-parallel
* update ShardConfig docstring
* add links to llama example
* add outdated message
* finish introduction & supporting information
* finish 'how shardformer works'
* finish shardformer.md English doc
* fix doctest failure
* add Chinese document
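Since the new pages center on `ShardConfig`, a compact sketch of the knobs they document is below; the field names follow the shardformer README on this branch, and any that differ there should be treated as my assumption.

```python
# Sketch: the main ShardConfig switches covered by the new documentation.
from colossalai.shardformer import ShardConfig

shard_config = ShardConfig(
    enable_tensor_parallelism=True,     # split linear/attention weights across ranks
    enable_fused_normalization=True,    # use fused LayerNorm/RMSNorm kernels
    enable_flash_attention=False,       # optional flash-attention forward
    enable_sequence_parallelism=True,   # the seq-parallel feature documented above
)
```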
hpcaitech#4727)

Co-authored-by: github-actions <github-actions@github.com>
…hpcaitech#4728)

* add compatibility matrix for shardformer doc
* update tp doc
…ications/ (hpcaitech#4127)

Co-authored-by: flybird11111 <1829166702@qq.com>
* [shardformer] update pipeline parallel document
* update doc of seq parallel
* fix typo
* add gpt2 HybridParallelPlugin example
* update readme and test ci
* fix test_ci bug
* update requirements
* add requirement
* rename file
[legacy] remove deterministic data loader test
* arrange position of chapters
* fix typos in seq parallel doc
…4718)

* add custom policy
* update assert
* [lazy] support _like methods and clamp
* [lazy] pass transformers models
* [lazy] fix device move and requires_grad
* [lazy] refactor api, fix requires_grad
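For context (lazy init is also the one GPT-J case this PR leaves unsupported), lazy initialization defers weight allocation roughly as sketched below; `LazyInitContext` is the entry point named in these commits, while the rest of the snippet is assumption.

```python
# Sketch: building a model lazily so full weights are never allocated up
# front; they are materialized later (typically by a booster plugin).
from transformers import GPT2Config, GPT2LMHeadModel
from colossalai.lazy import LazyInitContext

with LazyInitContext():
    model = GPT2LMHeadModel(GPT2Config())    # parameters stay as lazy tensors

model = LazyInitContext.materialize(model)   # assumed API: allocate real weights
```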
… cmd. (hpcaitech#4713)

* Fix the version check bug in colossalai run when generating the cmd.
* Polish code
* [gptq] add gptq kernel (hpcaitech#4416): add gptq, refactor code and tests, replace auto-gptq and later re-add it as an option with assert/check, rename inference/quant, reset requirements, add import warnings, change flash-attn test version and requirements, remove example, modify tests
* [gptq] faster gptq cuda kernel (hpcaitech#4494): add cuda kernels with license, fix max_input_len, format files & change test size
* [gptq] add gptq tensor parallel (hpcaitech#4538): add gptq tp, delete print, add gptq and auto-gptq checks in tests
* [gptq] combine gptq and kv cache manager (hpcaitech#4706): add init bits and model path, move the gptq option to shard config, switch replace-linear to shardformer, update bloom policy, move colossalai/gptq to colossalai/quant/gptq and gptq_kernel to the kernel directory, fix the triton kernel and import bugs, delete useless code
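To make the storage side of this concrete: a GPTQ-quantized linear keeps an integer `qweight` plus per-channel `scales`/`zeros` that the kernels dequantize on the fly. The sketch below is plain round-to-nearest quantization in that layout, not the GPTQ algorithm itself (GPTQ additionally minimizes layer output error using second-order information); all names are illustrative.

```python
# Sketch: per-output-channel 4-bit round-to-nearest quantization, showing the
# qweight/scale/zero layout GPTQ-style kernels consume. NOT the GPTQ algorithm.
import torch

def quantize_rtn_4bit(w: torch.Tensor):
    """w: (out_features, in_features) float weight -> (qweight, scale, zero)."""
    qmax = 2 ** 4 - 1                               # 4-bit range: 0..15
    w_min = w.min(dim=1, keepdim=True).values
    w_max = w.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / qmax  # per-row step size
    zero = torch.round(-w_min / scale)              # integer zero point
    q = torch.clamp(torch.round(w / scale + zero), 0, qmax).to(torch.uint8)
    return q, scale, zero

def dequantize_4bit(q, scale, zero):
    return (q.float() - zero) * scale

w = torch.randn(16, 64)
q, s, z = quantize_rtn_4bit(w)
print((dequantize_4bit(q, s, z) - w).abs().max())   # small reconstruction error
```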
* add chatglm2
* gather needed kernels
* finish context forward & context stage
* finish chatglm
* change some logic
* fix bugs and tests
* [release] update version
* [doc] revert versions
* Add ColossalEval
* Delete evaluate in Chat

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>, Tong Li <tong.li352711588@gmail.com>
* [doc] add llama2 domain-specific solution news
* [fix] fix weekly running example
* fix example format in docstring
* polish shardformer doc
…caitech#4774)

* support unsharded saving/loading for model
* support unsharded saving for optimizer
* update doc
* support unsharded loading for optimizer
* small fix
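From the user side, the new switch presumably looks like the sketch below, assuming the Booster checkpoint API takes a `shard` flag as this commit's title suggests; the plugin choice and file paths are illustrative.

```python
# Sketch: saving/loading a single-file (unsharded) checkpoint via Booster.
# `shard=False` is assumed to be the toggle this commit adds.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin   # illustrative plugin choice

colossalai.launch_from_torch(config={})
booster = Booster(plugin=GeminiPlugin())

model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.Adam(model.parameters())
model, optimizer, *_ = booster.boost(model, optimizer)

booster.save_model(model, "model.pt", shard=False)         # one file instead of a shard dir
booster.save_optimizer(optimizer, "optim.pt", shard=False)
booster.load_model(model, "model.pt")
booster.load_optimizer(optimizer, "optim.pt")
```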
* [lazy] patch from_pretrained
* [lazy] fix from_pretrained and add tests
* [devops] update ci
[doc] Update TODO in README of Colossal-LLaMA-2
change filename: pretraining.py -> training.py (there is no file named pretraining.py; the old reference was wrong)
Force-pushed from 72764c1 to de15ed5
Force-pushed from de15ed5 to ec9cba2
ppt0011 (Contributor, Author) commented:

moved to #4825
📌 Checklist before creating the PR
The PR title follows the format: [doc/gemini/tensor/...]: A concise description

🚨 Issue number
#4675
📝 What does this PR do?
update GPT-J auto sharding to support all cases except lazy init
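Concretely, this means an eagerly constructed GPT-J model can be handed straight to shardformer and resolve a policy automatically; a minimal sketch under assumed config flags is below (to run inside a distributed launch). The excluded case is constructing the model under `LazyInitContext`.

```python
# Sketch: auto sharding GPT-J, the case this PR enables. Run under torchrun;
# the tiny GPTJConfig and the ShardConfig flags are illustrative.
import colossalai
from transformers import GPTJConfig, GPTJForCausalLM
from colossalai.shardformer import ShardConfig, ShardFormer

colossalai.launch_from_torch(config={})

model = GPTJForCausalLM(GPTJConfig(n_layer=2, n_embd=512, n_head=8))  # smoke-test size
shard_config = ShardConfig(enable_tensor_parallelism=True)
# No explicit policy passed: shardformer looks up the GPT-J policy automatically.
sharded_model, shared_params = ShardFormer(shard_config=shard_config).optimize(model)
```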
💥 Checklist before requesting a review
⭐️ Do you enjoy contributing to Colossal-AI?
Tell us more if you don't enjoy contributing to Colossal-AI.