
#82 Merged

jamesthesnake merged 67 commits into ra from l on Jul 5, 2023
Conversation

@jamesthesnake
Owner

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags where possible so that different PRs can be distinguished more easily

🚨 Issue number

Link this PR to your issue with a keyword such as fixed so that the linked issue is closed automatically upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

jamesthesnake and others added 30 commits June 13, 2023 07:39
* feat: remove on_learn_epoch fn as not used

* revert: add _on_learn_epoch fn

* feat: remove NaiveStrategy

* test: update train_prompts tests

* fix: remove prepare_llama_tokenizer_and_embedding

* test: add lora arg

* feat: remove roberta support in train_prompts due to runtime errs

* feat: remove deberta & roberta in rm as not used

* test: remove deberta and roberta tests

* feat: remove deberta and roberta models as not used

* fix: remove calls to roberta

* fix: remove prepare_llama_tokenizer_and_embedding

* chore: update transformers version

* docs: update transformers version

* fix: fix actor inference

* fix: fix ci

* feat: change llama pad token to unk

* revert: revert ddp setup_distributed

* fix: change llama pad token to unk

* revert: undo unnecessary changes

* fix: use pip to install transformers
* init shardformer code structure

* add implement of sharder (inject and replace)

* add implement of replace layer to colossal layer

* separate different layer policies, add some notes

* implement 1d and 2d slicer, can tell col or row

* fix bug when slicing and inject model

* fix some bug; add inference test example
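
For readers unfamiliar with the "inject and replace" idea in the commits above, here is a minimal sketch of what a sharder's module replacement could look like; `ParallelLinear` and `replace_module` are illustrative names, not the actual Colossal-AI API:

```python
# A minimal sketch of "inject and replace": walk the model tree and
# swap modules according to a policy mapping original layer types to
# parallel substitutes. ParallelLinear and replace_module are
# illustrative names, not the actual Colossal-AI API.
import torch.nn as nn


class ParallelLinear(nn.Module):
    """Stand-in for a tensor-parallel linear layer."""

    def __init__(self, orig: nn.Linear):
        super().__init__()
        # A real implementation would shard orig.weight across ranks;
        # here we simply wrap the original layer.
        self.inner = orig

    def forward(self, x):
        return self.inner(x)


def replace_module(model: nn.Module, policy: dict) -> nn.Module:
    """Recursively swap child modules according to the policy."""
    for name, child in model.named_children():
        if type(child) in policy:
            setattr(model, name, policy[type(child)](child))
        else:
            replace_module(child, policy)
    return model


# Usage: replace every nn.Linear in a model with the parallel variant.
# sharded = replace_module(model, {nn.Linear: ParallelLinear})
```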
…caitech#3816)

* init shardformer code structure

* add implement of sharder (inject and replace)

* add implement of replace layer to colossal layer

* separate different layer policies, add some notes

* implement 1d and 2d slicer, can tell col or row

* fix bug when slicing and inject model

* fix some bug; add inference test example

* add share weight and train example

* add train

* add docstring and readme

* add docstring for other files

* pre-commit
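
The 1D slicer mentioned in the commits above splits a weight along one dimension depending on whether the layer is column- or row-parallel. A minimal sketch, assuming the usual nn.Linear weight layout of (out_features, in_features); `slice_1d` is a hypothetical helper:

```python
# Sketch of 1D slicing for an nn.Linear weight of shape
# (out_features, in_features): column-parallel shards dim 0,
# row-parallel shards dim 1. slice_1d is a hypothetical helper name.
import torch


def slice_1d(weight: torch.Tensor, rank: int, world_size: int,
             mode: str = "col") -> torch.Tensor:
    dim = 0 if mode == "col" else 1
    assert weight.size(dim) % world_size == 0, "shard must divide evenly"
    chunk = weight.size(dim) // world_size
    return weight.narrow(dim, rank * chunk, chunk).contiguous()


w = torch.randn(8, 4)
print(slice_1d(w, rank=0, world_size=2, mode="col").shape)  # torch.Size([4, 4])
print(slice_1d(w, rank=0, world_size=2, mode="row").shape)  # torch.Size([8, 2])
```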
* [shardformer] refactored the user api

* polish code
* update readme with modules content

* remove img
…caitech#3856)

* add dropout layer, add dropout test

* modify seed manager as context manager

* add a copy of col_nn.layer

* add dist_crossentropy loss; separate module test

* polish the code

* fix dist crossentropy loss
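
The dist_crossentropy loss above computes cross-entropy over logits whose vocabulary dimension is sharded across ranks. Here is a single-process sketch of the math, with plain reductions standing in for the `dist.all_reduce` calls a real implementation would use:

```python
# Single-process sketch of a vocab-sharded cross-entropy: each "rank"
# holds a slice of the logits, and the reductions below stand in for
# dist.all_reduce calls in the real distributed version.
import torch


def sharded_cross_entropy(logit_shards, target, shard_size):
    # 1. Global max for numerical stability (all_reduce MAX).
    global_max = torch.stack([s.max() for s in logit_shards]).max()
    # 2. Global sum of exp(logits - max) (all_reduce SUM).
    denom = sum((s - global_max).exp().sum() for s in logit_shards)
    # 3. Only the shard owning the target contributes its logit.
    owner, local_idx = divmod(target, shard_size)
    target_logit = logit_shards[owner][local_idx]
    return -(target_logit - global_max - denom.log())


shards = list(torch.randn(12).split(4))   # vocab of 12, three "ranks"
loss = sharded_cross_entropy(shards, target=7, shard_size=4)
full = torch.nn.functional.cross_entropy(
    torch.cat(shards).unsqueeze(0), torch.tensor([7]))
print(torch.allclose(loss, full))  # True
```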
…pcaitech#3883)

* add gpt2 policy and modify shard and slicer to support

* remove unused code

* polish code
* add bert align test, fix dist loss bug

* forward and backward align

* add ignore index

* add shardformer CI

* add gather_output optional for user in shardconfig

* update readme with optional gather_output

* add dist crossentropy loss test, remove unused files

* remove unused file

* remove unused file

* rename the file

* polish code
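
The optional `gather_output` switch in the commits above lets callers choose between receiving only the local output shard of a column-parallel linear or the full, gathered tensor. A single-process sketch simulating two ranks (the real version would use `dist.all_gather`):

```python
# Single-process sketch: "two ranks" each own half of the output
# columns; gather_output decides whether callers see the full output
# or only the local shard (the real version uses dist.all_gather).
import torch

torch.manual_seed(0)
x = torch.randn(3, 8)                 # batch of 3, in_features = 8
full_w = torch.randn(6, 8)            # out_features = 6

shards = full_w.chunk(2, dim=0)       # each rank holds 3 output rows
local_outs = [x @ w.t() for w in shards]

gather_output = True
out = torch.cat(local_outs, dim=-1) if gather_output else local_outs[0]

print(out.shape)                             # (3, 6) when gathered
print(torch.allclose(out, x @ full_w.t()))   # True
```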
* fix bug in slicer, add slicer unit test

* add dropout test

* use pid as dropout seed

* update dropout test with local pattern

* add todo
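
The two dropout-related ideas above (a seed manager used as a context manager, and the pid as the dropout seed) suggest a pattern like the following sketch, so that each tensor-parallel rank draws a different mask; `dropout_seed` is an illustrative name, not the real seed manager:

```python
# Sketch of a seed manager as a context manager, seeding the RNG from
# the process id so each rank drops a different dropout mask.
import os
from contextlib import contextmanager

import torch


@contextmanager
def dropout_seed(base_seed: int = 0):
    """Temporarily reseed the RNG with a per-process value."""
    state = torch.random.get_rng_state()
    torch.manual_seed(base_seed + os.getpid())
    try:
        yield
    finally:
        # Restore the original RNG state on exit.
        torch.random.set_rng_state(state)


x = torch.ones(4)
with dropout_seed():
    y = torch.nn.functional.dropout(x, p=0.5)  # per-process mask
```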
hpcaitech#3949)

* add dist dropout in model

* update docstring and bert policy with dropout

* refactor basepolicy and sharded, update bert

* update format

* update gpt2 policy

* update bert policy

* remove unused code

* update readme for new policy usage
* add dist dropout in model

* update docstring and bert policy with dropout

* refactor basepolicy and sharded, update bert

* update format

* update gpt2 policy

* update bert policy

* remove unused code

* update readme for new policy usage

* add downstream model of bert

* remove unused code
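
The basepolicy refactor above centralizes, per model family, which submodules get swapped for parallel counterparts. A rough sketch of that shape; `BasePolicy`, `SubModuleReplacement`, and the target layer names are illustrative stand-ins:

```python
# Sketch of the policy idea: a policy object describes, per model
# family, which submodules should be swapped for parallel versions.
from dataclasses import dataclass, field


@dataclass
class SubModuleReplacement:
    suffix: str            # attribute path inside a layer block
    target: str            # name of the parallel class to inject


@dataclass
class BasePolicy:
    replacements: list = field(default_factory=list)

    def describe(self):
        for r in self.replacements:
            print(f"replace {r.suffix} -> {r.target}")


class BertPolicy(BasePolicy):
    def __init__(self):
        super().__init__([
            SubModuleReplacement("attention.self.query", "Linear1D_Col"),
            SubModuleReplacement("attention.output.dense", "Linear1D_Row"),
        ])


BertPolicy().describe()
```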
* fix an error in readme

* simplify code
* fix an error in readme

* simplify code

* refactor shardformer

* add todo

* remove slicer

* resolve code review
* [shardformer] integrated linear 1D with dtensor

* polish code
FrankLeeeee and others added 28 commits July 4, 2023 16:05
* [shardformer] adapted T5 and LLaMa test to use kit

* polish code
* support kit use for bert test

* support kit test for gpt2
* [shardformer] support module saving and loading

* polish code
* add linearconv1d test

* add linearconv1d test
* add layernorm to bert

* add layernorm test

* add layernorm test with load state dict

* add use_mixedfusedLN in shard config

* refactor policy to support fused_layernorm
* [test] fixed tests failed due to dtensor change

* polish code
* [shardformer] shardformer support opt models

* [shardformer] shardformer support opt models, fix

* [shardformer] shardformer support opt models, fix

* [shardformer] shardformer support opt models, fix
* first v of vit shardformer

* keep vit

* update

* vit shard add vitattention vitlayer

* update num head shard para

* finish test for vit

* add new_model_class & postprocess

* add vit readme

* delete old files & fix the conflict

* fix sth
…itech#4126)

* [shardformer] add benchmark of shardformer

* [shardformer] add benchmark of shardformer
* [shardformer] refactored some doc and api

* polish code
* [shardformer] made tensor parallelism configurable

* polish code
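
Making tensor parallelism configurable presumably means sharding is gated behind a config switch rather than always applied. A sketch, with field names that are assumptions rather than the real `ShardConfig`:

```python
# Sketch of configurable tensor parallelism: a ShardConfig-like
# dataclass with a switch and a size, consulted before sharding.
# Field names are illustrative, not necessarily the real ShardConfig.
from dataclasses import dataclass


@dataclass
class ShardConfig:
    enable_tensor_parallelism: bool = True
    tensor_parallel_size: int = 1


def maybe_shard(model, config: ShardConfig):
    if not config.enable_tensor_parallelism or config.tensor_parallel_size == 1:
        return model  # leave the model untouched
    # ...otherwise apply the policy-driven replacement sketched earlier...
    return model


cfg = ShardConfig(enable_tensor_parallelism=False)
```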
hpcaitech#4157)

Co-authored-by: github-actions <github-actions@github.com>
jamesthesnake merged commit 7e9f4f7 into ra on Jul 5, 2023
jamesthesnake added a commit that referenced this pull request Jul 5, 2023
Merge pull request #82 from jamesthesnake/l