Ra by jamesthesnake · Pull Request #2 · jamesthesnake/ColossalAI

jamesthesnake · 2023-03-23T23:00:47Z

No description provided.

* add alphafold benchmark * renae alphafold test * rename tests * rename diffuser * renme * rename * update transformer * update benchmark * update benchmark * update bench memory * update transformer benchmark * rename * support diffuser * support unet metainfo prop * fix bug and simplify code * update linear and support some op * optimize max region search, support conv * update unet test * support some op * support groupnorm and interpolate * update flow search * add fix dim in node flow * fix utils * rename * support diffusion * update diffuser * update chunk search * optimize imports * import * finish autochunk

* [lazyinit] fix shared module * [tests] add lazy init test utils * [tests] add torchvision for lazy init * [lazyinit] fix pre op fn * [lazyinit] handle legacy constructor * [tests] refactor lazy init test models * [tests] refactor lazy init test utils * [lazyinit] fix ops don't support meta * [tests] lazy init test timm models * [lazyinit] fix set data * [lazyinit] handle apex layers * [tests] lazy init test transformers models * [tests] lazy init test torchaudio models * [lazyinit] fix import path * [tests] lazy init test torchrec models * [tests] update torch version in CI * [tests] revert torch version in CI * [tests] skip lazy init test

* add normalize function to value_head in bloom rm * add normalization to value_function in gpt_rm * add normalization to value_head of opt_rm * add Anthropic/hh-rlhf dataset * Update __init__.py * Add LogExpLoss in RM training * Update __init__.py * update rm trainer to use acc as target * update example/train_rm * Update train_rm.sh * code style * Update README.md * Update README.md * add rm test to ci * fix tokenier * fix typo * change batchsize to avoid oom in ci * Update test_ci.sh

…3190) * Update requirements.txt * Update environment.yaml * Update README.md * Update environment.yaml * Update README.md * Update README.md * Delete requirements_colossalai.txt * Update requirements.txt * Update README.md

…ech#3157) * pass gpt trace and meta_prop * pass t5 trace and meta_prop * [FX] refactor experimental tracer and adapt it with hf models * pass all mainstream model zoo * fix CI * fix CI * fix CI * fix CI * fix CI * fix CI * fix CI * fix CI * skip tests * fix CI * using packaging version * polish

* [lazyinit] lazy tensor add distribute * [lazyinit] refactor distribute * [lazyinit] add test dist lazy init * [lazyinit] add verbose info for dist lazy init * [lazyinit] fix rnn flatten weight op * [lazyinit] polish test * [lazyinit] polish test * [lazyinit] fix lazy tensor data setter * [lazyinit] polish test * [lazyinit] fix clean * [lazyinit] make materialize inplace * [lazyinit] refactor materialize * [lazyinit] refactor test distribute * [lazyinit] fix requires_grad * [lazyinit] fix tolist after materialization * [lazyinit] refactor distribute module * [lazyinit] polish docstr * [lazyinit] polish lazy init context * [lazyinit] temporarily skip test * [lazyinit] polish test * [lazyinit] add docstr

FrankLeeeee and others added 30 commits February 6, 2023 15:42

[workflow] added test-pypi check before release (hpcaitech#2591)

d6cc8f3

* [workflow] added test-pypi check before release * polish code

[workflow] hooked docker release with lark (hpcaitech#2594)

fd90245

[workflow] hooked pypi release with lark (hpcaitech#2596)

0c03802

[workflow] added cuda extension build test before release (hpcaitech#2598)

4d58289

* [workflow] added cuda extension build test before release * polish code

[doc] updated readme for CI/CD (hpcaitech#2600)

719c4d5

[release] v0.2.1 (hpcaitech#2602)

f7458d3

* [release] v0.2.1 * polish code

[workflow] fixed broken rellease workflows (hpcaitech#2604)

f566b0c

Automated submodule synchronization (hpcaitech#2607)

ae86be1

Co-authored-by: github-actions <github-actions@github.com>

[workflow] fixed test coverage report (hpcaitech#2611)

b3973b9

[workflow] fixed the test coverage report (hpcaitech#2614)

aa7e9e4

* [workflow] fixed the test coverage report * polish code

[test] fixed the triton version for testing (hpcaitech#2608)

8518263

[build] fixed the doc build process (hpcaitech#2618)

93fdd35

[tutorial] add video link (hpcaitech#2619)

0556f5d

[doc] fixed broken badge (hpcaitech#2623)

291b051

[tutorial] added energonai to opt inference requirements (hpcaitech#2625)

4ae02c4

[autoparallel] Patch meta information of torch.matmul (hpcaitech#2584)

90a9fdd

* [autoparallel] matmul metainfo * [auto_parallel] remove unused print * [tests] skip test_matmul_handler when torch version is lower than 1.12.0

[doc] updated the sphinx theme (hpcaitech#2635)

d348039

fix/transformer-verison (hpcaitech#2581)

292c81e

[doc] removed pre-built wheel installation from readme (hpcaitech#2637)

c375563

[autoparallel] adapt autoparallel tests with latest api (hpcaitech#2626)

cb3d1be

add overlap option (hpcaitech#2613)

28398f1

[autoparallel] refactor handlers which reshape input tensors (hpcaitech#2615)

37df666

* [autoparallel] refactor handlers which reshape input tensors * polish

[doc] fix typo of BLOOM (hpcaitech#2643)

a020eec

* [doc] fix typo of BLOOM

[doc] migrate the markdown files (hpcaitech#2652)

85b2303

[doc] added docusaurus-based version control (hpcaitech#2656)

a4ae43f

[doc] fixed compatiblity with docusaurus (hpcaitech#2657)

cd4f02b

[example] Polish README.md (hpcaitech#2658)

a255a38

* [tutorial] polish readme.md * [example] Update README.md

[workflow] fixed gpu memory check condition (hpcaitech#2659)

94f87f9

[release] v0.2.2 (hpcaitech#2661)

b673e5f

ver217 and others added 29 commits March 17, 2023 13:49

[chatgpt] fix ppo training hanging problem with gemini (hpcaitech#3162)

c474fda

* [chatgpt] fix generation early stopping * [chatgpt] fix train prompts example

[chatgpt] fix trainer generate kwargs (hpcaitech#3166)

1e58d31

[refactor] update docs (hpcaitech#3174)

20d1c99

* refactor: README-zh-Hans * refactor: REFERENCE * docs: update paths in README

[test] fixed torchrec model test (hpcaitech#3167)

1ad3a63

* [test] fixed torchrec model test * polish code * polish code * polish code * polish code * polish code * polish code

[booster] added the accelerator implementation (hpcaitech#3159)

a9b8402

[examples] Solving the diffusion issue of incompatibility issue#3169 (hpcaitech#3170)

4e921cf

* Update requirements.txt * Update environment.yaml * Update README.md * Update environment.yaml

[test] fixed torchrec registration in model zoo (hpcaitech#3177)

085e7f4

* [test] fixed torchrec registration in model zoo * polish code * polish code * polish code

updated flash attention usage

7bc0afc

Fix docstr for zero statedict (hpcaitech#3185)

9d644ff

[zero] Refactor ZeroContextConfig class using dataclass (hpcaitech#3186)

80aed29

[hotfix] layout converting issue (hpcaitech#3188)

258b433

[auto-parallel] add auto-offload feature (hpcaitech#3154)

18dbe76

* add auto-offload feature * polish code * fix syn offload runtime pass bug * add offload example * fix offload testing bug * fix example testing bug

[booster] added the plugin base and torch ddp plugin (hpcaitech#3180)

e7f3bed

* [booster] added the plugin base and torch ddp plugin * polish code * polish code * polish code

[chatgpt] add supervised learning fine-tune code (hpcaitech#3183)

b429529

* [chatgpt] add supervised fine-tune code * [chatgpt] delete unused code and modified comment code * [chatgpt] use pytorch distributed sampler instead --------- Co-authored-by: zhangpengpeng <zhangpengpeng@joyy.com>

[Analyzer] fix analyzer tests (hpcaitech#3197)

019a847

[booster] implemented the cluster module (hpcaitech#3191)

e3ad88f

* [booster] implemented the cluster module * polish code

[chatgpt]support llama (hpcaitech#3070)

1e1b9d2

[chatgpt]add reward model code for deberta (hpcaitech#3199)

9998d5e

Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>

[auto] fix requirements typo for issue hpcaitech#3125 (hpcaitech#3209)

1893479

[api] implemented the checkpoint io module (hpcaitech#3205)

cd142fb

* [api] implemented the checkpoint io module * polish code * polish code

[chatgpt] support instuct training (hpcaitech#3216)

4fd4bd9

[chatgpt] unnify datasets (hpcaitech#3218)

fa97a9c

fix torch version (hpcaitech#3225)

bbac676

Merge pull request #1 from hpcaitech/main

7fb95c5

jamesthesnake merged commit 0fc19e1 into co Mar 23, 2023

jamesthesnake merged 517 commits into co from ra

20 participants
