add tensorboard close logic#25

Closed
CZYCW wants to merge 426 commits into ver217:main from CZYCW:feature/chat-tensorboard

CZYCW commented Jul 26, 2023

No description provided.

tanitna and others added 30 commits April 28, 2023 15:42
fix spelling error with evaluate.py
fix spelling error with generate_gpt35_answers.py
* [chat] add opt attn kernel

* [chat] disable xformer during fwd
* Update README.md

change "huggingaface" to "huggingface"

* Update README.md

change "Colossa-AI" to "Colossal-AI"
* Add RoBERTa for RLHF Stage 2 & 3 (test)

RoBERTa for RLHF Stage 2 & 3 (still in testing)

Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"

This reverts commit 06741d8.

Add RoBERTa for RLHF stage 2 & 3

1. add roberta folder under model folder
2. add  roberta option in train_reward_model.py
3. add some test in testci

Update test_ci.sh

Revert "Update test_ci.sh"

This reverts commit 9c7352b.

Add RoBERTa for RLHF Stage 2 & 3 (test)

RoBERTa for RLHF Stage 2 & 3 (still in testing)

Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"

This reverts commit 06741d8.

Add RoBERTa for RLHF stage 2 & 3

1. add roberta folder under model folder
2. add  roberta option in train_reward_model.py
3. add some test in testci

Update test_ci.sh

Revert "Update test_ci.sh"

This reverts commit 9c7352b.

update roberta with coati

chat ci update

Revert "chat ci update"

This reverts commit 17ae7ae.

* Update README.md

Update README.md

* update readme

* Update test_ci.sh

* update readme and add a script

update readme and add a script

modify readme

Update README.md
* gemini plugin add shard checkpoint save/load

* gemini plugin add shard checkpoint save/load

* gemini plugin add shard checkpoint save/load

* gemini plugin add shard checkpoint save/load

* gemini plugin add shard checkpoint save/load

* gemini plugin add shard checkpoint save/load

* gemini plugin add shard checkpoint save/load

* gemini plugin add shard checkpoint save/load

* gemini plugin add shard checkpoint save/load

* gemini plugin add shard checkpoint save/load

* gemini plugin add shard checkpoint save/load

* gemini plugin add shard checkpoint save/load

* gemini plugin add shard checkpoint save/load

* gemini plugin add shard checkpoint save/load

* gemini plugin support shard checkpoint

* [API Refactoring]gemini plugin support shard checkpoint

* [API Refactoring]gemini plugin support shard checkpoint

* [API Refactoring]gemini plugin support shard checkpoint

* [API Refactoring]gemini plugin support shard checkpoint

* [API Refactoring]gemini plugin support shard checkpoint

* [API Refactoring]gemini plugin support shard checkpoint

* [API Refactoring]gemini plugin support shard checkpoint

* [API Refactoring]gemini plugin support shard checkpoint

* [API Refactoring]gemini plugin support shard checkpoint

* [API Refactoring]gemini plugin support shard checkpoint

* [API Refactoring]gemini plugin support shard checkpoint

* [API Refactoring]gemini plugin support shard checkpoint

* [API Refactoring]gemini plugin support shard checkpoint

---------

Co-authored-by: luchen <luchen@luchendeMBP.lan>
Co-authored-by: luchen <luchen@luchendeMacBook-Pro.local>
fix spelling error with applications/Chat/evaluate/
fix spelling error in line23
change "cudnn_determinstic"=True to "cudnn_deterministic=True"
* [booster] add dp plugin base

* [booster] inherit dp plugin base

* [booster] refactor unit tests
)

* fix spelling error with examples/comminity/

* fix spelling error with example/
* fix gemini strategy bug

* add comment

* add comment

* better solution
* [example] add train vit with booster example

* [example] update readme

* [example] add train resnet with booster example

* [example] enable ci

* [example] enable ci

* [example] add requirements

* [hotfix] fix analyzer init

* [example] update requirements
* [booster] add prepare dataloader method for plug

* [booster] update examples and docstr
* [booster] fix no_sync method

* [booster] add test for ddp no_sync

* [booster] fix merge

* [booster] update unit test

* [booster] update unit test

* [booster] update unit test
…ech#3715)

* [booster] update tests for booster

* [booster] update tests for booster

* [booster] update tests for booster

* [booster] update tests for booster

* [booster] update tests for booster

* [booster] update booster tutorials#3717, fix recursive check
Co-authored-by: jiangwen <zxl265370@antgroup.com>
* fix spelling error with examples/comminity/

* fix spelling error with tests/

* fix some spelling error with tests/ colossalai/ etc.
* fix spelling error with examples/comminity/

* fix spelling error with tests/
* fix spelling error with examples/comminity/

* fix spelling error with tests/

* fix some spelling error with tests/ colossalai/ etc.

* fix spelling error with tests/ etc. date:2023.5.10
Co-authored-by: 纪少敏 <jishaomin@jishaomindeMBP.lan>
* [test] fix flop tensor test

* [test] fix autochunk test

* [test] fix lazyinit test

* [devops] update torch version of CI

* [devops] enable testmon

* [devops] fix ci

* [devops] fix ci

* [test] fix checkpoint io test

* [test] fix cluster test

* [test] fix timm test

* [devops] fix ci

* [devops] fix ci

* [devops] fix ci

* [devops] fix ci

* [devops] force sync to test ci

* [test] skip fsdp test
Co-authored-by: 纪少敏 <jishaomin@jishaomindeMBP.lan>
…3742)

* fix typo applications/ and colossalai/ date 5.11

* fix typo colossalai/
* [devops] make build on PR run automatically

* [devops] update build on pr condition
* [doc] add test info

* [devops] update doc check ci

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] remove debug info and update invalid doc

* [devops] add essential comments
klhhhhh and others added 28 commits July 4, 2023 16:05
* first v of vit shardformer

* keep vit

* update

* vit shard add vitattention vitlayer

* update num head shard para

* finish test for vit

* add new_model_class & postprocess

* add vit readme

* delete old files & fix the conflict

* fix sth
…itech#4126)

* [shardformer] add benchmark of shardformer

* [shardformer] add benchmark of shardformer
* [shardformer] refactored some doc and api

* polish code
* [shardformer] made tensor parallelism configurable

* polish code
hpcaitech#4157)

Co-authored-by: github-actions <github-actions@github.com>
…Plugin (hpcaitech#4141)

* [checkpointio] unsharded optimizer checkpoint for Gemini plugin

* [checkpointio] unsharded optimizer checkpoint for Gemini using all_gather
* [docker] fixed ninja build command

* polish code
Co-authored-by: github-actions <github-actions@github.com>
…pcaitech#4241)

* added softmax kernel

* added qkv_kernel

* added ops

* adding tests

* upload tets

* fix tests

* debugging

* debugging tests

* debugging

* added

* fixed errors

* added softmax kernel

* clean codes

* added tests

* update tests

* update tests

* added attention

* add

* fixed pytest checking

* add cuda check

* fix cuda version

* fix typo
* [lazy] support init on cuda

* [test] update lazy init test

* [test] fix transformer version
…ech#4302)

* sharded optimizer checkpoint for gemini plugin

* modify test to reduce testing time

* update doc

* fix bug when keep_gatherd is true under GeminiPlugin
CZYCW closed this Jul 26, 2023
CZYCW deleted the feature/chat-tensorboard branch July 26, 2023 09:39