add tensorboard close logic #25
Closed
CZYCW wants to merge 426 commits into ver217:main from
fix spelling error with evaluate.py
fix spelling error with generate_gpt35_answers.py
* [chat] add opt attn kernel * [chat] disable xformer during fwd
* Update README.md change "huggingaface" to "huggingface" * Update README.md change "Colossa-AI" to "Colossal-AI"
* Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit 06741d8. Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci Update test_ci.sh Revert "Update test_ci.sh" This reverts commit 9c7352b. Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit 06741d8. Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci Update test_ci.sh Revert "Update test_ci.sh" This reverts commit 9c7352b. update roberta with coati chat ci update Revert "chat ci update" This reverts commit 17ae7ae. * Update README.md Update README.md * update readme * Update test_ci.sh * update readme and add a script update readme and add a script modify readme Update README.md
* gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint --------- Co-authored-by: luchen <luchen@luchendeMBP.lan> Co-authored-by: luchen <luchen@luchendeMacBook-Pro.local>
fix spelling error with applications/Chat/evaluate/
fix spelling error in line 23: change "cudnn_determinstic=True" to "cudnn_deterministic=True"
* [booster] add dp plugin base * [booster] inherit dp plugin base * [booster] refactor unit tests
* fix gemini strategy bug * add comment * add comment * better solution
* [example] add train vit with booster example * [example] update readme * [example] add train resnet with booster example * [example] enable ci * [example] enable ci * [example] add requirements * [hotfix] fix analyzer init * [example] update requirements
* [booster] add prepare dataloader method for plug * [booster] update examples and docstr
* [booster] fix no_sync method * [booster] add test for ddp no_sync * [booster] fix merge * [booster] update unit test * [booster] update unit test * [booster] update unit test
…ech#3715) * [booster] update tests for booster * [booster] update tests for booster * [booster] update tests for booster * [booster] update tests for booster * [booster] update tests for booster * [booster] update booster tutorials#3717, fix recursive check
Co-authored-by: jiangwen <zxl265370@antgroup.com>
* fix spelling error with examples/comminity/ * fix spelling error with tests/ * fix some spelling error with tests/ colossalai/ etc.
* fix spelling error with examples/comminity/ * fix spelling error with tests/
* fix spelling error with examples/comminity/ * fix spelling error with tests/ * fix some spelling error with tests/ colossalai/ etc. * fix spelling error with tests/ etc. date:2023.5.10
Co-authored-by: 纪少敏 <jishaomin@jishaomindeMBP.lan>
* [test] fix flop tensor test * [test] fix autochunk test * [test] fix lazyinit test * [devops] update torch version of CI * [devops] enable testmon * [devops] fix ci * [devops] fix ci * [test] fix checkpoint io test * [test] fix cluster test * [test] fix timm test * [devops] fix ci * [devops] fix ci * [devops] fix ci * [devops] fix ci * [devops] force sync to test ci * [test] skip fsdp test
Co-authored-by: 纪少敏 <jishaomin@jishaomindeMBP.lan>
…3742) * fix typo applications/ and colossalai/ date 5.11 * fix typo colossalai/
* [devops] make build on PR run automatically * [devops] update build on pr condition
* [doc] add test info * [devops] update doc check ci * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] remove debug info and update invalid doc * [devops] add essential comments
* first v of vit shardformer * keep vit * update * vit shard add vitattention vitlayer * update num head shard para * finish test for vit * add new_model_class & postprocess * add vit readme * delete old files & fix the conflict * fix sth
…itech#4126) * [shardformer] add benchmark of shardformer * [shardformer] add benchmark of shardformer
* [shardformer] refactored some doc and api * polish code
* [shardformer] made tensor parallelism configurable * polish code
hpcaitech#4157) Co-authored-by: github-actions <github-actions@github.com>
…Plugin (hpcaitech#4141) * [checkpointio] unsharded optimizer checkpoint for Gemini plugin * [checkpointio] unsharded optimizer checkpoint for Gemini using all_gather
* [docker] fixed ninja build command * polish code
Co-authored-by: github-actions <github-actions@github.com>
…pcaitech#4241) * added softmax kernel * added qkv_kernel * added ops * adding tests * upload tets * fix tests * debugging * debugging tests * debugging * added * fixed errors * added softmax kernel * clean codes * added tests * update tests * update tests * added attention * add * fixed pytest checking * add cuda check * fix cuda version * fix typo
* [lazy] support init on cuda * [test] update lazy init test * [test] fix transformer version
…ech#4302) * sharded optimizer checkpoint for gemini plugin * modify test to reduce testing time * update doc * fix bug when keep_gatherd is true under GeminiPlugin
No description provided.