Skip to content

L#26

Merged
jamesthesnake merged 36 commits intomainfrom
l
Apr 20, 2023
Merged

L#26
jamesthesnake merged 36 commits intomainfrom
l

Conversation

@jamesthesnake
Copy link
Copy Markdown
Owner

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
if you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

chengeharrison and others added 30 commits April 12, 2023 15:47
…ming format of hf checkpoint files (hpcaitech#3479)

* [checkpoint] support huggingface style sharded checkpoint, to be compatible with hf file naming format

* [checkpoint] support huggingface style sharded checkpoint, to be compatible with hf file naming format

* [checkpoint] Shard saved checkpoint add 'variant' field to customize filename

* [checkpoint] Shard saved checkpoint add 'variant' field to customize filename

* [checkpoint] Shard saved checkpoint add 'variant' field to customize filename

* [checkpoint] Shard saved checkpoint add 'variant' field to customize filename

---------

Co-authored-by: luchen <luchen@luchendeMacBook-Pro.local>
Co-authored-by: luchen <luchen@luchendeMBP.lan>
* [gemini] fix nvme optimizer init

* [gemini] gemini supports lazy init

* [gemini] add init example

* [gemini] add fool model

* [zero] update gemini ddp

* [zero] update init example

* add chunk method

* add chunk method

* [lazyinit] fix lazy tensor tolist

* [gemini] fix buffer materialization

* [misc] remove useless file

* [booster] update gemini plugin

* [test] update gemini plugin test

* [test] fix gemini plugin test

* [gemini] fix import

* [gemini] fix import

* [lazyinit] use new metatensor

* [lazyinit] use new metatensor

* [lazyinit] fix __set__ method
Fixing document link errors using absolute paths
Format Optimization ,Add [] outside of DeepSpeed
* [chat] clean up duplicate tutorial

* [chat] clean up duplicate tutorial

* [chat] clean up duplicate tutorial

* [chat] clean up duplicate tutorial
* [feat][chatgpt]train prompts on ray example

* [fix]simplify code

* [fix]remove depreciated parameter

* [fix]add dependencies

* [fix]method calling

* [fix]experience maker

* [fix]missing loss function

* [fix]init optimizer

* [feat]add usage comment

* [fix]rename files

* [fix]add readme

* [fix]file path

* [fix]move directory

---------

Co-authored-by: jiangwen <zxl265370@antgroup.com>
Display format optimization, fix bug#3562
Specific changes
1. "This is called a column-parallel fashion" Translate to Chinese
2. use the ```math code block syntax to display a math expression as a block, No modification of formula content

Please check that the math formula is displayed correctly
If OK, I will change the format of the English version of the formula in parallel
* [misc] add print verbose

* [gemini] add print verbose

* [zero] add print verbose for low level

* [misc] add print verbose for op builder
Display format optimization , same as fix#3562
Simultaneous modification of en version
* run the base

* working on dist ppo

* sync

* detached trainer

* update detached trainer. no maker update function

* facing init problem

* 1 maker 1 trainer detached run. but no model update

* facing cuda problem

* fix save functions

* verified maker update

* nothing

* add ignore

* analyize loss issue

* remove some debug codes

* facing 2m1t stuck issue

* 2m1t verified

* do not use torchrun

* working on 2m2t

* working on 2m2t

* initialize strategy in ray actor env

* facing actor's init order issue

* facing ddp model update issue (need unwarp ddp)

* unwrap ddp actor

* checking 1m2t stuck problem

* nothing

* set timeout for trainer choosing. It solves the stuck problem!

* delete some debug output

* rename to sync with upstream

* rename to sync with upstream

* coati rename

* nothing

* I am going to detach the replaybuffer from trainer and make it a Ray Actor. Two benefits: 1. support TP trainer. 2. asynchronized buffer operations

* experience_maker_holder performs target-revolving _send_experience() instead of length comparison.

* move code to ray subfolder

* working on pipeline inference

* apply comments

---------

Co-authored-by: csric <richcsr256@gmail.com>
Optimization Code
I think there were two extra $ entered here, which have been deleted
* [gemini] support state dict shard

* [gemini] add test state dict shard

* [gemini] polish docstr

* [gemini] fix merge

* [gemini] polish code
Adjusted the style of Community Examples to be consistent with other titles
update

Update test_ci.sh

Update test_ci.sh

Update test_ci.sh

Update test_ci.sh

Update test_ci.sh

Update test_ci.sh

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update test_ci.sh

Update test_ci.sh

update

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

update ci

Update test_ci.sh

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update test_ci.sh

Update test_ci.sh

Update run_chatgpt_examples.yml

Update test_ci.sh

Update test_ci.sh

Update test_ci.sh

update test ci

RoBERTa for RLHF Stage 2 & 3 (still in testing)

Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"

This reverts commit 06741d8.

Add RoBERTa for RLHF stage 2 & 3

1. add roberta folder under model folder
2. add  roberta option in train_reward_model.py
3. add some test in testci

Update test_ci.sh

Revert "Update test_ci.sh"

This reverts commit 9c7352b.

Add RoBERTa for RLHF Stage 2 & 3 (test)

RoBERTa for RLHF Stage 2 & 3 (still in testing)

Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"

This reverts commit 06741d8.

Add RoBERTa for RLHF stage 2 & 3

1. add roberta folder under model folder
2. add  roberta option in train_reward_model.py
3. add some test in testci

Update test_ci.sh

Revert "Update test_ci.sh"

This reverts commit 9c7352b.

update roberta with coati

chat ci update

Revert "chat ci update"

This reverts commit 17ae7ae.

[test]chat_update_ci

Update test_ci.sh

Update test_ci.sh

test

Update gpt_critic.py

Update gpt_critic.py

Update run_chatgpt_unit_tests.yml

update test ci

update

update

update

update

Update test_ci.sh

update

Update test_ci.sh

Update test_ci.sh

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml
* [meta] fix torch 1.13.1

* [meta] fix torch 2.0.0

* [meta] fix torch 1.13.0

* [meta] polish code
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
Optimization Code
The source code has not been modified, only a few spelling errors in the comments have been changed
Co-authored-by: github-actions <github-actions@github.com>
* [gemini] save state dict support fp16

* [gemini] save state dict shard support fp16

* [gemini] fix state dict

* [gemini] fix state dict
Optimization Code
change "requries" to "requires"
digger-yu and others added 6 commits April 19, 2023 17:28
Optimization Code
change "vairable" to "variable"
Fixed several word spelling errors
change "compatiblity" to "compatibility" etc.
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
@jamesthesnake jamesthesnake merged commit 3c11a89 into main Apr 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.