[chatgpt] Detached PPO Training by CsRic · Pull Request #3195 · hpcaitech/ColossalAI

CsRic · 2023-03-21T10:31:08Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

'Detached' PPO training means that the experience makers and the trainers are split across different nodes and run asynchronously. Models are not shared between them.

Propose several classes for the 'detached' workflow: ExperienceMakerHolder, DetachedReplayBuffer, DetachedPPOTrainer.
Use Ray to implement the detached workflow structure.
Examples: 1m1t, 1m2t, 2m1t, 2m2t (NmMt means N experience makers and M trainers).
Supported strategies: Naive, DDP.
Does not affect existing code.
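To make the maker/trainer split concrete, here is a minimal sketch of the idea. It is not the code from this PR: it uses Python threads and `queue.Queue` as stand-ins for Ray actors, and every name in it (`ToyReplayBuffer`, `run_maker`, `run_trainer`, `run_detached`) is hypothetical. The "experience" is a plain dict rather than real rollout data.

```python
import queue
import threading


class ToyReplayBuffer:
    """Thread-safe buffer standing in for a detached replay buffer (hypothetical)."""

    def __init__(self, maxsize=8):
        self._items = queue.Queue(maxsize=maxsize)

    def append(self, experience):
        # Blocks while the buffer is full, so makers cannot run arbitrarily far ahead.
        self._items.put(experience)

    def sample(self, timeout=5.0):
        # Raises queue.Empty if nothing arrives within `timeout` seconds.
        return self._items.get(timeout=timeout)


def run_maker(maker_id, buffers, num_batches):
    """Produce fake experiences and hand them to the trainers' buffers in turn."""
    for step in range(num_batches):
        experience = {"maker": maker_id, "step": step}
        buffers[step % len(buffers)].append(experience)


def run_trainer(buffer, num_batches, consumed):
    """Consume experiences; a real trainer would run a PPO update on each one."""
    for _ in range(num_batches):
        consumed.append(buffer.sample())


def run_detached(num_makers, num_trainers, batches_per_maker):
    """Run an 'NmMt' toy setup and return what each trainer consumed."""
    # Keeps the toy bookkeeping simple: every trainer gets the same share.
    assert batches_per_maker % num_trainers == 0
    per_trainer = num_makers * batches_per_maker // num_trainers

    buffers = [ToyReplayBuffer() for _ in range(num_trainers)]
    consumed = [[] for _ in range(num_trainers)]

    threads = [
        threading.Thread(target=run_trainer, args=(buffers[t], per_trainer, consumed[t]))
        for t in range(num_trainers)
    ]
    threads += [
        threading.Thread(target=run_maker, args=(m, buffers, batches_per_maker))
        for m in range(num_makers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return consumed
```

Calling `run_detached(num_makers=2, num_trainers=2, batches_per_maker=4)` mimics the 2m2t layout: makers and trainers only communicate through the buffers and never share a model object.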

Known issues:
~~1. Cannot detect cuda device on each worker.~~ fixed 20230324
2. Cannot run with Colossal strategy.
~~3. correctness of 1m1t.py example.~~ fixed 20230324

TODO:

Implement parameter update from trainer to experience maker
Fix the known issues.
Add TP strategy for experience maker and trainer.
Support multiple nodes.

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

binmakeswell · 2023-04-07T02:52:40Z

Hi @CsRic, thanks for your contribution, but there are conflicts in this PR. Could you please resolve them first? Thanks.

ver217 · 2023-04-13T09:13:34Z

Can you move all files to coati/ray?

CsRic · 2023-04-13T10:29:12Z

> Can you move all files to coati/ray?

Done

csric and others added 9 commits March 16, 2023 14:52

run the base

994b40c

working on dist ppo

0390f6e

sync

c1df61b

Merge remote-tracking branch 'upstream/main' into chatgpt_dist_ppo

32837c3

detached trainer

518f837

update detached trainer. no maker update function

b707ba2

facing init problem

1311924

1 maker 1 trainer detached run. but no model update

29976fa

Merge branch 'hpcaitech:main' into detached_ppo

ea4761a

ht-zhou reviewed Mar 22, 2023

Comment thread applications/ChatGPT/examples/train_prompts.sh Outdated

ht-zhou reviewed Mar 22, 2023

Comment thread applications/ChatGPT/chatgpt/experience_maker/detached.py Outdated

ht-zhou reviewed Mar 22, 2023

Comment thread applications/ChatGPT/chatgpt/replay_buffer/detached.py Outdated

csric added 18 commits March 22, 2023 14:15

facing cuda problem

523e209

fix save functions

45361c2

Merge branch 'detached_ppo' of https://github.com/CsRic/ColossalAI into detached_ppo

886cc98

verified maker update

42aa4c7

nothing

26d82b5

add ignore

517ff22

Merge remote-tracking branch 'upstream/main' into detached_ppo

dc91d58

analyze loss issue

b91348d

fix detached ppo loss issue

b882fdd

remove some debug codes

ebd2be9

facing 2m1t stuck issue

650ec5b

Merge remote-tracking branch 'upstream/main' into detached_ppo

b40974b

2m1t verified

f791fb7

do not use torchrun

f468724

working on 2m2t

12b94f7

working on 2m2t

05df7d7

initialize strategy in ray actor env

0773697

facing actor's init order issue

9451a54

csric added 6 commits March 29, 2023 17:49

set timeout for trainer choosing. It solves the stuck problem!

459639c

delete some debug output

65363e1

rename to sync with upstream

2f8036b

rename to sync with upstream

7e5c8f2

merge upstream

c36d58a

coati rename

c0649c3

CsRic force-pushed the detached_ppo branch from 7d6b3a0 to c0649c3 Compare March 30, 2023 07:12

CsRic and others added 6 commits March 31, 2023 11:23

Merge branch 'hpcaitech:main' into detached_ppo

db29760

nothing

3c6f68c

merge

334956a

I am going to detach the replaybuffer from trainer and make it a Ray Actor. Two benefits: 1. support TP trainer. 2. asynchronous buffer operations

35e4602

experience_maker_holder performs target-revolving _send_experience() instead of length comparison.

04069cd
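The difference between the two dispatch policies named in this commit message can be sketched as follows. This assumes "target-revolving" means round-robin over the trainers' buffers and that "length comparison" means sending to the currently shortest buffer; both readings are interpretations, not confirmed by the PR. Plain Python lists stand in for the buffers, and `RoundRobinSender` and `send_to_shortest` are hypothetical names.

```python
from itertools import cycle


class RoundRobinSender:
    """Sends each experience to the next target in a fixed rotation."""

    def __init__(self, targets):
        self._targets = cycle(targets)

    def send(self, experience):
        target = next(self._targets)
        target.append(experience)
        return target


def send_to_shortest(targets, experience):
    """Length-comparison policy: inspect every target, then pick the shortest."""
    target = min(targets, key=len)  # ties resolve to the first shortest target
    target.append(experience)
    return target


buffers = [[], [], []]
sender = RoundRobinSender(buffers)
for item in range(7):
    sender.send(item)
```

One plausible motivation for the rotation is that it needs no information from the targets, whereas the length-based policy has to read every buffer's size before each send, which would be a remote query if the buffers live in separate processes.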

Merge 'upstream/main'

117f08c

Merge remote-tracking branch 'upstream/main' into detached_ppo

b0a002e

csric added 2 commits April 13, 2023 17:25

Merge remote-tracking branch 'upstream/main' into detached_ppo

9051002

move code to ray subfolder

3a4d0e7