[Feature] Support Distributed LogProb for GRPO Training by duanjunwen · Pull Request #6247 · hpcaitech/ColossalAI

duanjunwen · 2025-03-14T10:42:21Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs
I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
if you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

TongLi3701

Thanks Junwen, I left some comments.

TongLi3701 · 2025-03-17T09:58:22Z

Please also compare the peak memory when using these two different methods.

duanjunwen · 2025-03-18T01:27:33Z

Please also compare the peak memory when using these two different methods.

We compared parallel output set to True and to False under the tp2zero1 strategy.
With parallel output False, peak memory is 93000+ MB (this may hit OOM when other users take resources on the same machine).
With parallel output True, peak memory is around 82000 MB.
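
For readers following the thread, here is a minimal single-process sketch of the idea behind a distributed log-prob. It assumes "parallel output" means each tensor-parallel rank keeps only its slice of the vocabulary logits instead of gathering the full vocabulary. It is not the code from this PR: the function names are hypothetical, plain Python lists stand in for tensors, and the `max`/`sum` over shards stand in for all-reduce collectives.

```python
import math


def full_log_prob(logits, target):
    """Reference: log-softmax over the full vocabulary, then pick the target."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[target] - log_z


def sharded_log_prob(shards, target):
    """Log-prob of `target` when the vocabulary is split across ranks.

    `shards` is a list of per-rank logit slices covering contiguous
    vocabulary ranges. Hypothetical helper, for illustration only.
    """
    # Step 1: global max for numerical stability (stands in for all-reduce MAX).
    global_max = max(max(s) for s in shards)
    # Step 2: global sum of exp(logit - max) (stands in for all-reduce SUM).
    global_sum = sum(sum(math.exp(x - global_max) for x in s) for s in shards)
    # Step 3: only the rank that owns the target index contributes its logit;
    # every other rank contributes 0 (stands in for all-reduce SUM).
    target_logit = 0.0
    offset = 0
    for s in shards:
        if offset <= target < offset + len(s):
            target_logit += s[target - offset]
        offset += len(s)
    return target_logit - global_max - math.log(global_sum)


logits = [2.0, -1.0, 0.5, 3.0, 0.0, -2.5, 1.5, 0.25]
shards = [logits[:4], logits[4:]]  # two shards, mimicking a tensor-parallel size of 2
for t in range(len(logits)):
    assert math.isclose(full_log_prob(logits, t), sharded_log_prob(shards, t), rel_tol=1e-9)
```

In this sketch only three scalars per token cross rank boundaries (the max, the exp-sum, and the target logit), so no rank ever has to hold the full-vocabulary logits. That is one plausible reason for the lower peak memory reported above, though the thread itself does not break the saving down.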

TongLi3701

Thanks Junwen, I left some comments. Please address the comments, click "Resolve conversation" on each, and then request a review again.

Thanks.

duanjunwen · 2025-03-18T03:56:17Z

Resolve Conflict.

* add reward related function
* add simple grpo
* update grpo
* polish
* modify data loader
* grpo consumer
* update loss
* update reward fn
* update example
* update loader
* add algo selection
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add save
* update select algo
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update grpo
* update reward fn
* update reward
* fix reward score
* add response length
* detach
* fix tp bug
* fix consumer
* convert to 8 generation
* print results
* setup update
* fix transformers backend
* [Feature] Support Distributed LogProb for GRPO Training (#6247)
* [fix] fix qwen VocabParallelLMHead1D and gather output
* fix tp bug
* fix consumer
* [feat] Support Distributed LogProb for GRPO Training
* [fix] fix loss func
* [fix] fix log prob plugin
* [fix] fix qwen modeling param
* [fix] rm comments
* [fix] rm hard-code;fix non-dist version
* [fix] fix test file param name and benchmark tp gather output=True/False
* [fix] rm non-dist version in dist log prob
* [fix] fix comments
* [fix] fix dis log prob plugin
* [fix] fix test case
* [fix] fix qwen VocabParallelLMHead1D and gather output
* [fix] fix DistLogProb comments
* [fix] restore tp size
* [fix] fix comments
* [fix] fix comment; fix LogSoftmax usage
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
* fix vllm
* fix logprob, add filtering, temperature annealing, lr descent
* simplify vllm preprocessing input ids
* update logging
* [feat] add microbatch forwarding (#6251)
* add microbatch forwarding
* fix forward microbatch
* fix producer OOM
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* change project name
* fix temperature annealing
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* address conversation
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [Distributed RLHF] Integration of PP (#6257)
* update help information
* update style
* fix
* minor fix
* support PP training
* add pp support
* remove unused code
* address conversation
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
* [hot-fix] Fix memory leakage bug, support TP+PP (#6258)
* update help information
* update style
* fix
* minor fix
* support PP training
* add pp support
* remove unused code
* address conversation
* fix memory leakage support tp+pp
* move empty cache
* move empty cache
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: YeAnbang <anbangy2@outlook.com>
Co-authored-by: duanjunwen <935724073@qq.com>
Co-authored-by: YeAnbang <44796419+YeAnbang@users.noreply.github.com>

* [fix] fix qwen VocabParallelLMHead1D and gather output
* fix tp bug
* fix consumer
* [feat] Support Distributed LogProb for GRPO Training
* [fix] fix loss func
* [fix] fix log prob plugin
* [fix] fix qwen modeling param
* [fix] rm comments
* [fix] rm hard-code;fix non-dist version
* [fix] fix test file param name and benchmark tp gather output=True/False
* [fix] rm non-dist version in dist log prob
* [fix] fix comments
* [fix] fix dis log prob plugin
* [fix] fix test case
* [fix] fix qwen VocabParallelLMHead1D and gather output
* [fix] fix DistLogProb comments
* [fix] restore tp size
* [fix] fix comments
* [fix] fix comment; fix LogSoftmax usage
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>

duanjunwen and others added 4 commits March 13, 2025 13:24

[fix] fix qwen VocabParallelLMHead1D and gather output

03ce3c5

fix tp bug

b835d1b

fix consumer

137ec17

[feat] Support Distributed LogProb for GRPO Training

ce8a8b3

duanjunwen requested a review from a team as a code owner March 14, 2025 10:42

duanjunwen requested a review from TongLi3701 March 17, 2025 01:25

duanjunwen added 5 commits March 17, 2025 10:57

Merge branch 'hpcaitech:grpo-latest' into grpo-latest

7b3c310

[fix] fix loss func

a810b20

[fix] fix log prob plugin

c247bd8

[fix] fix qwen modeling param

b78ab3a

[fix] rm comments

dddd062

TongLi3701 suggested changes Mar 17, 2025

duanjunwen added 2 commits March 17, 2025 18:09

[fix] rm hard-code;fix non-dist version

74de49d

[fix] fix test file param name and benchmark tp gather output=True/False

188d69d

duanjunwen requested a review from TongLi3701 March 17, 2025 10:39

duanjunwen added 2 commits March 18, 2025 09:34

[fix] rm non-dist version in dist log prob

01bcaca

[fix] fix comments

0277592

TongLi3701 reviewed Mar 18, 2025

Comment thread applications/ColossalChat/coati/distributed/utils.py Outdated

duanjunwen added 5 commits March 18, 2025 11:34

[fix] fix dis log prob plugin

3a8a387

[fix] fix test case

d29f39d

[fix] fix qwen VocabParallelLMHead1D and gather output

dcf3f9b

Merge branch 'grpo-latest' of github.com:duanjunwen/ColossalAI into grpo-latest

d90bf57

Merge branch 'grpo-latest' into grpo-dist-loss

8615b24

duanjunwen requested a review from TongLi3701 March 18, 2025 03:55

TongLi3701 reviewed Mar 18, 2025

Comment thread applications/ColossalChat/coati/distributed/consumer.py Outdated

Comment thread colossalai/shardformer/layer/loss.py Outdated

duanjunwen added 2 commits March 18, 2025 16:16

[fix] fix DistLogProb comments

0ebeebc

[fix] restore tp size

1a7cc25

duanjunwen requested a review from TongLi3701 March 18, 2025 08:22

[fix] fix comments

7e2f058

TongLi3701 approved these changes Mar 18, 2025

Comment thread colossalai/shardformer/layer/loss.py Outdated

Comment thread colossalai/shardformer/layer/loss.py Outdated

[fix] fix comment; fix LogSoftmax usage

f381cea

duanjunwen requested a review from TongLi3701 March 18, 2025 09:32

TongLi3701 approved these changes Mar 18, 2025

TongLi3701 merged commit 7795d4c into hpcaitech:grpo-latest Mar 18, 2025

TongLi3701 merged 22 commits into hpcaitech:grpo-latest from duanjunwen:grpo-dist-loss


2 participants
