
[Feature] Support Distributed LogProb for GRPO Training#6247

Merged
TongLi3701 merged 22 commits into hpcaitech:grpo-latest from duanjunwen:grpo-dist-loss on Mar 18, 2025

Conversation

@duanjunwen
Contributor

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs
  • I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with keywords like "fixed" to automatically close the linked issue upon merge.

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@duanjunwen requested a review from a team as a code owner on March 14, 2025 10:42
@duanjunwen requested a review from TongLi3701 on March 17, 2025 01:25
Contributor

@TongLi3701 left a comment


Thanks Junwen, I left some comments.

Comment thread applications/ColossalChat/coati/distributed/consumer.py Outdated
Comment thread applications/ColossalChat/coati/distributed/grpo_consumer.py Outdated
Comment thread applications/ColossalChat/coati/distributed/grpo_consumer.py Outdated
Comment thread applications/ColossalChat/coati/distributed/grpo_consumer.py Outdated
Comment thread applications/ColossalChat/coati/distributed/utils.py
Comment thread colossalai/shardformer/layer/loss.py Outdated
Comment thread colossalai/shardformer/layer/loss.py Outdated
Comment thread colossalai/shardformer/layer/loss.py Outdated
Comment thread colossalai/shardformer/policies/qwen2.py Outdated
Comment thread tests/test_shardformer/test_layer/test_dist_log_prob.py Outdated
@TongLi3701
Contributor

Please also compare the peak memory when using these two different methods.
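
For context, the kind of number being asked for can be captured with PyTorch's allocator counters. The sketch below is only a hedged illustration of such a comparison, not the benchmark actually run for this PR; `run_one_step` is a hypothetical stand-in for one GRPO forward/backward pass with the chosen setting.

```python
# Hedged sketch: record peak allocated GPU memory for one configuration
# (e.g. parallel output True vs. False). Under TP, each rank reports its own peak.
import torch

def benchmark_peak_memory(run_one_step) -> float:
    torch.cuda.reset_peak_memory_stats()
    run_one_step()            # hypothetical: one forward/backward step
    torch.cuda.synchronize()  # make sure all kernels have finished
    return torch.cuda.max_memory_allocated() / 1024**2  # peak in MB
```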

@duanjunwen requested a review from TongLi3701 on March 17, 2025 10:39
@duanjunwen
Contributor Author

duanjunwen commented Mar 18, 2025

> Please also compare the peak memory when using these two different methods.

We compared parallel output = True vs. parallel output = False under the tp2zero1 strategy (tensor parallel size 2 with ZeRO stage 1).
With parallel output = False, peak memory is 93,000+ MB (and may OOM when other users grab GPU resources).
With parallel output = True, peak memory is around 82,000 MB.
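
The saving comes from never materializing the full-vocabulary logits on any single rank. Below is a minimal sketch of the idea behind parallel output = True: each TP rank keeps its vocab shard of the logits, and the log-prob of each sampled token is recovered with a few [batch, seq]-sized all-reduces instead of an all-gather over the vocab dimension. Function and argument names here are illustrative only; the PR's actual implementation lives in `colossalai/shardformer/layer/loss.py` and may differ.

```python
# Hedged sketch of distributed log-prob over vocab-sharded logits.
# `shard_logits` is this rank's [B, S, V_local] slice of the vocab dimension;
# `vocab_start` is the first global vocab id owned by this rank.
import torch
import torch.distributed as dist

def dist_log_prob(shard_logits: torch.Tensor, labels: torch.Tensor,
                  vocab_start: int, group=None) -> torch.Tensor:
    # Global log-sum-exp, stabilized by the global max (two small all-reduces).
    local_max = shard_logits.max(dim=-1).values               # [B, S]
    dist.all_reduce(local_max, op=dist.ReduceOp.MAX, group=group)
    sum_exp = (shard_logits - local_max.unsqueeze(-1)).exp().sum(dim=-1)
    dist.all_reduce(sum_exp, op=dist.ReduceOp.SUM, group=group)
    log_z = local_max + sum_exp.log()                         # logsumexp over full vocab

    # Target logit: only the rank that owns a label id contributes a nonzero term.
    v_local = shard_logits.size(-1)
    in_shard = (labels >= vocab_start) & (labels < vocab_start + v_local)
    local_ids = (labels - vocab_start).clamp(0, v_local - 1)
    target = shard_logits.gather(-1, local_ids.unsqueeze(-1)).squeeze(-1)
    target = torch.where(in_shard, target, torch.zeros_like(target))
    dist.all_reduce(target, op=dist.ReduceOp.SUM, group=group)

    return target - log_z                                     # log p(labels)
```

Only [B, S]-sized tensors cross ranks here, so no rank ever holds the full [B, S, V] logits, which is consistent with the lower peak memory observed for parallel output = True.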

Contributor

@TongLi3701 left a comment


Thanks Junwen, I left some comments. Please address them, click "Resolve conversation", and then request a review again.

Thanks.

Comment thread applications/ColossalChat/coati/distributed/utils.py Outdated
@duanjunwen requested a review from TongLi3701 on March 18, 2025 03:55
@duanjunwen
Contributor Author

Resolved the merge conflicts.

Comment thread applications/ColossalChat/coati/distributed/consumer.py Outdated
Comment thread colossalai/shardformer/layer/loss.py Outdated
@duanjunwen requested a review from TongLi3701 on March 18, 2025 08:22
Comment thread colossalai/shardformer/layer/loss.py Outdated
Comment thread colossalai/shardformer/layer/loss.py Outdated
@duanjunwen requested a review from TongLi3701 on March 18, 2025 09:32
@TongLi3701 merged commit 7795d4c into hpcaitech:grpo-latest on Mar 18, 2025
TongLi3701 added a commit that referenced this pull request Apr 21, 2025
* add reward related function

* add simple grpo

* update grpo

* polish

* modify data loader

* grpo consumer

* update loss

* update reward fn

* update example

* update loader

* add algo selection

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add save

* update select algo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update grpo

* update reward fn

* update reward

* fix reward score

* add response length

* detach

* fix tp bug

* fix consumer

* convert to 8 generation

* print results

* setup update

* fix transformers backend

* [Feature] Support Distributed LogProb for GRPO Training (#6247)

* [fix] fix qwen VocabParallelLMHead1D and gather output

* fix tp bug

* fix consumer

* [feat] Support Distributed LogProb for GRPO Training

* [fix] fix loss func

* [fix] fix log prob plugin

* [fix] fix qwen modeling param

* [fix] rm comments

* [fix] rm hard-code;fix non-dist version

* [fix] fix test file param name and benchmark tp gather output=True/False

* [fix] rm non-dist version in dist log prob

* [fix] fix comments

* [fix] fix dis log prob plugin

* [fix] fix test case

* [fix] fix qwen VocabParallelLMHead1D and gather output

* [fix] fix DistLogProb comments

* [fix] restore tp size

* [fix] fix comments

* [fix] fix comment; fix LogSoftmax usage

---------

Co-authored-by: Tong Li <tong.li35271158@gmail.com>

* fix vllm

* fix logprob, add filtering, temperature annealing, lr descent

* simplify vllm preprocessing input ids

* update logging

* [feat] add microbatch forwarding (#6251)

* add microbatch forwarding

* fix forward microbatch

* fix producer OOM

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change project name

* fix temperature annealing

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address conversation

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Distributed RLHF] Integration of PP (#6257)

* update help information

* update style

* fix

* minor fix

* support PP training

* add pp support

* remove unused code

* address conversation

---------

Co-authored-by: Tong Li <tong.li35271158@gmail.com>

* [hot-fix] Fix memory leakage bug, support TP+PP (#6258)

* update help information

* update style

* fix

* minor fix

* support PP training

* add pp support

* remove unused code

* address conversation

* fix memory leakage support tp+pp

* move empty cache

* move empty cache

---------

Co-authored-by: Tong Li <tong.li35271158@gmail.com>

---------

Co-authored-by: Tong Li <tong.li35271158@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: YeAnbang <anbangy2@outlook.com>
Co-authored-by: duanjunwen <935724073@qq.com>
Co-authored-by: YeAnbang <44796419+YeAnbang@users.noreply.github.com>
YeAnbang pushed a commit that referenced this pull request Aug 5, 2025
* [fix] fix qwen VocabParallelLMHead1D and gather output

* fix tp bug

* fix consumer

* [feat] Support Distributed LogProb for GRPO Training

* [fix] fix loss func

* [fix] fix log prob plugin

* [fix] fix qwen modeling param

* [fix] rm comments

* [fix] rm hard-code;fix non-dist version

* [fix] fix test file param name and benchmark tp gather output=True/False

* [fix] rm non-dist version in dist log prob

* [fix] fix comments

* [fix] fix dis log prob plugin

* [fix] fix test case

* [fix] fix qwen VocabParallelLMHead1D and gather output

* [fix] fix DistLogProb comments

* [fix] restore tp size

* [fix] fix comments

* [fix] fix comment; fix LogSoftmax usage

---------

Co-authored-by: Tong Li <tong.li35271158@gmail.com>