Align GRPO and RLOO initialization by qgallouedec · Pull Request #4685 · huggingface/trl

qgallouedec · 2025-12-12T21:26:49Z

GRPO recently benefited from some improvements in initialization that were not applied to RLOO. This PR aligns the two initializations.

HuggingFaceDocBuilderDev · 2025-12-12T21:30:21Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2025-12-12T21:28:35Z

trl/trainer/rloo_trainer.py

    from datasets import load_dataset
    from trl import RLOOTrainer
+    from trl.rewards import accuracy_reward

-    dataset = load_dataset("trl-lib/tldr", split="train")
-
-
-    def reward_func(completions, **kwargs):
-        # Dummy reward function that rewards completions with more unique letters.
-        return [float(len(set(completion))) for completion in completions]
-
+    dataset = load_dataset("trl-lib/DeepMath-103K", split="train")

    trainer = RLOOTrainer(
        model="Qwen/Qwen2-0.5B-Instruct",
-        reward_funcs=reward_func,
+        reward_funcs=accuracy_reward,
        train_dataset=dataset,
    )
-
    trainer.train()
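For readers comparing the two versions of the docstring example, here is a minimal standalone sketch of the dummy reward that the diff removes. It assumes only what the diff shows: a reward function takes `completions` plus arbitrary keyword arguments and returns one float per completion. It also assumes completions are plain strings. The sample inputs and the `prompts` keyword are illustrative, not taken from the PR.

```python
def reward_func(completions, **kwargs):
    # One score per completion: the number of distinct characters it contains
    # (the original comment says "unique letters", but spaces and punctuation count too).
    return [float(len(set(completion))) for completion in completions]


# Illustrative call; extra keyword arguments are accepted and ignored.
scores = reward_func(["aab", "hello world"], prompts=["p1", "p2"])
print(scores)  # [2.0, 8.0]
```

The sketch runs without TRL installed, which makes it a quick check of the expected return shape before passing a custom function as `reward_funcs`.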


qgallouedec · 2025-12-12T21:30:31Z

trl/trainer/rloo_trainer.py

            model_name = model_name.split("/")[-1]
            args = RLOOConfig(f"{model_name}-RLOO")

-        # Models


All other changes come from #4577.


qgallouedec added 2 commits December 12, 2025 21:24

align rloo and grpo

4d7345a

style

21549b2

qgallouedec commented Dec 12, 2025


qgallouedec requested review from albertvillanova, edbeeching, kashif and lewtun December 12, 2025 21:31

Merge branch 'main' into align-rloo

c336c9a

kashif approved these changes Dec 16, 2025


qgallouedec added 3 commits December 16, 2025 08:43

Move `prepare_model_for_kbit_training`, `enable_gradient_checkpointing`, `prepare_peft_model` to `experimental.utils` (#4686)

8be5fc0

Merge branch 'main' into align-rloo

c0f649a

Merge branch 'main' into align-rloo

5d42fcd

qgallouedec merged commit 00da046 into main Dec 16, 2025
10 of 11 checks passed

qgallouedec deleted the align-rloo branch December 16, 2025 17:21

qgallouedec merged 6 commits into main from align-rloo
