docs: Expand training customization examples #4427

Merged: behroozazarkhalili merged 9 commits into huggingface:main from behroozazarkhalili:docs/expand-training-customization on Dec 1, 2025

@behroozazarkhalili (Collaborator)

Summary

This PR addresses issue #4379 by expanding the Training Customization documentation section with 5 new comprehensive examples, rather than removing it.

Resolves #4379

Changes Made

New Examples Added (5):

  1. Custom Callbacks - Shows how to add custom callbacks for logging, monitoring, or early stopping
  2. Custom Evaluation Metrics - Demonstrates defining custom metrics to track during training
  3. Mixed Precision Training - Explains bf16/fp16 usage for speed and memory optimization
  4. Gradient Accumulation - Shows how to simulate larger batch sizes with limited GPU memory
  5. Custom Data Collator - Demonstrates custom data preprocessing and padding strategies
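
As an illustration of the first item, here is a minimal sketch of a custom callback that requests early stopping once the logged training loss drops below a threshold. The class name `LossThresholdCallback` and its `threshold` argument are hypothetical and not taken from the PR; the sketch assumes the `TrainerCallback` base class and the `on_log` hook from `transformers`.

```python
from types import SimpleNamespace

try:
    from transformers import TrainerCallback
except ImportError:
    # Fallback so this sketch still runs where transformers is not installed.
    TrainerCallback = object


class LossThresholdCallback(TrainerCallback):
    """Hypothetical callback: request a stop once the logged loss is low enough."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def on_log(self, args, state, control, logs=None, **kwargs):
        # `logs` is the dict of values that was just logged; it may not
        # contain "loss" (for example on evaluation-only log events).
        loss = (logs or {}).get("loss")
        if loss is not None and loss < self.threshold:
            control.should_training_stop = True
        return control


# Stand-in for the trainer's control object, used only to exercise the hook.
demo_control = SimpleNamespace(should_training_stop=False)
callback = LossThresholdCallback(threshold=0.5)

callback.on_log(None, None, demo_control, logs={"loss": 0.9})
stop_after_high_loss = demo_control.should_training_stop

callback.on_log(None, None, demo_control, logs={"loss": 0.4})
stop_after_low_loss = demo_control.should_training_stop
```

In a real run, an instance would typically be passed to the trainer through its `callbacks` argument, for example `callbacks=[LossThresholdCallback(threshold=0.5)]`.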

Documentation Improvements:

  • Updated introduction for better clarity and consistency
  • All examples follow the same pattern as existing examples
  • All code examples verified against the codebase
  • Proper imports and configuration options validated

Statistics

  • Original examples: 5
  • New examples: 5
  • Total examples: 10 (doubled!)
  • Lines added: ~150

Verification

✅ All imports verified against codebase
✅ All config options verified in DPOConfig
✅ DataCollatorForPreference import path corrected
✅ Consistent code style with existing examples
✅ Examples apply to most/all trainers as stated
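
To make the idea of a custom padding strategy concrete, the following is a plain-Python sketch of what a padding collator does. It is not the `DataCollatorForPreference` implementation: the function name `pad_batch`, the `padding_side` argument, and the use of plain lists instead of tensors are all assumptions made for illustration.

```python
def pad_batch(examples, pad_token_id=0, padding_side="right"):
    """Pad variable-length `input_ids` to the longest sequence in the batch.

    Hypothetical helper: real collators usually return tensors, but lists
    keep this sketch dependency-free.
    """
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")

    max_len = max(len(example["input_ids"]) for example in examples)
    batch = {"input_ids": [], "attention_mask": []}

    for example in examples:
        ids = list(example["input_ids"])
        n_pad = max_len - len(ids)
        pad_ids = [pad_token_id] * n_pad
        pad_mask = [0] * n_pad  # 0 marks padding positions
        real_mask = [1] * len(ids)  # 1 marks real tokens

        if padding_side == "left":
            batch["input_ids"].append(pad_ids + ids)
            batch["attention_mask"].append(pad_mask + real_mask)
        else:
            batch["input_ids"].append(ids + pad_ids)
            batch["attention_mask"].append(real_mask + pad_mask)

    return batch


right_padded = pad_batch([{"input_ids": [1, 2, 3]}, {"input_ids": [4]}])
left_padded = pad_batch(
    [{"input_ids": [1, 2, 3]}, {"input_ids": [4]}], padding_side="left"
)
```

A preference collator would apply the same padding separately to the prompt, chosen, and rejected sequences.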

Test Plan

  • Verified all imports exist in the codebase
  • Validated config parameters against DPOConfig
  • Ensured consistent formatting with existing examples
  • Checked that examples follow DPOTrainer pattern as stated in intro

Resolves huggingface#4379

- Add custom callbacks example for logging and monitoring
- Add custom evaluation metrics example
- Add mixed precision training example (bf16/fp16)
- Add gradient accumulation example
- Add custom data collator example
- Update introduction for better clarity
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

behroozazarkhalili added a commit that referenced this pull request Nov 3, 2025
- Clarify that bf16 is the default in mixed precision section
- Move gradient accumulation section to reducing memory guide
- Expand gradient accumulation examples to include DPO, SFT, and Reward trainers
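
The mixed precision clarification in this commit can be sketched as a small helper that decides which precision flags to pass to a config. The helper name `precision_overrides` and its arguments are hypothetical; `bf16` and `fp16` are the config fields it sets, and the sketch assumes that only one of them should be enabled at a time.

```python
def precision_overrides(supports_bf16: bool, use_mixed_precision: bool = True) -> dict:
    """Return precision keyword arguments for a training config.

    Hypothetical helper: keep bf16 where the hardware supports it, fall
    back to fp16 on older GPUs, or disable mixed precision entirely.
    """
    if not use_mixed_precision:
        return {"bf16": False, "fp16": False}
    if supports_bf16:
        return {"bf16": True, "fp16": False}
    return {"bf16": False, "fp16": True}


modern_gpu = precision_overrides(supports_bf16=True)
older_gpu = precision_overrides(supports_bf16=False)
full_precision = precision_overrides(supports_bf16=True, use_mixed_precision=False)
```

The returned dict could be unpacked into a config, for example `DPOConfig(output_dir="out", **precision_overrides(supports_bf16=False))`. On CUDA, `torch.cuda.is_bf16_supported()` is one way to obtain the flag.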

Addresses review comments from @qgallouedec on PR #4427
@behroozazarkhalili (Collaborator, Author)

I've addressed both review comments:

  1. Mixed precision section: Added clarification that bf16=True is the default in TRL. Updated the example to show when/how to override defaults for older GPUs or to disable mixed precision.

  2. Gradient accumulation section: Moved from customization guide to the reducing memory usage guide (reducing_memory_usage.md), as it's primarily a memory optimization technique. Expanded the examples to include DPO, SFT, and Reward trainers.
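
The arithmetic behind gradient accumulation can be sketched as follows. The function names are hypothetical; the sketch assumes the usual relationship in which the effective batch size is the per-device batch size times the number of accumulation steps times the number of devices.

```python
import math


def effective_batch_size(per_device_batch_size, accumulation_steps, num_devices=1):
    """Samples contributing to each optimizer update."""
    return per_device_batch_size * accumulation_steps * num_devices


def accumulation_steps_for(target_batch_size, per_device_batch_size, num_devices=1):
    """Smallest number of accumulation steps that reaches the target batch size."""
    samples_per_forward_pass = per_device_batch_size * num_devices
    return math.ceil(target_batch_size / samples_per_forward_pass)


# A single GPU that only fits 4 samples at a time, targeting a batch of 32.
steps_single_gpu = accumulation_steps_for(target_batch_size=32, per_device_batch_size=4)
batch_single_gpu = effective_batch_size(4, steps_single_gpu)

# The same target spread over 2 devices needs half as many steps.
steps_two_gpus = accumulation_steps_for(32, 4, num_devices=2)
```

The two inputs correspond to the `per_device_train_batch_size` and `gradient_accumulation_steps` config options, so the single-GPU case above would set them to 4 and 8.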

Ready for re-review!

@behroozazarkhalili behroozazarkhalili enabled auto-merge (squash) November 4, 2025 15:51
@sergiopaniego (Member) left a comment

Thanks!!
I'd aim to reduce the code snippets so that they include only the relevant part. Otherwise it's hard to see what's been added. A good example of this is the subsection Use the accelerator cache optimizer, which points directly to the added param.
Btw the optimize_device_cache is no longer part of the codebase so that subsection can actually be removed :)

sergiopaniego and others added 4 commits November 28, 2025 16:04
Address review feedback:
- Remove obsolete optimize_device_cache section (no longer in codebase)
- Reduce code snippets to show only relevant customization parts
- Keep first example complete as reference, subsequent examples focused
- Remove ~120 lines of repetitive boilerplate

Improves clarity by highlighting the actual customization being demonstrated.
@sergiopaniego (Member) left a comment

thanks!

@behroozazarkhalili behroozazarkhalili merged commit f1dfef0 into huggingface:main Dec 1, 2025
1 check passed
@qgallouedec qgallouedec mentioned this pull request Dec 1, 2025
qgallouedec added a commit that referenced this pull request Dec 1, 2025
commit 07b4a84
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Mon Dec 1 12:55:24 2025 -0700

    Silence experimental warnings when imported in the stable (#4606)

commit c55ef4b
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Mon Dec 1 12:40:42 2025 -0700

    Update How-to guides (#4604)

commit c686d7d
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Mon Dec 1 20:34:31 2025 +0100

    Raise FutureWarning for classes moved to experimental (#4605)

commit c7d172b
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Mon Dec 1 01:47:22 2025 -0800

    docs: Expand speeding up training guide with acceleration methods (#4428)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit f1dfef0
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Mon Dec 1 01:39:08 2025 -0800

    docs: Expand training customization examples (#4427)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit eb76389
Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>
Date:   Sun Nov 30 16:45:21 2025 +0100

    [GRPO] Sequence-level TIS & MIS (#4530)

commit 0726977
Author: xuanduy04 <65279552+xuanduy04@users.noreply.github.com>
Date:   Fri Nov 28 23:56:22 2025 +0700

    docs: Add Beyond the 80/20 Rule (2506.01939) to Paper Index (#4580)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 9731d08
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Nov 28 17:43:38 2025 +0100

    Revert "Hotfix CI with dev dependencies: xfail test_prepare_inputs_for_generation" (#4587)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 84a0bbc
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Nov 28 16:13:56 2025 +0100

    Fix 'generation_config' AttributeError (#4596)

commit f67c3f2
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Nov 28 15:46:02 2025 +0100

    Remove module-level imports of extra deps in experimental.judges (#4598)

commit cb5fdf9
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Thu Nov 27 11:08:26 2025 +0100

    Add missing require_bitsandbytes marker to CI tests (#4586)

commit 4a3b584
Author: juejuezi <juejuezi.git@foxmail.com>
Date:   Thu Nov 27 00:11:56 2025 +0800

    fix: use shift_labels for metrics when using CP or SP (#4579)

    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

commit d2e4315
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Nov 26 15:40:15 2025 +0100

    Revert hotfix Fall back to config.text_config._name_or_path (#4581)

commit 357e331
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Nov 26 04:55:46 2025 -0700

    Move tests for GSPOTokenTrainer to experimental (#4572)

commit a59f2cf
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Nov 26 04:50:44 2025 -0700

    Move `WinRateCallback` to experimental (#4558)

    Co-authored-by: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

commit cf431db
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Nov 26 04:11:04 2025 -0700

    Fix PPO example (#4556)

commit cac9f1d
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date:   Tue Nov 25 21:27:58 2025 +0000

    Fix Replay Buffer docs. (#4574)

commit 547d924
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Tue Nov 25 09:34:22 2025 -0700

    Add `shuffle_dataset` option to `SFTTrainer` (#4564)

commit b01f8ca
Author: iliasmerigh <91261122+iliasmerigh@users.noreply.github.com>
Date:   Tue Nov 25 17:33:14 2025 +0100

    Fix typo in GRPO description in README (#4573)

commit 7856d3b
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Tue Nov 25 09:32:39 2025 -0700

    Fix vLLM sleep mode: add collective RPC call to reload weights in vLLM wake-up process (#4571)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
    Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

commit 64d089e
Author: lewtun <lewis.c.tunstall@gmail.com>
Date:   Tue Nov 25 14:39:40 2025 +0100

    Reasoning reward (#4563)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 3b7d0e4
Author: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Date:   Tue Nov 25 04:48:06 2025 +0000

    Remove Online DPO from stable trainers section in documentation

commit 6f3a452
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Mon Nov 24 08:11:49 2025 -0700

    Reorder documentation TOC to surface key trainer sections (#4565)

commit 46af266
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Mon Nov 24 02:39:25 2025 -0800

    docs: Rewrite PEFT integration guide with comprehensive examples (#4421)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit db4f6e5
Author: mingxuetian <108911581+mingxuetian@users.noreply.github.com>
Date:   Mon Nov 24 09:51:42 2025 +0800

    Add `num_generations_eval` parameter for efficient evaluation (#4458)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 07f3c95
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Sun Nov 23 17:33:36 2025 -0800

    Move OnlineDPOTrainer to experimental module (#4473)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 4cb1a25
Author: Kashif Rasul <kashif.rasul@gmail.com>
Date:   Sat Nov 22 23:31:29 2025 +0100

    [SFT] Log mean token accuracy from Liger kernel (#4302)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 468b9d4
Author: Susant <acharysusant@gmail.com>
Date:   Sun Nov 23 03:40:32 2025 +0530

    docs: add KTO (2402.01306) to Paper Index + link ref to KTOTrainer (#4440)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 9bc6206
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Fri Nov 21 17:34:50 2025 -0800

    Move PRMTrainer to trl.experimental.prm (#4483)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit f7ac974
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Fri Nov 21 16:01:04 2025 +0100

    Update OpenEnv guide with new notebook (#4555)

commit c0de042
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Fri Nov 21 15:40:25 2025 +0100

    Add GRPO Wordle OpenEnv Colab (#4542)

commit 9f8ef40
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Thu Nov 20 22:36:31 2025 -0800

    [ORPO] Move ORPOTrainer to experimental (#4480)

commit 3bb5d76
Author: Jen Wei <45276133+JenWei0312@users.noreply.github.com>
Date:   Thu Nov 20 18:53:10 2025 -0700

    fix+docs: `device_map=None` for DeepSpeed and add ZeRO paper (1910.02054) to Paper Index (#4551)

commit 375b3eb
Author: Jonny Li <jonny_li@live.ca>
Date:   Thu Nov 20 19:42:45 2025 -0500

    Add target_parameters to LoraConfig (#4536)

commit 237900d
Author: Kristian Schwethelm <47533587+kschwethelm@users.noreply.github.com>
Date:   Thu Nov 20 23:03:20 2025 +0100

    Fix bug with VLM processors in prompt-completion completion text-only training (#4553)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 52ed4df
Author: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Date:   Thu Nov 20 21:41:23 2025 +0000

    Fix style OpenEnv example

commit a263946
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Thu Nov 20 14:44:15 2025 +0100

    Update OpenEnv guide with latest details (#4552)

    Co-authored-by: burtenshaw <ben.burtenshaw@gmail.com>

commit 1a9ff52
Author: Kashif Rasul <kashif.rasul@gmail.com>
Date:   Wed Nov 19 15:34:25 2025 +0100

    [OpenEnv] browsergym example script (#4539)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit 6cbcd94
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Wed Nov 19 14:39:44 2025 +0100

    Update OpenEnv example scripts (#4547)

commit 8510589
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Wed Nov 19 14:39:20 2025 +0100

    Add OpenEnv Script examples to docs (#4533)

commit e622196
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Mon Nov 17 03:12:30 2025 -0700

    [Doc] Drop dummy reward and dataset for DeepMath-103K and accuracy reward (#4524)

commit 1b1242c
Author: Kashif Rasul <kashif.rasul@gmail.com>
Date:   Fri Nov 14 20:51:41 2025 +0100

    [OpenEnv] add vllm colocate mode to openenv scripts (#4510)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit f39d18a
Author: Fabio Milentiansen Sim <sim.fabio.fms@gmail.com>
Date:   Fri Nov 14 23:39:02 2025 +0700

    fix(GOLDTrainer): Resolve incorrect attribute access and VLLMClient.generate() output type (#4526)

commit d45eaab
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Fri Nov 14 12:12:09 2025 +0100

    Add vLLM quantization option for colocate (#4496)

    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

commit a91d4b3
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Fri Nov 14 02:19:08 2025 +0100

    Prevent upcasting norm layers in `prepare_model_for_kbit_training` (#4457)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 121318e
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Thu Nov 13 17:13:16 2025 -0800

    docs: Extend CLI basic usage examples to all supported CLIs (#4425)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 7918320
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Thu Nov 13 13:20:52 2025 -0700

    Remove test trainer args (#4517)

commit 102dc41
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Thu Nov 13 12:36:43 2025 -0700

    Rename `flash-attn` to `flash-attn2` (#4514)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit 5de62b0
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Thu Nov 13 12:05:48 2025 -0700

    Add step time metric to GRPO Trainer for performance tracking (#4516)

    Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

commit f1e6377
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Thu Nov 13 11:01:19 2025 -0800

    Move PPOTrainer to trl.experimental.ppo (#4482)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 01f497e
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Thu Nov 13 10:14:58 2025 -0800

    Move NashMDTrainer to experimental module (#4477)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit b6c838a
Author: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Date:   Thu Nov 13 16:53:26 2025 +0000

    `aws-general-8-plus` runner for Docker build

commit ed5c7bb
Author: YangKai0616 <kai.yang@intel.com>
Date:   Fri Nov 14 00:42:48 2025 +0800

    [Bug Fix] OnlineDPOTrainer with vLLM Server Mode (#4500)

commit ded9bc6
Author: lewtun <lewis.c.tunstall@gmail.com>
Date:   Thu Nov 13 17:33:59 2025 +0100

    Fix Docker images for Liger (#4522)

commit fd04760
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date:   Thu Nov 13 11:31:10 2025 +0000

    Paper Index: Change `num_completions` to `num_generations` (#4515)

commit b7918c0
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Wed Nov 12 20:35:44 2025 -0800

    Move GKDTrainer to experimental module (#4474)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 07b5011
Author: Tamoghno Kandar <55907205+tamoghnokandar@users.noreply.github.com>
Date:   Wed Nov 12 20:07:33 2025 -0800

    Replace flash attention2 with kernels-community/flash-attn2 (#4426)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 7a57fd4
Author: Yuxian Gu <guyx21@mails.tsinghua.edu.cn>
Date:   Thu Nov 13 11:16:20 2025 +0800

    MiniLLM: Fix arguments in config & add to documentation index (#4518)

commit a145eaf
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Wed Nov 12 16:35:46 2025 -0800

    refactor: Move CPOTrainer to experimental module (#4470)

commit d2dc717
Author: Taha Yassine <40228615+taha-yassine@users.noreply.github.com>
Date:   Thu Nov 13 00:56:47 2025 +0100

    Replace `wandb_log_unique_prompts` with `log_unique_prompts` (#4508)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 799b39b
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Nov 12 16:21:05 2025 -0700

    `device_map` and `dtype` to `"auto"` by default (#4509)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit a6a2beb
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Nov 12 09:42:31 2025 -0700

    Add temporary workaround for `lr_scheduler_kwargs` dtype issue in Transformers 4.57.0 (#4513)

commit 346701a
Author: lewtun <lewis.c.tunstall@gmail.com>
Date:   Wed Nov 12 17:42:18 2025 +0100

    Replace accelerate logging with stdlib in CLI (#4512)

commit 4db63af
Author: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Date:   Wed Nov 12 02:19:51 2025 +0000

    Fix GRPO unsqueeze advantages

commit ecb2811
Author: Yuxian Gu <guyx21@mails.tsinghua.edu.cn>
Date:   Wed Nov 12 10:17:22 2025 +0800

    Add MiniLLM Trainer (#4504)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 89e4688
Author: Taha Yassine <40228615+taha-yassine@users.noreply.github.com>
Date:   Tue Nov 11 20:36:23 2025 +0100

    Add support for images inside tables with Trackio completions logging (#4505)

commit 2d3279c
Author: lewtun <lewis.c.tunstall@gmail.com>
Date:   Tue Nov 11 19:22:25 2025 +0100

    Tweak description for vLLM sleep mode (#4506)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 02a3477
Author: Luke Hinds <lukehinds@gmail.com>
Date:   Mon Nov 10 16:41:51 2025 +0000

    Fix link to OpenEnv docs (#4502)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit aaed6c1
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Sat Nov 8 08:20:48 2025 -0700

    Consistency regarding relative imports (#4498)

commit 20760ba
Author: burtenshaw <ben.burtenshaw@gmail.com>
Date:   Fri Nov 7 10:50:50 2025 +0100

    [DOCS] update and fix openenv (#4490)

    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit 64cfca4
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Thu Nov 6 22:47:04 2025 -0800

    Move judges to experimental submodule (#4439)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 97ca1a2
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date:   Fri Nov 7 00:20:15 2025 +0000

    Fix bugs in CISPO conditions (#4499)

commit ffb3dd5
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Thu Nov 6 16:03:00 2025 -0800

    docs: Add PEFT subsection to reducing memory usage guide (#4430)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit 43b6541
Author: SolarWindRider <31797478+SolarWindRider@users.noreply.github.com>
Date:   Fri Nov 7 06:55:34 2025 +0800

    Support completion bootstrap for VLM in GRPO/RLOO (#4452)

    Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 642b721
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date:   Thu Nov 6 22:33:00 2025 +0000

    ScaleRL: Add CISPO Loss (#4495)

commit 32e9c9f
Author: Ishita Bhattacharyya <139248026+ishitab02@users.noreply.github.com>
Date:   Fri Nov 7 03:37:43 2025 +0530

    ⛴️ Add kernels to Docker images (#4445)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 1bcfc50
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Thu Nov 6 13:40:12 2025 -0800

    Move XPOTrainer to trl.experimental.xpo (#4485)

    Co-authored-by: Invidia19 <54266187+Invidia19@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 37942bc
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date:   Thu Nov 6 21:32:03 2025 +0000

    Buffer samples based on group level stds. (#4492)

commit 66cd02a
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Thu Nov 6 20:58:25 2025 +0100

    Add tiny model Qwen3VLForConditionalGeneration to CI (#4494)

commit 32febb4
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Thu Nov 6 18:21:56 2025 +0100

    Add LFM2 to SFT notebook examples (#4455)
qgallouedec pushed a commit to neha222222/trl that referenced this pull request Dec 5, 2025
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
qgallouedec added a commit to neha222222/trl that referenced this pull request Dec 5, 2025
commit cbd90d4
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Thu Dec 4 20:05:43 2025 +0100

    Remove deprecated batched formatting in GOLDTrainer (huggingface#4622)

commit 903b57d
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Thu Dec 4 19:16:00 2025 +0100

    Update ministral notebooks with official bf16 ckpt (huggingface#4626)

commit 9266135
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Thu Dec 4 19:01:46 2025 +0100

    Fix link to OpenEnv blog in docs (huggingface#4625)

commit 495381d
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Thu Dec 4 11:32:34 2025 +0100

    Fix README style (huggingface#4619)

commit ddb65e8
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Dec 3 21:20:40 2025 +0100

    Add experimental imports to docs (huggingface#4616)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 5fab472
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Dec 3 17:38:16 2025 +0100

    Replace arXiv paper links with HF links (huggingface#4613)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit a3c1dfb
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Wed Dec 3 17:28:45 2025 +0100

    Add ministral 3 free notebooks (huggingface#4614)

commit 560fd3d
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date:   Wed Dec 3 10:12:20 2025 +0000

    [GRPOTrainer]: Add SAPO Loss (huggingface#4600)

commit 814d4af
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Tue Dec 2 15:52:51 2025 -0700

    Move MergeModelCallback to experimental (huggingface#4608)

    Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>

commit 2a81076
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Tue Dec 2 15:07:11 2025 +0100

    Fixed OpenEnv example scripts (huggingface#4610)

commit de343cd
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Tue Dec 2 07:32:22 2025 +0100

    Remove deprecations for 0.26 release (huggingface#4607)

commit 07b4a84
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Mon Dec 1 12:55:24 2025 -0700

    Silence experimental warnings when imported in the stable (huggingface#4606)

commit c55ef4b
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Mon Dec 1 12:40:42 2025 -0700

    Update How-to guides (huggingface#4604)

commit c686d7d
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Mon Dec 1 20:34:31 2025 +0100

    Raise FutureWarning for classes moved to experimental (huggingface#4605)

commit c7d172b
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Mon Dec 1 01:47:22 2025 -0800

    docs: Expand speeding up training guide with acceleration methods (huggingface#4428)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit f1dfef0
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Mon Dec 1 01:39:08 2025 -0800

    docs: Expand training customization examples (huggingface#4427)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit eb76389
Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>
Date:   Sun Nov 30 16:45:21 2025 +0100

    [GRPO] Sequence-level TIS & MIS (huggingface#4530)

commit 0726977
Author: xuanduy04 <65279552+xuanduy04@users.noreply.github.com>
Date:   Fri Nov 28 23:56:22 2025 +0700

    docs: Add Beyond the 80/20 Rule (2506.01939) to Paper Index (huggingface#4580)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 9731d08
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Nov 28 17:43:38 2025 +0100

    Revert "Hotfix CI with dev dependencies: xfail test_prepare_inputs_for_generation" (huggingface#4587)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 84a0bbc
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Nov 28 16:13:56 2025 +0100

    Fix 'generation_config' AttributeError (huggingface#4596)

commit f67c3f2
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Nov 28 15:46:02 2025 +0100

    Remove module-level imports of extra deps in experimental.judges (huggingface#4598)

commit cb5fdf9
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Thu Nov 27 11:08:26 2025 +0100

    Add missing require_bitsandbytes marker to CI tests (huggingface#4586)

commit 4a3b584
Author: juejuezi <juejuezi.git@foxmail.com>
Date:   Thu Nov 27 00:11:56 2025 +0800

    fix: use shift_labels for metrics when using CP or SP (huggingface#4579)

    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

commit d2e4315
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Nov 26 15:40:15 2025 +0100

    Revert hotfix Fall back to config.text_config._name_or_path (huggingface#4581)

commit 357e331
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Nov 26 04:55:46 2025 -0700

    Move tests for GSPOTokenTrainer to experimental (huggingface#4572)

commit a59f2cf
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Nov 26 04:50:44 2025 -0700

    Move `WinRateCallback` to experimental (huggingface#4558)

    Co-authored-by: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

commit cf431db
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Nov 26 04:11:04 2025 -0700

    Fix PPO example (huggingface#4556)

commit cac9f1d
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date:   Tue Nov 25 21:27:58 2025 +0000

    Fix Replay Buffer docs. (huggingface#4574)
qgallouedec added a commit to neha222222/trl that referenced this pull request Dec 5, 2025
commit f278d03
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Dec 5 19:34:42 2025 +0100

    Remove no longer applicable warning once BCO was moved to experimental (huggingface#4628)

commit e7071bf
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Fri Dec 5 10:07:16 2025 -0700

    Add logos as assets (huggingface#4627)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit 794d87f
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Dec 5 08:45:20 2025 +0100

    Add missing experimental autodoc classes to docs (huggingface#4618)

commit bc7888d
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Dec 5 07:48:33 2025 +0100

    Raise FutureWarning for trainer moved to experimental (huggingface#4620)

commit fce5dfd
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Dec 5 07:34:04 2025 +0100

    Raise warnings at 2nd stack level (huggingface#4621)

commit c5da8ec
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Dec 5 07:33:04 2025 +0100

    Silence experimental warning during docs build (huggingface#4623)

commit 2af35fb
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Thu Dec 4 21:23:41 2025 -0700

    Clean up model preparation (huggingface#4577)

commit cbd90d4
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Thu Dec 4 20:05:43 2025 +0100

    Remove deprecated batched formatting in GOLDTrainer (huggingface#4622)

commit 903b57d
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Thu Dec 4 19:16:00 2025 +0100

    Update ministral notebooks with official bf16 ckpt (huggingface#4626)

commit 9266135
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Thu Dec 4 19:01:46 2025 +0100

    Fix link to OpenEnv blog in docs (huggingface#4625)

commit 495381d
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Thu Dec 4 11:32:34 2025 +0100

    Fix README style (huggingface#4619)

commit ddb65e8
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Dec 3 21:20:40 2025 +0100

    Add experimental imports to docs (huggingface#4616)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 5fab472
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Dec 3 17:38:16 2025 +0100

    Replace arXiv paper links with HF links (huggingface#4613)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit a3c1dfb
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Wed Dec 3 17:28:45 2025 +0100

    Add ministral 3 free notebooks (huggingface#4614)

commit 560fd3d
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date:   Wed Dec 3 10:12:20 2025 +0000

    [GRPOTrainer]: Add SAPO Loss (huggingface#4600)

commit 814d4af
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Tue Dec 2 15:52:51 2025 -0700

    Move MergeModelCallback to experimental (huggingface#4608)

    Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>

commit 2a81076
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Tue Dec 2 15:07:11 2025 +0100

    Fixed OpenEnv example scripts (huggingface#4610)

commit de343cd
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Tue Dec 2 07:32:22 2025 +0100

    Remove deprecations for 0.26 release (huggingface#4607)

commit 07b4a84
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Mon Dec 1 12:55:24 2025 -0700

    Silence experimental warnings when imported in the stable (huggingface#4606)

commit c55ef4b
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Mon Dec 1 12:40:42 2025 -0700

    Update How-to guides (huggingface#4604)

commit c686d7d
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Mon Dec 1 20:34:31 2025 +0100

    Raise FutureWarning for classes moved to experimental (huggingface#4605)

commit c7d172b
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Mon Dec 1 01:47:22 2025 -0800

    docs: Expand speeding up training guide with acceleration methods (huggingface#4428)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit f1dfef0
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Mon Dec 1 01:39:08 2025 -0800

    docs: Expand training customization examples (huggingface#4427)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit eb76389
Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>
Date:   Sun Nov 30 16:45:21 2025 +0100

    [GRPO] Sequence-level TIS & MIS (huggingface#4530)

commit 0726977
Author: xuanduy04 <65279552+xuanduy04@users.noreply.github.com>
Date:   Fri Nov 28 23:56:22 2025 +0700

    docs: Add Beyond the 80/20 Rule (2506.01939) to Paper Index (huggingface#4580)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 9731d08
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Nov 28 17:43:38 2025 +0100

    Revert "Hotfix CI with dev dependencies: xfail test_prepare_inputs_for_generation" (huggingface#4587)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 84a0bbc
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Nov 28 16:13:56 2025 +0100

    Fix 'generation_config' AttributeError (huggingface#4596)

commit f67c3f2
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Nov 28 15:46:02 2025 +0100

    Remove module-level imports of extra deps in experimental.judges (huggingface#4598)

commit cb5fdf9
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Thu Nov 27 11:08:26 2025 +0100

    Add missing require_bitsandbytes marker to CI tests (huggingface#4586)

commit 4a3b584
Author: juejuezi <juejuezi.git@foxmail.com>
Date:   Thu Nov 27 00:11:56 2025 +0800

    fix: use shift_labels for metrics when using CP or SP (huggingface#4579)

    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

commit d2e4315
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Nov 26 15:40:15 2025 +0100

    Revert hotfix Fall back to config.text_config._name_or_path (huggingface#4581)

commit 357e331
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Nov 26 04:55:46 2025 -0700

    Move tests for GSPOTokenTrainer to experimental (huggingface#4572)

commit a59f2cf
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Nov 26 04:50:44 2025 -0700

    Move `WinRateCallback` to experimental (huggingface#4558)

    Co-authored-by: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

commit cf431db
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Nov 26 04:11:04 2025 -0700

    Fix PPO example (huggingface#4556)

commit cac9f1d
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date:   Tue Nov 25 21:27:58 2025 +0000

    Fix Replay Buffer docs. (huggingface#4574)

Development

Successfully merging this pull request may close these issues.

Remove or populate "Training customization"

4 participants