docs: Expand training customization examples#4427
Conversation
Resolves huggingface#4379 - Add custom callbacks example for logging and monitoring - Add custom evaluation metrics example - Add mixed precision training example (bf16/fp16) - Add gradient accumulation example - Add custom data collator example - Update introduction for better clarity
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
- Clarify that bf16 is the default in mixed precision section - Move gradient accumulation section to reducing memory guide - Expand gradient accumulation examples to include DPO, SFT, and Reward trainers Addresses review comments from @qgallouedec on PR #4427
|
I've addressed both review comments:
Ready for re-review! |
sergiopaniego
left a comment
There was a problem hiding this comment.
Thanks!!
I'd aim on reducing the code snippets content, only including the relevant part. Otherwise it's complicated to understand what's been added. A good example of this is the subsection Use the accelerator cache optimizer, where it directly points to the added param.
Btw the optimize_device_cache is no longer part of the codebase so that subsection can actually be removed :)
Address review feedback: - Remove obsolete optimize_device_cache section (no longer in codebase) - Reduce code snippets to show only relevant customization parts - Keep first example complete as reference, subsequent examples focused - Remove ~120 lines of repetitive boilerplate Improves clarity by highlighting the actual customization being demonstrated.
commit 07b4a84 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Mon Dec 1 12:55:24 2025 -0700 Silence experimental warnings when imported in the stable (#4606) commit c55ef4b Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Mon Dec 1 12:40:42 2025 -0700 Update How-to guides (#4604) commit c686d7d Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Dec 1 20:34:31 2025 +0100 Raise FutureWarning for classes moved to experimental (#4605) commit c7d172b Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Mon Dec 1 01:47:22 2025 -0800 docs: Expand speeding up training guide with acceleration methods (#4428) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> commit f1dfef0 Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Mon Dec 1 01:39:08 2025 -0800 docs: Expand training customization examples (#4427) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> commit eb76389 Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com> Date: Sun Nov 30 16:45:21 2025 +0100 [GRPO] Sequence-level TIS & MIS (#4530) commit 0726977 Author: xuanduy04 <65279552+xuanduy04@users.noreply.github.com> Date: Fri Nov 28 23:56:22 2025 +0700 docs: Add Beyond the 80/20 Rule (2506.01939) to Paper Index (#4580) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit 9731d08 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Nov 28 17:43:38 2025 +0100 Revert "Hotfix CI with dev dependencies: xfail test_prepare_inputs_for_generation" (#4587) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit 84a0bbc Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Nov 28 16:13:56 2025 +0100 Fix 'generation_config' AttributeError (#4596) commit f67c3f2 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Nov 28 15:46:02 2025 +0100 Remove module-level imports of extra deps in experimental.judges (#4598) commit cb5fdf9 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Thu Nov 27 11:08:26 2025 +0100 Add missing require_bitsandbytes marker to CI tests (#4586) commit 4a3b584 Author: juejuezi <juejuezi.git@foxmail.com> Date: Thu Nov 27 00:11:56 2025 +0800 fix: use shift_labels for metrics when using CP or SP (#4579) Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> commit d2e4315 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Wed Nov 26 15:40:15 2025 +0100 Revert hotfix Fall back to config.text_config._name_or_path (#4581) commit 357e331 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Nov 26 04:55:46 2025 -0700 Move tests for GSPOTokenTrainer to experimental (#4572) commit a59f2cf Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Nov 26 04:50:44 2025 -0700 Move `WinRateCallback` to experimental (#4558) Co-authored-by: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> commit cf431db Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Nov 26 04:11:04 2025 -0700 Fix PPO example (#4556) commit cac9f1d Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com> Date: Tue Nov 25 21:27:58 2025 +0000 Fix Replay Buffer docs. (#4574) commit 547d924 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Tue Nov 25 09:34:22 2025 -0700 Add `shuffle_dataset` option to `SFTTrainer` (#4564) commit b01f8ca Author: iliasmerigh <91261122+iliasmerigh@users.noreply.github.com> Date: Tue Nov 25 17:33:14 2025 +0100 Fix typo in GRPO description in README (#4573) commit 7856d3b Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Tue Nov 25 09:32:39 2025 -0700 Fix vLLM sleep mode: add collective RPC call to reload weights in vLLM wake-up process (#4571) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> commit 64d089e Author: lewtun <lewis.c.tunstall@gmail.com> Date: Tue Nov 25 14:39:40 2025 +0100 Reasoning reward (#4563) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 3b7d0e4 Author: Quentin Gallouédec <gallouedec.quentin@gmail.com> Date: Tue Nov 25 04:48:06 2025 +0000 Remove Online DPO from stable trainers section in documentation commit 6f3a452 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Mon Nov 24 08:11:49 2025 -0700 Reorder documentation TOC to surface key trainer sections (#4565) commit 46af266 Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Mon Nov 24 02:39:25 2025 -0800 docs: Rewrite PEFT integration guide with comprehensive examples (#4421) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> commit db4f6e5 Author: mingxuetian <108911581+mingxuetian@users.noreply.github.com> Date: Mon Nov 24 09:51:42 2025 +0800 Add `num_generations_eval` parameter for efficient evaluation (#4458) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 07f3c95 Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Sun Nov 23 17:33:36 2025 -0800 Move OnlineDPOTrainer to experimental module (#4473) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 4cb1a25 Author: Kashif Rasul <kashif.rasul@gmail.com> Date: Sat Nov 22 23:31:29 2025 +0100 [SFT] Log mean token accuracy from Liger kernel (#4302) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 468b9d4 Author: Susant <acharysusant@gmail.com> Date: Sun Nov 23 03:40:32 2025 +0530 docs: add KTO (2402.01306) to Paper Index + link ref to KTOTrainer (#4440) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit 9bc6206 Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Fri Nov 21 17:34:50 2025 -0800 Move PRMTrainer to trl.experimental.prm (#4483) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit f7ac974 Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Fri Nov 21 16:01:04 2025 +0100 Update OpenEnv guide with new notebook (#4555) commit c0de042 Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Fri Nov 21 15:40:25 2025 +0100 Add GRPO Wordle OpenEnv Colab (#4542) commit 9f8ef40 Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Thu Nov 20 22:36:31 2025 -0800 [ORPO] Move ORPOTrainer to experimental (#4480) commit 3bb5d76 Author: Jen Wei <45276133+JenWei0312@users.noreply.github.com> Date: Thu Nov 20 18:53:10 2025 -0700 fix+docs: `device_map=None` for DeepSpeed and add ZeRO paper (1910.02054) to Paper Index (#4551) commit 375b3eb Author: Jonny Li <jonny_li@live.ca> Date: Thu Nov 20 19:42:45 2025 -0500 Add target_parameters to LoraConfig (#4536) commit 237900d Author: Kristian Schwethelm <47533587+kschwethelm@users.noreply.github.com> Date: Thu Nov 20 23:03:20 2025 +0100 Fix bug with VLM processors in prompt-completion completion text-only training (#4553) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 52ed4df Author: Quentin Gallouédec <gallouedec.quentin@gmail.com> Date: Thu Nov 20 21:41:23 2025 +0000 Fix style OpenEnv example commit a263946 Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Thu Nov 20 14:44:15 2025 +0100 Update OpenEnv guide with latest details (#4552) Co-authored-by: burtenshaw <ben.burtenshaw@gmail.com> commit 1a9ff52 Author: Kashif Rasul <kashif.rasul@gmail.com> Date: Wed Nov 19 15:34:25 2025 +0100 [OpenEnv] browsergym example script (#4539) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> commit 6cbcd94 Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Wed Nov 19 14:39:44 2025 +0100 Update OpenEnv example scripts (#4547) commit 8510589 Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Wed Nov 19 14:39:20 2025 +0100 Add OpenEnv Script examples to docs (#4533) commit e622196 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Mon Nov 17 03:12:30 2025 -0700 [Doc] Drop dummy reward and dataset for DeepMath-103K and accuracy reward (#4524) commit 1b1242c Author: Kashif Rasul <kashif.rasul@gmail.com> Date: Fri Nov 14 20:51:41 2025 +0100 [OpenEnv] add vllm colocate mode to openenv scripts (#4510) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit f39d18a Author: Fabio Milentiansen Sim <sim.fabio.fms@gmail.com> Date: Fri Nov 14 23:39:02 2025 +0700 fix(GOLDTrainer): Resolve incorrect attribute access and VLLMClient.generate() output type (#4526) commit d45eaab Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Fri Nov 14 12:12:09 2025 +0100 Add vLLM quantization option for colocate (#4496) Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> commit a91d4b3 Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Fri Nov 14 02:19:08 2025 +0100 Prevent upcasting norm layers in `prepare_model_for_kbit_training` (#4457) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit 121318e Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Thu Nov 13 17:13:16 2025 -0800 docs: Extend CLI basic usage examples to all supported CLIs (#4425) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 7918320 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Thu Nov 13 13:20:52 2025 -0700 Remove test trainer args (#4517) commit 102dc41 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Thu Nov 13 12:36:43 2025 -0700 Rename `flash-attn` to `flash-attn2` (#4514) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> commit 5de62b0 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Thu Nov 13 12:05:48 2025 -0700 Add step time metric to GRPO Trainer for performance tracking (#4516) Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> commit f1e6377 Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Thu Nov 13 11:01:19 2025 -0800 Move PPOTrainer to trl.experimental.ppo (#4482) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 01f497e Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Thu Nov 13 10:14:58 2025 -0800 Move NashMDTrainer to experimental module (#4477) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit b6c838a Author: Quentin Gallouédec <gallouedec.quentin@gmail.com> Date: Thu Nov 13 16:53:26 2025 +0000 `aws-general-8-plus` runner for Docker build commit ed5c7bb Author: YangKai0616 <kai.yang@intel.com> Date: Fri Nov 14 00:42:48 2025 +0800 [Bug Fix] OnlineDPOTrainer with vLLM Server Mode (#4500) commit ded9bc6 Author: lewtun <lewis.c.tunstall@gmail.com> Date: Thu Nov 13 17:33:59 2025 +0100 Fix Docker images for Liger (#4522) commit fd04760 Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com> Date: Thu Nov 13 11:31:10 2025 +0000 Paper Index: Change `num_completions` to `num_generations` (#4515) commit b7918c0 Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Wed Nov 12 20:35:44 2025 -0800 Move GKDTrainer to experimental module (#4474) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 07b5011 Author: Tamoghno Kandar <55907205+tamoghnokandar@users.noreply.github.com> Date: Wed Nov 12 20:07:33 2025 -0800 Replace flash attention2 with kernels-community/flash-attn2 (#4426) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit 7a57fd4 Author: Yuxian Gu <guyx21@mails.tsinghua.edu.cn> Date: Thu Nov 13 11:16:20 2025 +0800 MiniLLM: Fix arguments in config & add to documentation index (#4518) commit a145eaf Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Wed Nov 12 16:35:46 2025 -0800 refactor: Move CPOTrainer to experimental module (#4470) commit d2dc717 Author: Taha Yassine <40228615+taha-yassine@users.noreply.github.com> Date: Thu Nov 13 00:56:47 2025 +0100 Replace `wandb_log_unique_prompts` with `log_unique_prompts` (#4508) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 799b39b Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Nov 12 16:21:05 2025 -0700 `device_map` and `dtype` to `"auto"` by default (#4509) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> commit a6a2beb Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Nov 12 09:42:31 2025 -0700 Add temporary workaround for `lr_scheduler_kwargs` dtype issue in Transformers 4.57.0 (#4513) commit 346701a Author: lewtun <lewis.c.tunstall@gmail.com> Date: Wed Nov 12 17:42:18 2025 +0100 Replace accelerate logging with stdlib in CLI (#4512) commit 4db63af Author: Quentin Gallouédec <gallouedec.quentin@gmail.com> Date: Wed Nov 12 02:19:51 2025 +0000 Fix GRPO unsqueeze advantages commit ecb2811 Author: Yuxian Gu <guyx21@mails.tsinghua.edu.cn> Date: Wed Nov 12 10:17:22 2025 +0800 Add MiniLLM Trainer (#4504) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 89e4688 Author: Taha Yassine <40228615+taha-yassine@users.noreply.github.com> Date: Tue Nov 11 20:36:23 2025 +0100 Add support for images inside tables with Trackio completions logging (#4505) commit 2d3279c Author: lewtun <lewis.c.tunstall@gmail.com> Date: Tue Nov 11 19:22:25 2025 +0100 Tweak description for vLLM sleep mode (#4506) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 02a3477 Author: Luke Hinds <lukehinds@gmail.com> Date: Mon Nov 10 16:41:51 2025 +0000 Fix link to OpenEnv docs (#4502) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit aaed6c1 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Sat Nov 8 08:20:48 2025 -0700 Consistency regarding relative imports (#4498) commit 20760ba Author: burtenshaw <ben.burtenshaw@gmail.com> Date: Fri Nov 7 10:50:50 2025 +0100 [DOCS] update and fix openenv (#4490) Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> commit 64cfca4 Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Thu Nov 6 22:47:04 2025 -0800 Move judges to experimental submodule (#4439) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 97ca1a2 Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com> Date: Fri Nov 7 00:20:15 2025 +0000 Fix bugs in CISPO conditions (#4499) commit ffb3dd5 Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Thu Nov 6 16:03:00 2025 -0800 docs: Add PEFT subsection to reducing memory usage guide (#4430) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> commit 43b6541 Author: SolarWindRider <31797478+SolarWindRider@users.noreply.github.com> Date: Fri Nov 7 06:55:34 2025 +0800 Support completion bootstrap for VLM in GRPO/RLOO (#4452) Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 642b721 Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com> Date: Thu Nov 6 22:33:00 2025 +0000 ScaleRL: Add CISPO Loss (#4495) commit 32e9c9f Author: Ishita Bhattacharyya <139248026+ishitab02@users.noreply.github.com> Date: Fri Nov 7 03:37:43 2025 +0530 ⛴️ Add kernels to Docker images (#4445) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 1bcfc50 Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Thu Nov 6 13:40:12 2025 -0800 Move XPOTrainer to trl.experimental.xpo (#4485) Co-authored-by: Invidia19 <54266187+Invidia19@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit 37942bc Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com> Date: Thu Nov 6 21:32:03 2025 +0000 Buffer samples based on group level stds. (#4492) commit 66cd02a Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Thu Nov 6 20:58:25 2025 +0100 Add tiny model Qwen3VLForConditionalGeneration to CI (#4494) commit 32febb4 Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Thu Nov 6 18:21:56 2025 +0100 Add LFM2 to SFT notebook examples (#4455)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit cbd90d4 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Thu Dec 4 20:05:43 2025 +0100 Remove deprecated batched formatting in GOLDTrainer (huggingface#4622) commit 903b57d Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Thu Dec 4 19:16:00 2025 +0100 Update ministral notebooks with official bf16 ckpt (huggingface#4626) commit 9266135 Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Thu Dec 4 19:01:46 2025 +0100 Fix link to OpenEnv blog in docs (huggingface#4625) commit 495381d Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Thu Dec 4 11:32:34 2025 +0100 Fix README style (huggingface#4619) commit ddb65e8 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Wed Dec 3 21:20:40 2025 +0100 Add experimental imports to docs (huggingface#4616) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit 5fab472 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Wed Dec 3 17:38:16 2025 +0100 Replace arXiv paper links with HF links (huggingface#4613) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit a3c1dfb Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Wed Dec 3 17:28:45 2025 +0100 Add ministral 3 free notebooks (huggingface#4614) commit 560fd3d Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com> Date: Wed Dec 3 10:12:20 2025 +0000 [GRPOTrainer]: Add SAPO Loss (huggingface#4600) commit 814d4af Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Tue Dec 2 15:52:51 2025 -0700 Move MergeModelCallback to experimental (huggingface#4608) Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> commit 2a81076 Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Tue Dec 2 15:07:11 2025 +0100 Fixed OpenEnv example scripts (huggingface#4610) commit de343cd Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Dec 2 07:32:22 2025 +0100 Remove deprecations for 0.26 release (huggingface#4607) commit 07b4a84 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Mon Dec 1 12:55:24 2025 -0700 Silence experimental warnings when imported in the stable (huggingface#4606) commit c55ef4b Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Mon Dec 1 12:40:42 2025 -0700 Update How-to guides (huggingface#4604) commit c686d7d Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Dec 1 20:34:31 2025 +0100 Raise FutureWarning for classes moved to experimental (huggingface#4605) commit c7d172b Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Mon Dec 1 01:47:22 2025 -0800 docs: Expand speeding up training guide with acceleration methods (huggingface#4428) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> commit f1dfef0 Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Mon Dec 1 01:39:08 2025 -0800 docs: Expand training customization examples (huggingface#4427) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> commit eb76389 Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com> Date: Sun Nov 30 16:45:21 2025 +0100 [GRPO] Sequence-level TIS & MIS (huggingface#4530) commit 0726977 Author: xuanduy04 <65279552+xuanduy04@users.noreply.github.com> Date: Fri Nov 28 23:56:22 2025 +0700 docs: Add Beyond the 80/20 Rule (2506.01939) to Paper Index (huggingface#4580) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit 9731d08 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Nov 28 17:43:38 2025 +0100 Revert "Hotfix CI with dev dependencies: xfail test_prepare_inputs_for_generation" (huggingface#4587) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit 84a0bbc Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Nov 28 16:13:56 2025 +0100 Fix 'generation_config' AttributeError (huggingface#4596) commit f67c3f2 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Nov 28 15:46:02 2025 +0100 Remove module-level imports of extra deps in experimental.judges (huggingface#4598) commit cb5fdf9 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Thu Nov 27 11:08:26 2025 +0100 Add missing require_bitsandbytes marker to CI tests (huggingface#4586) commit 4a3b584 Author: juejuezi <juejuezi.git@foxmail.com> Date: Thu Nov 27 00:11:56 2025 +0800 fix: use shift_labels for metrics when using CP or SP (huggingface#4579) Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> commit d2e4315 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Wed Nov 26 15:40:15 2025 +0100 Revert hotfix Fall back to config.text_config._name_or_path (huggingface#4581) commit 357e331 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Nov 26 04:55:46 2025 -0700 Move tests for GSPOTokenTrainer to experimental (huggingface#4572) commit a59f2cf Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Nov 26 04:50:44 2025 -0700 Move `WinRateCallback` to experimental (huggingface#4558) Co-authored-by: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> commit cf431db Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Nov 26 04:11:04 2025 -0700 Fix PPO example (huggingface#4556) commit cac9f1d Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com> Date: Tue Nov 25 21:27:58 2025 +0000 Fix Replay Buffer docs. (huggingface#4574)
commit f278d03 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Dec 5 19:34:42 2025 +0100 Remove no longer applicable warning once BCO was moved to experimental (huggingface#4628) commit e7071bf Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Fri Dec 5 10:07:16 2025 -0700 Add logos as assets (huggingface#4627) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> commit 794d87f Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Dec 5 08:45:20 2025 +0100 Add missing experimental autodoc classes to docs (huggingface#4618) commit bc7888d Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Dec 5 07:48:33 2025 +0100 Raise FutureWarning for trainer moved to experimental (huggingface#4620) commit fce5dfd Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Dec 5 07:34:04 2025 +0100 Raise warnings at 2nd stack level (huggingface#4621) commit c5da8ec Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Dec 5 07:33:04 2025 +0100 Silence experimental warning during docs build (huggingface#4623) commit 2af35fb Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Thu Dec 4 21:23:41 2025 -0700 Clean up model preparation (huggingface#4577) commit cbd90d4 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Thu Dec 4 20:05:43 2025 +0100 Remove deprecated batched formatting in GOLDTrainer (huggingface#4622) commit 903b57d Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Thu Dec 4 19:16:00 2025 +0100 Update ministral notebooks with official bf16 ckpt (huggingface#4626) commit 9266135 Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Thu Dec 4 19:01:46 2025 +0100 Fix link to OpenEnv blog in docs (huggingface#4625) commit 495381d Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Thu Dec 4 11:32:34 2025 +0100 Fix README style (huggingface#4619) commit ddb65e8 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Wed Dec 3 21:20:40 2025 +0100 Add experimental imports to docs (huggingface#4616) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit 5fab472 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Wed Dec 3 17:38:16 2025 +0100 Replace arXiv paper links with HF links (huggingface#4613) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit a3c1dfb Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Wed Dec 3 17:28:45 2025 +0100 Add ministral 3 free notebooks (huggingface#4614) commit 560fd3d Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com> Date: Wed Dec 3 10:12:20 2025 +0000 [GRPOTrainer]: Add SAPO Loss (huggingface#4600) commit 814d4af Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Tue Dec 2 15:52:51 2025 -0700 Move MergeModelCallback to experimental (huggingface#4608) Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> commit 2a81076 Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Date: Tue Dec 2 15:07:11 2025 +0100 Fixed OpenEnv example scripts (huggingface#4610) commit de343cd Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Dec 2 07:32:22 2025 +0100 Remove deprecations for 0.26 release (huggingface#4607) commit 07b4a84 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Mon Dec 1 12:55:24 2025 -0700 Silence experimental warnings when imported in the stable (huggingface#4606) commit c55ef4b Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Mon Dec 1 12:40:42 2025 -0700 Update How-to guides (huggingface#4604) commit c686d7d Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Dec 1 20:34:31 2025 +0100 Raise FutureWarning for classes moved to experimental (huggingface#4605) commit c7d172b Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Mon Dec 1 01:47:22 2025 -0800 docs: Expand speeding up training guide with acceleration methods (huggingface#4428) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> commit f1dfef0 Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Date: Mon Dec 1 01:39:08 2025 -0800 docs: Expand training customization examples (huggingface#4427) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> commit eb76389 Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com> Date: Sun Nov 30 16:45:21 2025 +0100 [GRPO] Sequence-level TIS & MIS (huggingface#4530) commit 0726977 Author: xuanduy04 <65279552+xuanduy04@users.noreply.github.com> Date: Fri Nov 28 23:56:22 2025 +0700 docs: Add Beyond the 80/20 Rule (2506.01939) to Paper Index (huggingface#4580) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit 9731d08 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Nov 28 17:43:38 2025 +0100 Revert "Hotfix CI with dev dependencies: xfail test_prepare_inputs_for_generation" (huggingface#4587) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit 84a0bbc Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Nov 28 16:13:56 2025 +0100 Fix 'generation_config' AttributeError (huggingface#4596) commit f67c3f2 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Fri Nov 28 15:46:02 2025 +0100 Remove module-level imports of extra deps in experimental.judges (huggingface#4598) commit cb5fdf9 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Thu Nov 27 11:08:26 2025 +0100 Add missing require_bitsandbytes marker to CI tests (huggingface#4586) commit 4a3b584 Author: juejuezi <juejuezi.git@foxmail.com> Date: Thu Nov 27 00:11:56 2025 +0800 fix: use shift_labels for metrics when using CP or SP (huggingface#4579) Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> commit d2e4315 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Wed Nov 26 15:40:15 2025 +0100 Revert hotfix Fall back to config.text_config._name_or_path (huggingface#4581) commit 357e331 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Nov 26 04:55:46 2025 -0700 Move tests for GSPOTokenTrainer to experimental (huggingface#4572) commit a59f2cf Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Nov 26 04:50:44 2025 -0700 Move `WinRateCallback` to experimental (huggingface#4558) Co-authored-by: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> commit cf431db Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Nov 26 04:11:04 2025 -0700 Fix PPO example (huggingface#4556) commit cac9f1d Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com> Date: Tue Nov 25 21:27:58 2025 +0000 Fix Replay Buffer docs. (huggingface#4574)
Summary
This PR addresses issue #4379 by expanding the Training Customization documentation section with 5 new comprehensive examples, rather than removing it.
Resolves #4379
Changes Made
New Examples Added (5):
Documentation Improvements:
Statistics
Verification
✅ All imports verified against codebase
✅ All config options verified in DPOConfig
✅ DataCollatorForPreference import path corrected
✅ Consistent code style with existing examples
✅ Examples apply to most/all trainers as stated
Test Plan