[Experimental] Add SDFT trainer, config, docs, and tests #4941
Shekswess wants to merge 9 commits into huggingface:main from
Conversation
@qgallouedec maybe this is not the perfect implementation, but overall I think it's okay. I followed the original code from the authors of the paper, and that was kinda messy hahahaha. Please feel free to drop any comments on how we can get this into the best shape possible (improvements, coverage, etc.), and I can help you on this one. This is my first PR like this, so I'm really excited. ❤️

P.S. I want this trainer to be added as an experimental trainer because I want to do active research on self-distillation methods for tiny language models, and I see this as a possibility to make them even more powerful.
Hello @Shekswess, one of the authors here 👋 Currently, this implementation is for offline training (i.e., training on a fixed dataset of teacher prompts). I was wondering whether we could easily extend this implementation to online training too? Do you think it would make sense to integrate these into one implementation of self-distillation?
Heyoooo @jonhue!
Amazing! Happy to help! |
Hey, sorry for the late review, this one is quite big! Thank you! At this point, I have a few remarks:
This makes sense if it's meant to serve as a reference implementation / reproduce results. That said, it would be easy to just default to self-distillation when
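To make that idea concrete, here is a tiny hypothetical sketch. Since the condition above is cut off, this assumes the fallback applies when no separate teacher model is configured; the function and argument names are made up for illustration and are not part of this PR's API:

```python
# Hypothetical illustration only; not this PR's API.
# Assumed reading of "default to self-distillation": when no separate teacher
# model is configured, the student model simply acts as its own teacher.
def resolve_teacher(student_model, teacher_model=None):
    return teacher_model if teacher_model is not None else student_model
```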
What does this PR do?
Adds an experimental Self‑Distillation Fine‑Tuning (SDFT) trainer to TRL, including:
- `SDFTTrainer` + `SDFTConfig` under `trl.experimental.sdft`
- `prompt` / `teacher_prompt`

Fixes #4940
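Below is a minimal usage sketch, assuming the experimental trainer's constructor follows the same pattern as TRL's other trainers (`model`, `args`, `train_dataset`). The dataset column names come from the bullets above; the `output_dir` argument, the model name, and the toy dataset are illustrative assumptions rather than details taken from this PR, and the exact signature of the experimental API may differ:

```python
from datasets import Dataset
from trl.experimental.sdft import SDFTConfig, SDFTTrainer

# Toy dataset with the two columns mentioned above (real datasets would be much larger).
dataset = Dataset.from_dict(
    {
        "prompt": ["Explain self-distillation in one sentence."],
        "teacher_prompt": [
            "You are a knowledgeable teacher. Explain self-distillation in one sentence."
        ],
    }
)

config = SDFTConfig(output_dir="Qwen2.5-0.5B-SDFT")  # argument assumed, as in other TRL configs

trainer = SDFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # illustrative model choice
    args=config,
    train_dataset=dataset,
)
trainer.train()
```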
Before submitting
Who can review?
@qgallouedec