In `Trainer`, when `remove_unused_columns=True`, the trainer inspects the signature of the model's `forward` method and removes the dataset columns that don't match any of its parameters.
In most TRL trainers, the sampled data is not fed directly to the model; see for example DPO:
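
To illustrate the mechanism, here is a minimal sketch that does not depend on `transformers`. `ToyModel`, `signature_columns` and `drop_unused_columns` are hypothetical names made up for this example, and the real `Trainer` also keeps label columns and works on datasets rather than plain dicts:

```python
import inspect


class ToyModel:
    """Hypothetical stand-in for a model; only the forward signature matters."""

    def forward(self, input_ids, attention_mask=None, labels=None):
        return None


def signature_columns(model):
    # Parameter names of the bound forward method ("self" is excluded).
    return list(inspect.signature(model.forward).parameters)


def drop_unused_columns(batch, model):
    # Keep only the columns whose name is a forward() parameter.
    keep = set(signature_columns(model))
    return {name: value for name, value in batch.items() if name in keep}


batch = {
    "input_ids": [1, 2, 3],
    "attention_mask": [1, 1, 1],
    "prompt": "hello",  # not a forward() argument, so it is dropped
    "chosen": "a",      # same
    "rejected": "b",    # same
}
filtered = drop_unused_columns(batch, ToyModel())
print(sorted(filtered))
```

With a preference dataset, columns such as `prompt`, `chosen` and `rejected` would be dropped this way even though the trainer needs them later.
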
https://github.com/huggingface/trl/blob/7a530ba6d2ebf4101aaeb8002cd51c9bde4fb721/trl/trainer/dpo_trainer.py#L1081-L1110
or in GRPO:
https://github.com/huggingface/trl/blob/7a530ba6d2ebf4101aaeb8002cd51c9bde4fb721/trl/trainer/grpo_trainer.py#L1919-L1925
Consequently, to still support `remove_unused_columns`, we need to hack the signature columns, here:
https://github.com/huggingface/trl/blob/7a530ba6d2ebf4101aaeb8002cd51c9bde4fb721/trl/trainer/dpo_trainer.py#L866-L880
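
Roughly, the hack amounts to overriding the method that computes the signature columns so that it returns a hard-coded list instead of inspecting the model. The sketch below uses toy classes (`MiniTrainer`, `MiniPreferenceTrainer`, `ToyModel` are hypothetical), and the column names are illustrative rather than copied from the linked code:

```python
import inspect


class ToyModel:
    """Hypothetical model whose forward() knows nothing about preference data."""

    def forward(self, input_ids, attention_mask=None):
        return None


class MiniTrainer:
    """Toy trainer: derives the kept columns from the model's forward signature."""

    def __init__(self, model):
        self.model = model
        self._signature_columns = None

    def _set_signature_columns_if_needed(self):
        if self._signature_columns is None:
            params = inspect.signature(self.model.forward).parameters
            self._signature_columns = list(params)

    def remove_unused_columns(self, batch):
        self._set_signature_columns_if_needed()
        return {k: v for k, v in batch.items() if k in self._signature_columns}


class MiniPreferenceTrainer(MiniTrainer):
    """Toy subclass: hard-codes the columns its own loss computation needs."""

    def _set_signature_columns_if_needed(self):
        if self._signature_columns is None:
            # Illustrative names; the model signature is ignored on purpose.
            self._signature_columns = [
                "prompt_input_ids",
                "chosen_input_ids",
                "rejected_input_ids",
            ]


example = {
    "prompt_input_ids": [1],
    "chosen_input_ids": [2],
    "rejected_input_ids": [3],
    "metadata": "x",
}
kept_by_base = MiniTrainer(ToyModel()).remove_unused_columns(example)
kept_by_override = MiniPreferenceTrainer(ToyModel()).remove_unused_columns(example)
print(kept_by_base, sorted(kept_by_override))
```

The downside is visible in the sketch: any extra column a user wants to pass through (here `metadata`) is still dropped unless the hard-coded list is edited.
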
This hack works perfectly fine, but we may want to find a better way to do this, to improve customizability.