kto

Here are 4 public repositories matching this topic...

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

alignment ppo halos dpo kto rlhf

Pipelines for Fine-Tuning LLMs using SFT and RLHF

lora fine-tuning ppo peft sft dpo kto p-tuning qlora orpo grpo

RLHF experiments with Unsloth: DPO, ORPO, SimPO, KTO. Training scripts, notebooks, and quick evaluation utilities.

lora dpo huggingface kto trl rlhf qlora unsloth orpo simpo

🚀 Optimize preferences effectively with ORPO, a framework for monolithic preference optimization without a reference model.

data reinforcement-learning medical human-pose-estimation gpt lora privacy-preserving ppo dpo huggingface kto low-resolution-images model-averaging llm generative-ai rlhf qwen medicalgpt
