Running search protocol for "preference optimization":
Optimize LLMs on preference data via implicit reward learning (see the DPO-style sketch after this list).
Align LLMs with human preferences using RL (see the RLHF objective after this list).
Align LLMs with human preferences.
Optimize LLMs with SimPO; no reference model needed (see the SimPO sketch after this list).
Efficient DPO for model alignment.
Reference-free preference optimization for LLM alignment.
Optimize LLMs without a reference model.
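For the implicit-reward entries above, a minimal PyTorch sketch of a DPO-style loss, assuming summed per-response log-probabilities have already been computed under the trained policy and a frozen reference model; the function name dpo_loss and the toy tensors are illustrative, not taken from any retrieved item.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit reward for a response y: beta * (log pi(y|x) - log pi_ref(y|x)).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry preference loss on the chosen-minus-rejected reward margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up summed log-probabilities per response.
pi_c = torch.tensor([-12.0, -9.5])    # policy log p(chosen)
pi_r = torch.tensor([-14.0, -9.0])    # policy log p(rejected)
ref_c = torch.tensor([-13.0, -10.0])  # reference log p(chosen)
ref_r = torch.tensor([-13.5, -9.2])   # reference log p(rejected)
print(dpo_loss(pi_c, pi_r, ref_c, ref_r))
```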
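For the RL-based entries, the objective they refer to is the standard KL-regularized RLHF objective: maximize a learned reward model while staying close to a reference policy. The notation below is the conventional one (policy \pi_\theta, reward model r_\phi, reference \pi_{\mathrm{ref}}), not quoted from any retrieved item.

```latex
\max_{\pi_\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
\bigl[\, r_\phi(x, y) \,\bigr]
\;-\;
\beta \, \mathbb{D}_{\mathrm{KL}}\!\left[ \pi_\theta(y \mid x) \,\middle\|\, \pi_{\mathrm{ref}}(y \mid x) \right]
```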
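For the reference-free entries, a minimal sketch of the SimPO loss, which uses the length-normalized average log-probability of a response as its implicit reward and drops the reference model entirely; beta and the target margin gamma are hyperparameters, and the values and toy data below are illustrative.

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
               beta=2.0, gamma=0.5):
    # Implicit reward: beta times the average per-token log-probability.
    chosen_rewards = beta * chosen_logps / chosen_lens
    rejected_rewards = beta * rejected_logps / rejected_lens
    # Margin loss: the chosen reward should beat the rejected one by gamma.
    return -F.logsigmoid(chosen_rewards - rejected_rewards - gamma).mean()

# Toy usage: summed log-probs and token counts per response (made up).
c_logps = torch.tensor([-40.0, -55.0]); c_lens = torch.tensor([20.0, 25.0])
r_logps = torch.tensor([-50.0, -52.0]); r_lens = torch.tensor([22.0, 24.0])
print(simpo_loss(c_logps, r_logps, c_lens, r_lens))
```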