simpo-training
Community
Reference-free preference optimization for LLM alignment.
Author: ovachiever
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill describes SimPO, a reference-free preference optimization method for aligning LLMs that offers stronger performance than DPO while eliminating the need for a reference model.
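For context, the SimPO objective (Meng et al., 2024) uses the length-normalized average log-probability of a response as an implicit reward and enforces a target margin γ between the chosen response y_w and the rejected response y_l:

```latex
\mathcal{L}_{\text{SimPO}}(\pi_\theta) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[
    \log\sigma\!\left(
      \frac{\beta}{|y_w|}\log\pi_\theta(y_w \mid x)
      \;-\; \frac{\beta}{|y_l|}\log\pi_\theta(y_l \mid x)
      \;-\; \gamma
    \right)
  \right]
```

Because the implicit reward depends only on the policy π_θ, no reference-model forward passes are needed during training.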
Core Features & Use Cases
- Reference-free alignment: Train directly on preference data without a reference policy.
- Efficient fine-tuning: Achieve competitive alignment with less memory and compute than PPO/DPO.
- Versatile workflows: Works for both base LLMs and instruct/chat models, with a configurable loss scale (β) and target margin (γ); see the sketch after this list.
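As a concrete illustration of the configurable loss and margin, here is a minimal PyTorch sketch of the SimPO loss; the function names and the default β/γ values are illustrative assumptions, not this Skill's own API:

```python
import torch
import torch.nn.functional as F

def sequence_avg_logps(logits: torch.Tensor, labels: torch.Tensor,
                       mask: torch.Tensor) -> torch.Tensor:
    """Length-normalized log-probability of each sequence.

    logits: (B, T, V) model outputs; labels: (B, T) target token ids;
    mask: (B, T) 1.0 on response tokens, 0.0 on prompt/padding tokens.
    """
    per_token = torch.gather(logits.log_softmax(-1), 2,
                             labels.unsqueeze(-1)).squeeze(-1)
    return (per_token * mask).sum(-1) / mask.sum(-1)

def simpo_loss(chosen_avg_logps: torch.Tensor,
               rejected_avg_logps: torch.Tensor,
               beta: float = 2.0, gamma: float = 0.5) -> torch.Tensor:
    """SimPO objective: -log sigmoid(beta * (r_w - r_l) - gamma),
    where r is the average per-token log-prob of the response."""
    margins = beta * (chosen_avg_logps - rejected_avg_logps) - gamma
    return -F.logsigmoid(margins).mean()
```

Both knobs matter in practice: β scales the implicit reward, while γ sets how far apart the chosen and rejected rewards must be before the loss saturates.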
Quick Start
Launch a SimPO training job for a base model (e.g., Mistral-7B) with a small preference dataset and monitor reward margins.
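One way to launch such a job is via TRL's CPOTrainer, which the TRL documentation describes as reducing to the SimPO objective when loss_type="simpo" and cpo_alpha=0.0. The dataset choice, hyperparameters, and use of processing_class (recent TRL versions) below are assumptions for illustration, not part of this Skill:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"  # base model; illustrative choice
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Small preference dataset with prompt/chosen/rejected columns (illustrative).
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train[:1000]")

config = CPOConfig(
    output_dir="simpo-mistral-7b",
    loss_type="simpo",   # select the SimPO loss variant
    cpo_alpha=0.0,       # disable the BC term so the objective is pure SimPO
    simpo_gamma=0.5,     # target reward margin (gamma); illustrative value
    beta=2.0,            # implicit-reward scale (beta); illustrative value
    per_device_train_batch_size=2,
    logging_steps=10,    # the trainer logs reward margins for monitoring
)

trainer = CPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` on older TRL versions
)
trainer.train()
```

While training, watch the rewards/margins metric in the logs: a steadily growing margin between chosen and rejected rewards indicates the preference signal is being learned.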
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: simpo-training
Download link: https://github.com/ovachiever/droid-tings/archive/main.zip#simpo-training
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.