simpo-training

Community

Reference-free preference optimization for LLM alignment.

Author: ovachiever
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill describes SimPO, a reference-free preference optimization method for aligning LLMs that offers stronger performance than DPO while eliminating the need for a separate reference model.
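For context, the objective introduced in the SimPO paper uses the length-normalized log-likelihood of a response under the policy itself as the implicit reward, scaled by β and offset by a target reward margin γ (no reference model appears anywhere in the loss):

```latex
\mathcal{L}_{\mathrm{SimPO}}(\pi_\theta) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \left[
    \log \sigma\!\left(
      \frac{\beta}{|y_w|} \log \pi_\theta(y_w \mid x)
      - \frac{\beta}{|y_l|} \log \pi_\theta(y_l \mid x)
      - \gamma
    \right)
  \right]
```

Here y_w and y_l are the chosen and rejected responses for prompt x, and |y| is the response length in tokens.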

Core Features & Use Cases

  • Reference-free alignment: Optimize the policy directly on preference data, with no frozen reference model held in memory.
  • Efficient fine-tuning: Achieve competitive alignment with less compute and memory than PPO or DPO.
  • Versatile workflows: Works for both base LLMs and instruct/chat models, with a configurable loss scale (β) and target reward margin (γ); see the loss sketch after this list.
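As a rough illustration of how the configurable loss scale and margin fit together, here is a minimal batched sketch of the SimPO loss in PyTorch (tensor and function names are illustrative, not part of this Skill):

```python
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
               beta=2.0, gamma=1.0):
    """Sketch of the SimPO objective for a batch of preference pairs.

    chosen_logps / rejected_logps: summed token log-probabilities of each
    response under the policy being trained (no reference model involved).
    chosen_lens / rejected_lens: response lengths in tokens.
    """
    # Length-normalized log-likelihoods act as the implicit rewards.
    chosen_rewards = beta * chosen_logps / chosen_lens
    rejected_rewards = beta * rejected_logps / rejected_lens

    # Bradley-Terry-style loss pushing the margin above the target gamma.
    margins = chosen_rewards - rejected_rewards
    loss = -F.logsigmoid(margins - gamma).mean()
    return loss, margins.detach()
```

β scales the implicit reward and γ sets the target margin between chosen and rejected responses; both are hyperparameters that usually need per-model tuning.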

Quick Start

Launch a SimPO training job for a base model (e.g., Mistral-7B) with a small preference dataset and monitor the reward margin between chosen and rejected responses.
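The sketch below shows the core computation behind that monitoring step, assuming a Hugging Face causal LM and a single toy preference pair (the prompt and response strings are placeholders, and a real job would batch this over the dataset inside a training loop):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # base model from the Quick Start example
tok = AutoTokenizer.from_pretrained(model_name)
# Device placement / accelerate setup omitted for brevity.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

def avg_response_logprob(prompt: str, response: str) -> torch.Tensor:
    """Average per-token log-probability of `response` given `prompt`."""
    full = tok(prompt + response, return_tensors="pt").input_ids
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    logits = model(full).logits[:, :-1]              # predictions for tokens 1..N-1
    logps = torch.log_softmax(logits.float(), dim=-1)
    token_logps = logps.gather(-1, full[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_logps[:, prompt_len - 1:].mean()    # keep response tokens only

# Placeholder preference pair; a real run iterates over the preference dataset.
prompt = "Explain gradient clipping in one sentence.\n"
chosen = "It caps gradient norms so one noisy batch cannot destabilize training."
rejected = "It makes the model bigger."

beta, gamma = 2.0, 1.0
reward_chosen = beta * avg_response_logprob(prompt, chosen)
reward_rejected = beta * avg_response_logprob(prompt, rejected)
reward_margin = (reward_chosen - reward_rejected).item()  # the quantity to monitor
loss = -F.logsigmoid(reward_chosen - reward_rejected - gamma)
```

A rising reward margin over training indicates the policy is separating chosen from rejected responses; if it stays near zero or collapses, revisit β, γ, or the learning rate.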

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install the Skill automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: simpo-training
Download link: https://github.com/ovachiever/droid-tings/archive/main.zip#simpo-training

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.