Name: rloo
Availability: InStock
Author: atrawog

System Documentation

What problem does it solve?

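Policy-gradient training often suffers from high gradient variance, especially when rewards are sparse or delayed. RLOO (REINFORCE Leave-One-Out) reduces that variance by baselining each sampled completion's reward against the mean reward of the other completions sampled for the same prompt, which stabilizes training and improves sample efficiency.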
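To make the leave-one-out idea concrete, here is a minimal sketch in plain Python. It is not taken from the skill itself: the function name `leave_one_out_advantages` is hypothetical, and it assumes one scalar reward per completion, with all completions sampled for the same prompt.

```python
from typing import List


def leave_one_out_advantages(rewards: List[float]) -> List[float]:
    """Return reward minus the mean reward of the *other* samples.

    Hypothetical helper for illustration; assumes every reward in
    `rewards` belongs to a completion of the same prompt.
    """
    k = len(rewards)
    if k < 2:
        # With a single sample there are no "other" samples to average.
        raise ValueError("leave-one-out needs at least 2 samples per prompt")
    total = sum(rewards)
    # Baseline for sample i is the mean of the remaining k - 1 rewards.
    return [r - (total - r) / (k - 1) for r in rewards]


if __name__ == "__main__":
    print(leave_one_out_advantages([1.0, 2.0, 3.0]))
```

Because each baseline excludes the sample's own reward, it does not bias the gradient estimate, and the resulting advantages sum to zero within a group.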

Core Features & Use Cases

RLOOTrainer and RLOOConfig for variance-reduced RLHF training
Reward function integration using completion_ids for efficient token-based rewards
Thinking-aware patterns and stable policy optimization for reasoning tasks
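A token-based reward function of the kind mentioned above might look like the following sketch. It assumes, without confirmation from this listing, that the trainer calls reward functions with a `completion_ids` keyword argument (one list of token IDs per completion) and expects one float back per completion. The name `brevity_reward` and the `max_tokens` parameter are hypothetical.

```python
from typing import List


def brevity_reward(
    completion_ids: List[List[int]], max_tokens: int = 64, **kwargs
) -> List[float]:
    """Score each completion by how short it is, using token counts only.

    Assumption: the trainer supplies `completion_ids` and may pass extra
    keyword arguments, which are accepted and ignored via **kwargs.
    """
    rewards = []
    for ids in completion_ids:
        # Clip the length so the reward stays in [0.0, 1.0].
        used = min(len(ids), max_tokens)
        rewards.append(1.0 - used / max_tokens)
    return rewards


if __name__ == "__main__":
    print(brevity_reward(completion_ids=[[1, 2, 3, 4], [5] * 8, []], max_tokens=8))
```

Working directly on token IDs avoids decoding completions back to text when the reward depends only on length or on the presence of specific tokens.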

Quick Start

Run a small RLOO training session on a short dataset using RLOOTrainer and the default RLOOConfig.

Please help me install this Skill. Name: rloo. Download link: https://github.com/atrawog/overthink-plugins/archive/main.zip#rloo. Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
