rl-evaluation

Community

Rigorous RL evaluation for statistical validity.

Author: tachyon-beep
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill provides a structured approach to evaluating RL agents with statistical rigor, ensuring results are reliable, reproducible, and suitable for publication or deployment.

Core Features & Use Cases

  • Multi-seed evaluation protocol with mean, std, and confidence intervals to quantify performance.
  • Generalization and distribution-shift testing to assess robustness beyond the training environment.
  • Clear reporting templates for papers and dashboards, including sample-efficiency and significance tests.
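The significance testing mentioned above can be sketched with a percentile-bootstrap confidence interval for the difference in mean returns between two agents. This is a minimal stdlib-only illustration, not the skill's actual implementation; the per-seed return values are invented for the example.

```python
import random
import statistics

def bootstrap_diff_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement and record the mean difference.
        resampled_a = [rng.choice(a) for _ in a]
        resampled_b = [rng.choice(b) for _ in b]
        diffs.append(statistics.mean(resampled_a) - statistics.mean(resampled_b))
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Per-seed mean returns for two agents (illustrative numbers only).
agent_a = [102.0, 98.5, 105.1, 99.7, 101.3, 103.8, 97.9, 100.4]
agent_b = [95.2, 93.8, 97.1, 94.5, 96.0, 92.7, 95.9, 94.1]

lo, hi = bootstrap_diff_ci(agent_a, agent_b)
print(f"mean difference 95% CI: [{lo:.2f}, {hi:.2f}]")
```

If the interval excludes zero, the difference between the agents is significant at the chosen alpha level; the bootstrap avoids assuming returns are normally distributed.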

Quick Start

Use the rl-evaluation skill to set up a multi-seed evaluation for your agent, run 5-20 seeds, and generate a results summary.
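The multi-seed protocol above can be sketched as follows. This is a hedged example under assumed names: `evaluate_agent` is a hypothetical placeholder for your own rollout loop, and the Gaussian returns are stand-ins for real episode scores.

```python
import random
import statistics

def evaluate_agent(seed: int, episodes: int = 10) -> float:
    """Placeholder for a real rollout: returns the mean episode
    return for one seed. Replace with your agent/environment loop."""
    rng = random.Random(seed)
    return statistics.mean(rng.gauss(100.0, 10.0) for _ in range(episodes))

seeds = range(10)  # run 5-20 seeds, per the protocol above
returns = [evaluate_agent(s) for s in seeds]

mean = statistics.mean(returns)
std = statistics.stdev(returns)                 # sample std across seeds
half_width = 1.96 * std / len(returns) ** 0.5   # normal-approx 95% CI
print(f"return = {mean:.1f} +/- {half_width:.1f} (95% CI, n={len(returns)})")
```

Reporting the mean with a confidence interval over independent seeds, rather than a single run, is what makes the summary suitable for publication-grade comparisons.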

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: rl-evaluation
Download link: https://github.com/tachyon-beep/hamlet/archive/main.zip#rl-evaluation

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
