Name: rl-evaluation
Author: tachyon-beep

System Documentation

What problem does it solve?

This skill provides a structured approach to evaluating RL agents with statistical rigor, ensuring results are reliable, reproducible, and suitable for publication or deployment.

Core Features & Use Cases

Multi-seed evaluation protocol that reports the mean, standard deviation, and confidence intervals across seeds to quantify performance.
Generalization and distribution-shift testing to assess robustness beyond the training environment.
Clear reporting templates for papers and dashboards, including sample-efficiency and significance tests.
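
As a rough illustration of the multi-seed summary described above, the sketch below computes the mean, sample standard deviation, and a percentile-bootstrap confidence interval from per-seed returns. It is not taken from the skill itself: the function name `summarize_returns`, its parameters, and the example numbers are all assumptions made for illustration, and the bootstrap is only one of several reasonable ways to build the interval.

```python
import random
import statistics


def summarize_returns(returns, n_boot=2000, alpha=0.05, rng_seed=0):
    """Summarize per-seed returns (hypothetical helper, not part of the skill).

    Reports the mean, the sample standard deviation, and a
    percentile-bootstrap confidence interval for the mean.
    """
    if len(returns) < 2:
        raise ValueError("need at least two seeds to estimate spread")
    rng = random.Random(rng_seed)  # fixed seed so the summary is reproducible
    n = len(returns)
    # Resample the seeds with replacement and record each resample's mean.
    boot_means = sorted(
        statistics.fmean(rng.choices(returns, k=n)) for _ in range(n_boot)
    )
    low = boot_means[int((alpha / 2) * n_boot)]
    high = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return {
        "n_seeds": n,
        "mean": statistics.fmean(returns),
        "std": statistics.stdev(returns),
        "ci_low": low,
        "ci_high": high,
    }


# Made-up per-seed returns, for illustration only.
example_returns = [212.4, 198.7, 205.1, 220.3, 190.8]
summary = summarize_returns(example_returns)
print(summary)
```

With only a handful of seeds the bootstrap interval tends to be optimistic, so treat it as a lower bound on the real uncertainty and prefer more seeds when the comparison is close.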

Quick Start

Use the rl-evaluation skill to set up a multi-seed evaluation for your agent, run 5-20 seeds, and generate a results summary.
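
The kind of multi-seed run this prompt asks for might look roughly like the sketch below. It assumes a hypothetical `evaluate_agent(seed)` function; here it is a stand-in that draws synthetic returns so the example runs on its own, and in practice it would roll out your trained policy in the environment seeded with `seed`. None of these names come from the skill.

```python
import random
import statistics


def evaluate_agent(seed, n_episodes=10):
    """Hypothetical stand-in: mean episode return for one seed.

    Synthetic Gaussian returns replace a real environment rollout.
    """
    rng = random.Random(seed)
    episode_returns = [rng.gauss(200.0, 15.0) for _ in range(n_episodes)]
    return statistics.fmean(episode_returns)


def run_multi_seed(seeds):
    """Evaluate once per seed, keeping results keyed by seed for traceability."""
    return {seed: evaluate_agent(seed) for seed in seeds}


seeds = list(range(10))  # within the 5-20 range suggested above
per_seed = run_multi_seed(seeds)
scores = list(per_seed.values())

for seed, score in per_seed.items():
    print(f"seed={seed:2d} mean_return={score:.1f}")
print(
    f"across seeds: mean={statistics.fmean(scores):.1f} "
    f"std={statistics.stdev(scores):.1f}"
)
```

Keeping the results keyed by seed makes it possible to rerun a single outlier seed later without repeating the whole sweep.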

Please help me install this Skill:
Name: rl-evaluation
Download link: https://github.com/tachyon-beep/hamlet/archive/main.zip#rl-evaluation
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
