Name: eval-recipes Runner Skill
Author: rysweet

System Documentation

What problem does it solve?

This Skill automates benchmarking and validation of amplihack improvements against baseline agents using the eval-recipes suite.

Core Features & Use Cases

Benchmarking: Run standardized eval-recipes benchmarks to evaluate agent performance.
PR Validation: Compare scores before/after changes to verify improvements.
Automated Reporting: Generate comparative reports showing score improvements across tasks.
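
The before/after comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the task names, the score values, and the dict-of-trial-scores layout are hypothetical and do not reflect the actual output format of eval-recipes. It averages the trials per task and reports the change relative to the baseline.

```python
from statistics import mean


def compare_scores(baseline, candidate):
    """Return (task, baseline mean, candidate mean, delta) for shared tasks.

    Both arguments map a task name to a list of per-trial scores.
    This layout is an assumption made for the example.
    """
    rows = []
    for task in sorted(set(baseline) & set(candidate)):
        base_avg = mean(baseline[task])
        cand_avg = mean(candidate[task])
        rows.append((task, base_avg, cand_avg, cand_avg - base_avg))
    return rows


def format_report(rows):
    """Render the comparison rows as a fixed-width text table."""
    lines = [f"{'task':<20}{'baseline':>10}{'candidate':>10}{'delta':>8}"]
    for task, base_avg, cand_avg, delta in rows:
        lines.append(
            f"{task:<20}{base_avg:>10.1f}{cand_avg:>10.1f}{delta:>+8.1f}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical scores: two trials per task.
    baseline = {"task_a": [60.0, 70.0], "task_b": [80.0, 90.0]}
    candidate = {"task_a": [75.0, 85.0], "task_b": [80.0, 80.0]}
    print(format_report(compare_scores(baseline, candidate)))
```

Tasks that appear in only one of the two runs are skipped, so a change is never reported without a baseline to measure it against.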

Quick Start

Clone eval-recipes, copy our agent configs into it, and install uv. Then run a benchmark for a specified task and number of trials, and compare the results to the baseline.

Claude Code Installation

Please help me install this Skill:
Name: eval-recipes Runner Skill
Download link: https://github.com/rysweet/AzureHayMaker/archive/main.zip#eval-recipes-runner-skill
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
