eval-recipes Runner Skill
Community
Benchmark amplihack improvements with eval-recipes.
Data & Analytics #automation #benchmark #PR testing #eval-recipes #machine-learning evaluation #amplihack #scores
Author: rysweet
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill automates benchmarking and validation of amplihack improvements against baseline agents using the eval-recipes suite.
Core Features & Use Cases
- Benchmarking: Run standardized eval-recipes benchmarks to evaluate agent performance.
- PR Validation: Compare scores before/after changes to verify improvements.
- Automated Reporting: Generate comparative reports showing score improvements across tasks.
Quick Start
Clone eval-recipes, copy our agent configs into it, and install uv. Then run a benchmark for a specified task and number of trials, and compare the results to the baseline.
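The final comparison step can be sketched in a few lines of Python. This is a minimal illustration only, not the Skill's or eval-recipes' actual reporting code: it assumes you have already collected per-task trial scores for the baseline and for the changed agent into plain dictionaries. The function names `compare_scores` and `format_report`, the task names, and the score values are all hypothetical.

```python
from statistics import mean


def compare_scores(baseline, candidate):
    """Return {task: (baseline_mean, candidate_mean, delta)}.

    Both arguments map a task name to a list of trial scores.
    Only tasks present in both runs are compared.
    """
    report = {}
    for task in sorted(baseline.keys() & candidate.keys()):
        base_avg = mean(baseline[task])
        cand_avg = mean(candidate[task])
        report[task] = (base_avg, cand_avg, cand_avg - base_avg)
    return report


def format_report(report):
    """Render the comparison as a fixed-width text table."""
    lines = [f"{'task':<20}{'baseline':>10}{'candidate':>11}{'delta':>8}"]
    for task, (base_avg, cand_avg, delta) in report.items():
        lines.append(f"{task:<20}{base_avg:>10.1f}{cand_avg:>11.1f}{delta:>+8.1f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical scores: one list entry per trial.
    baseline = {"task_a": [60.0, 70.0, 65.0], "task_b": [40.0, 50.0]}
    candidate = {"task_a": [75.0, 80.0, 70.0], "task_b": [45.0, 45.0]}
    print(format_report(compare_scores(baseline, candidate)))
```

Averaging over trials before taking the difference keeps a single noisy run from dominating the comparison. A negative or zero delta marks a task where the change did not help.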
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: eval-recipes Runner Skill
Download link: https://github.com/rysweet/AzureHayMaker/archive/main.zip#eval-recipes-runner-skill
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.