skill-ab-eval


Benchmark skill changes with A/B tests.

Author: vltansky
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill lets you make changes to your AI agent's skills with confidence by providing a rigorous way to measure whether those changes actually improve performance, catching regressions and verifying progress.

Core Features & Use Cases

  • Controlled A/B Testing: Compares a modified skill against its baseline version using automated test cases.
  • Performance Benchmarking: Generates detailed reports showing pass rates, deltas, and an overall verdict (improvement, regression, no change, or mixed).
  • Use Case: After refactoring your roast-my-agents-md skill, use this Skill to run it against a set of predefined prompts and assertions. The report will tell you if your changes made it better, worse, or had no effect compared to the original version.
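The pass rates, deltas, and verdict described above can be sketched as a small report function. This is a hypothetical illustration of how such a report could be computed; the `CaseResult` structure and the verdict rules are assumptions, not the skill's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class CaseResult:
    """Outcome of one test case run against both skill versions (assumed shape)."""
    name: str
    baseline_pass: bool
    variant_pass: bool

def verdict(results: list[CaseResult]) -> dict:
    """Classify a run as improvement, regression, no change, or mixed."""
    base_rate = sum(r.baseline_pass for r in results) / len(results)
    var_rate = sum(r.variant_pass for r in results) / len(results)
    delta = var_rate - base_rate
    gained = any(r.variant_pass and not r.baseline_pass for r in results)
    lost = any(r.baseline_pass and not r.variant_pass for r in results)
    if gained and lost:
        label = "mixed"        # some cases improved while others regressed
    elif delta > 0:
        label = "improvement"
    elif delta < 0:
        label = "regression"
    else:
        label = "no change"
    return {"baseline": base_rate, "variant": var_rate,
            "delta": delta, "verdict": label}

# Example run: the modified skill fixes one previously failing case.
results = [
    CaseResult("handles empty input", True, True),
    CaseResult("follows output format", False, True),
    CaseResult("cites sources", True, True),
]
report = verdict(results)
print(report["verdict"], f"{report['delta']:+.0%}")  # → improvement +33%
```

A "mixed" verdict is kept separate from the raw delta because a net-zero change can still hide regressions on individual cases.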

Quick Start

Use the skill-ab-eval skill to test your recent changes to the roast-my-agents-md skill.

Dependency Matrix

Required Modules

None required

Components

scripts · references · assets

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: skill-ab-eval
Download link: https://github.com/vltansky/skills/archive/main.zip#skill-ab-eval

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
