skill-ab-eval


Benchmark skill changes with A/B tests.

Author: vltansky
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill lets you make changes to your AI agent's skills with confidence by providing a rigorous way to measure whether those changes actually improve performance, catching regressions and verifying progress.

Core Features & Use Cases

  • Controlled A/B Testing: Compares a modified skill against its baseline version using automated test cases.
  • Performance Benchmarking: Generates detailed reports showing pass rates, deltas, and an overall verdict (improvement, regression, no change, or mixed).
  • Use Case: After refactoring your roast-my-agents-md skill, use this Skill to run it against a set of predefined prompts and assertions. The report will tell you if your changes made it better, worse, or had no effect compared to the original version.
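The pass rates, deltas, and verdict described above can be sketched as a small report function. This is a hypothetical illustration of how such a report could be computed; the `CaseResult` structure and the verdict rules are assumptions, not the skill's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class CaseResult:
    """Outcome of one test case run against both skill versions (assumed shape)."""
    name: str
    baseline_pass: bool
    variant_pass: bool

def verdict(results: list[CaseResult]) -> dict:
    """Classify a run as improvement, regression, no change, or mixed."""
    base_rate = sum(r.baseline_pass for r in results) / len(results)
    var_rate = sum(r.variant_pass for r in results) / len(results)
    delta = var_rate - base_rate
    gained = any(r.variant_pass and not r.baseline_pass for r in results)
    lost = any(r.baseline_pass and not r.variant_pass for r in results)
    if gained and lost:
        label = "mixed"        # some cases improved while others regressed
    elif delta > 0:
        label = "improvement"
    elif delta < 0:
        label = "regression"
    else:
        label = "no change"
    return {"baseline": base_rate, "variant": var_rate,
            "delta": delta, "verdict": label}

# Example run: the modified skill fixes one previously failing case.
results = [
    CaseResult("handles empty input", True, True),
    CaseResult("follows output format", False, True),
    CaseResult("cites sources", True, True),
]
report = verdict(results)
print(report["verdict"], f"{report['delta']:+.0%}")  # → improvement +33%
```

A "mixed" verdict is kept separate from the raw delta because a net-zero change can still hide regressions on individual cases.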

Quick Start

Use the skill-ab-eval skill to test your recent changes to the roast-my-agents-md skill.

Dependency Matrix

Required Modules

None required

Components

scripts · references · assets

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: skill-ab-eval
Download link: https://github.com/vltansky/skills/archive/main.zip#skill-ab-eval

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
