skill-ab-eval
Benchmark skill changes with A/B tests.
Category: Community
Author: vltansky
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill lets you change your AI agent's skills with confidence by measuring whether those changes actually improve performance, catching regressions before they ship.
Core Features & Use Cases
- Controlled A/B Testing: Compares a modified skill against its baseline version using automated test cases.
- Performance Benchmarking: Generates detailed reports showing pass rates, deltas, and an overall verdict (improvement, regression, no change, or mixed).
- Use Case: After refactoring your roast-my-agents-md skill, use this Skill to run it against a set of predefined prompts and assertions. The report tells you whether your changes made it better, worse, or had no effect compared to the original version.
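The verdict logic described above (improvement, regression, no change, or mixed) can be sketched roughly as follows. This is an illustrative assumption about how such a classification could work, not the skill's actual implementation; the function names are hypothetical.

```python
# Hypothetical sketch: derive an A/B verdict from per-test-case
# pass/fail results for the baseline and modified skill versions.

def pass_rate(results: list[bool]) -> float:
    """Fraction of test cases that passed."""
    return sum(results) / len(results) if results else 0.0

def verdict(baseline: list[bool], modified: list[bool]) -> str:
    """Classify the delta between baseline and modified pass rates."""
    delta = pass_rate(modified) - pass_rate(baseline)
    if delta > 0:
        return "improvement"
    if delta < 0:
        return "regression"
    # Equal overall pass rates can still hide individual cases that
    # flipped in both directions; call that "mixed".
    flipped = any(b != m for b, m in zip(baseline, modified))
    return "mixed" if flipped else "no change"

baseline = [True, True, False, False]   # 50% pass rate
modified = [True, True, True, False]    # 75% pass rate
print(verdict(baseline, modified))      # improvement
```

A real report would also include the per-case deltas, but the overall verdict reduces to a comparison like this.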
Quick Start
Use the skill-ab-eval skill to test your recent changes to the roast-my-agents-md skill.
Dependency Matrix
Required Modules
None required
Components
scripts, references, assets
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill: Name: skill-ab-eval Download link: https://github.com/vltansky/skills/archive/main.zip#skill-ab-eval Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
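If you prefer to install manually, the steps Claude is asked to perform (download the zip, extract it, place the skill under `.claude/skills/`) could look roughly like this. The archive layout (`skills-main/skill-ab-eval/`) is an assumption about how GitHub names the extracted folder, not something stated in the listing.

```python
import io
import zipfile
from pathlib import Path
from urllib.request import urlopen

SKILL_NAME = "skill-ab-eval"
ZIP_URL = "https://github.com/vltansky/skills/archive/main.zip"

def target_dir(project_root: Path, name: str = SKILL_NAME) -> Path:
    """Where the extracted skill should end up inside the project."""
    return project_root / ".claude" / "skills" / name

def install(project_root: Path = Path(".")) -> None:
    dest = target_dir(project_root)
    dest.mkdir(parents=True, exist_ok=True)
    with urlopen(ZIP_URL) as resp:
        archive = zipfile.ZipFile(io.BytesIO(resp.read()))
    # Assumed archive layout: a top-level skills-main/ folder
    # containing the skill-ab-eval/ directory.
    prefix = f"skills-main/{SKILL_NAME}/"
    for member in archive.namelist():
        if member.startswith(prefix) and not member.endswith("/"):
            out = dest / member[len(prefix):]
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_bytes(archive.read(member))

if __name__ == "__main__":
    install()
```

Letting Claude do this remains the recommended path; the sketch just makes the installation steps explicit.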