calibrate
Community
Benchmark and calibrate AI agent performance.
Category: Software Engineering
Tags: benchmarking, calibration, performance analysis, agent testing, skill evaluation, instruction tuning
Author: Borda
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill rigorously tests AI agents and skills against synthetic problems to measure their performance, identify systematic gaps, and ensure their self-reported confidence aligns with actual accuracy.
Core Features & Use Cases
- Performance Benchmarking: Quantifies recall, precision, and F1 scores for agents and skills.
- Calibration Analysis: Detects over- or under-confidence by comparing reported confidence with actual recall.
- Gap Identification: Pinpoints recurring issues and anti-patterns in agent outputs.
- Automated Improvement: Generates proposals to update agent instructions based on benchmark results.
- Use Case: Run /calibrate sw-engineer full to test the software engineer agent on 10 synthetic coding problems, analyze its performance, and automatically generate updated instructions if needed.
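The benchmarking and calibration metrics listed above can be illustrated with a short Python sketch. This is not the Skill's actual implementation: the `BenchmarkResult` record, the `score` function, the micro-averaging across problems, and the definition of the calibration gap as mean reported confidence minus recall are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    """Hypothetical record for one synthetic problem (not the Skill's real schema)."""
    true_positives: int   # expected issues the agent found
    false_positives: int  # reported issues that were not expected
    false_negatives: int  # expected issues the agent missed
    confidence: float     # agent's self-reported confidence in [0, 1]


def score(results):
    """Micro-average precision, recall, and F1, then compare confidence with recall.

    Assumption: a positive calibration gap means over-confidence,
    a negative gap means under-confidence.
    """
    tp = sum(r.true_positives for r in results)
    fp = sum(r.false_positives for r in results)
    fn = sum(r.false_negatives for r in results)

    # Guard every division so empty or all-zero inputs return 0.0 instead of raising.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    mean_confidence = (sum(r.confidence for r in results) / len(results)
                       if results else 0.0)

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mean_confidence": mean_confidence,
        "calibration_gap": mean_confidence - recall,
    }


if __name__ == "__main__":
    # Made-up numbers, purely to show the shape of the output.
    sample = [
        BenchmarkResult(3, 1, 1, 0.9),
        BenchmarkResult(1, 1, 3, 0.9),
    ]
    for name, value in score(sample).items():
        print(f"{name}: {value:.3f}")
```

In this sketch, an agent that reports high confidence while missing half of the expected issues ends up with a large positive gap, which is the kind of over-confidence the calibration analysis is meant to surface.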
Quick Start
Use the calibrate skill to benchmark all agents and skills with full problem sets and apply any necessary changes.
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: calibrate
Download link: https://github.com/Borda/.home/archive/main.zip#calibrate
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.