calibrate

Community

Benchmark and calibrate AI agent performance.

Author: Borda
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill rigorously tests AI agents and skills against synthetic problems to measure their performance, identify systematic gaps, and ensure their self-reported confidence aligns with actual accuracy.

Core Features & Use Cases

  • Performance Benchmarking: Quantifies recall, precision, and F1 scores for agents and skills.
  • Calibration Analysis: Detects over/under-confidence by comparing reported confidence with actual recall.
  • Gap Identification: Pinpoints recurring issues and anti-patterns in agent outputs.
  • Automated Improvement: Generates proposals to update agent instructions based on benchmark results.
  • Use Case: Run /calibrate sw-engineer full to test the software engineer agent on 10 synthetic coding problems, analyze its performance, and automatically generate updated instructions if needed.
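The page does not show how the skill computes its scores. The following is a hypothetical sketch of the metrics named above. It assumes that each synthetic problem seeds a known set of issues and that the agent returns the IDs of the issues it found plus one self-reported confidence in [0, 1]. It also assumes calibration is measured as mean confidence minus mean recall, so a positive value means overconfident. `ProblemResult`, `score`, and `calibration_gap` are illustrative names, not part of the skill.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProblemResult:
    """Hypothetical record for one synthetic problem (assumed shape)."""
    expected: set[str]   # ground-truth issue IDs seeded into the problem
    found: set[str]      # issue IDs the agent reported
    confidence: float    # agent's self-reported confidence in [0, 1]


def score(result: ProblemResult) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one problem."""
    true_pos = len(result.expected & result.found)
    # Empty report or empty ground truth is scored as 0.0 to avoid division by zero.
    precision = true_pos / len(result.found) if result.found else 0.0
    recall = true_pos / len(result.expected) if result.expected else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


def calibration_gap(results: list[ProblemResult]) -> float:
    """Mean reported confidence minus mean recall.

    Positive -> overconfident, negative -> underconfident (assumed convention).
    """
    mean_conf = sum(r.confidence for r in results) / len(results)
    mean_recall = sum(score(r)[1] for r in results) / len(results)
    return mean_conf - mean_recall


if __name__ == "__main__":
    results = [
        ProblemResult(expected={"a", "b"}, found={"a", "c"}, confidence=0.9),
        ProblemResult(expected={"x"}, found={"x"}, confidence=0.8),
    ]
    for r in results:
        print(score(r))
    # Mean confidence 0.85 vs. mean recall 0.75: a gap of roughly 0.1, i.e. overconfident.
    print(calibration_gap(results))
```

This follows the Calibration Analysis bullet by comparing confidence against recall. A per-problem or binned comparison would expose miscalibration that a single averaged gap can hide.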

Quick Start

Ask Claude to use the calibrate skill to benchmark all agents and skills against the full problem sets and apply any proposed instruction changes.

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: calibrate
Download link: https://github.com/Borda/.home/archive/main.zip#calibrate

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
