team-shinchan:eval

Community

Detect agent regressions and track evaluations.

Author: seokan-jeong
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

The Eval Skill surfaces performance regressions and summarizes historical evaluation data for specialized agents so teams can quickly detect declining behavior and take corrective action. It removes the manual effort of scanning logs and comparing metric trends across multiple agents and evaluation dimensions.

Core Features & Use Cases

  • Aggregate summaries: Produce an at-a-glance table of evaluations per agent with scores for correctness, efficiency, compliance, and quality.
  • Per-agent history and trends: Show full evaluation history for a single agent with trend indicators and latest reviewer notes to aid diagnosis.
  • Regression detection & filtering: Identify agents and specific dimensions with statistically significant drops using a moving average window and flag regressions for review.
  • Side-by-side comparisons: Compare recent evaluations across agents to prioritize remediation and track improvements after prompt or policy changes.
  • Use Case: After a prompt update or model change, run the tool to quickly find which agents regressed and which metrics dropped, then assign review tasks.
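The moving-average check described in the list above can be sketched roughly as follows. This is a minimal illustration, not the Skill's actual implementation: the record shape, the `detect_regressions` name, the `window` size, and the fixed `threshold` are all assumptions, and a plain threshold on the difference of two window averages stands in for whatever significance test the Skill really applies.

```python
from statistics import mean

# The four dimensions named in the feature list.
DIMENSIONS = ("correctness", "efficiency", "compliance", "quality")


def detect_regressions(history, window=3, threshold=0.5):
    """Flag dimensions whose recent moving average dropped.

    history: list of dicts ordered oldest to newest, each mapping a
    dimension name to a numeric score (hypothetical record shape).
    Returns {dimension: size_of_drop} for every flagged dimension.
    """
    regressions = {}
    if len(history) < 2 * window:
        return regressions  # not enough data for two full windows

    recent = history[-window:]
    previous = history[-2 * window:-window]
    for dim in DIMENSIONS:
        prev_avg = mean(rec[dim] for rec in previous)
        recent_avg = mean(rec[dim] for rec in recent)
        drop = prev_avg - recent_avg
        if drop >= threshold:  # assumed cutoff, not a statistical test
            regressions[dim] = round(drop, 2)
    return regressions


# Hypothetical history for one agent: correctness falls after a change.
history = [
    {"correctness": c, "efficiency": 4, "compliance": 4, "quality": 4}
    for c in (5, 5, 5, 3, 3, 3)
]
print(detect_regressions(history))
```

Comparing two adjacent windows rather than single evaluations keeps one noisy score from triggering an alert; a larger `window` trades responsiveness for stability.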

Quick Start

Run the regression detector against the repository's eval history to get a summary, per-agent details, and regression alerts.
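As a rough picture of what the summary output might contain, the sketch below builds an at-a-glance table from a list of evaluation records. The record fields, the agent names, and the `summarize` and `format_table` helpers are hypothetical and chosen only for illustration; the Skill's real storage format and output layout may differ.

```python
DIMENSIONS = ("correctness", "efficiency", "compliance", "quality")


def summarize(evals):
    """Return one row per agent: evaluation count plus latest scores.

    evals: list of dicts ordered oldest to newest, each with an
    "agent" key and one numeric score per dimension (assumed shape).
    """
    rows = {}
    for rec in evals:
        row = rows.setdefault(rec["agent"], {"count": 0})
        row["count"] += 1
        # Later records overwrite earlier ones, so the latest scores win.
        row.update({dim: rec[dim] for dim in DIMENSIONS})
    return rows


def format_table(rows):
    """Render the summary rows as a plain-text table."""
    lines = [" | ".join(["agent", "count", *DIMENSIONS])]
    for agent in sorted(rows):
        row = rows[agent]
        cells = [agent, str(row["count"])]
        cells += [str(row[dim]) for dim in DIMENSIONS]
        lines.append(" | ".join(cells))
    return "\n".join(lines)


# Hypothetical evaluation records for two agents.
evals = [
    {"agent": "agent-a", "correctness": 5, "efficiency": 4, "compliance": 5, "quality": 4},
    {"agent": "agent-b", "correctness": 3, "efficiency": 4, "compliance": 4, "quality": 3},
    {"agent": "agent-a", "correctness": 4, "efficiency": 4, "compliance": 5, "quality": 4},
]
print(format_table(summarize(evals)))
```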

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: team-shinchan:eval
Download link: https://github.com/seokan-jeong/team-shinchan/archive/main.zip#team-shinchan-eval

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.