team-shinchan:eval
Community
Detect agent regressions and track evaluations.
Author: seokan-jeong
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
The Eval Skill surfaces performance regressions and summarizes historical evaluation data for specialized agents so teams can quickly detect declining behavior and take corrective action. It removes the manual effort of scanning logs and comparing metric trends across multiple agents and evaluation dimensions.
Core Features & Use Cases
- Aggregate summaries: Produce an at-a-glance table of evaluations per agent with scores for correctness, efficiency, compliance, and quality.
- Per-agent history and trends: Show full evaluation history for a single agent with trend indicators and latest reviewer notes to aid diagnosis.
- Regression detection & filtering: Identify agents and specific dimensions with statistically significant drops using a moving average window and flag regressions for review.
- Side-by-side comparisons: Compare recent evaluations across agents to prioritize remediation and track improvements after prompt or policy changes.
- Use Case: After a prompt update or model change, run the tool to quickly find which agents regressed and which metrics dropped, then assign review tasks.
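The moving-average idea behind regression detection can be illustrated with a minimal sketch. This is not the skill's actual implementation: the function name `detect_regressions`, the record layout (one dict of dimension scores per evaluation, oldest first), the `window` size, and the fixed `min_drop` threshold are all assumptions made for illustration. In particular, a fixed threshold stands in here for whatever significance test the skill applies.

```python
from statistics import mean

# The four evaluation dimensions named in the feature list above.
DIMENSIONS = ("correctness", "efficiency", "compliance", "quality")


def detect_regressions(history, window=3, min_drop=0.5):
    """Flag dimensions whose latest score fell below the recent moving average.

    history  -- list of dicts (oldest first), each mapping dimension -> score.
                This layout is hypothetical, not the skill's storage format.
    window   -- number of preceding evaluations averaged as the baseline.
    min_drop -- assumed fixed threshold; a drop of at least this size is flagged.

    Returns a list of (dimension, baseline, latest_score) tuples.
    """
    if len(history) < window + 1:
        return []  # not enough evaluations to form a baseline

    latest = history[-1]
    previous = history[-(window + 1):-1]  # the `window` evaluations before the latest

    flagged = []
    for dim in DIMENSIONS:
        baseline = mean(entry[dim] for entry in previous)
        if baseline - latest[dim] >= min_drop:
            flagged.append((dim, baseline, latest[dim]))
    return flagged


# Hypothetical history for one agent: correctness drops in the latest evaluation.
example_history = [
    {"correctness": 4, "efficiency": 3, "compliance": 5, "quality": 4},
    {"correctness": 4, "efficiency": 3, "compliance": 5, "quality": 4},
    {"correctness": 4, "efficiency": 3, "compliance": 5, "quality": 4},
    {"correctness": 3, "efficiency": 3, "compliance": 5, "quality": 4},
]

for dim, baseline, score in detect_regressions(example_history):
    print(f"{dim}: baseline {baseline}, latest {score}")
```

Comparing the latest score against an average of the preceding evaluations, rather than against the single previous score, keeps one noisy evaluation from triggering or masking a flag.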
Quick Start
Run the regression detector against the repository's eval history to get a summary, per-agent details, and regression alerts.
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: team-shinchan:eval
Download link: https://github.com/seokan-jeong/team-shinchan/archive/main.zip#team-shinchan-eval
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.