challenge-run
Community
Execute and evaluate agent challenges.
Software Engineering #agent development #calibration #performance testing #active learning #agent evaluation #challenge execution
Author: tyevans
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill automates the execution and rigorous evaluation of agent challenges, providing objective performance metrics and insights into agent capabilities.
Core Features & Use Cases
- Automated Challenge Execution: Dispatches agents against predefined challenge sets to test their performance.
- Performance Evaluation: Assesses agent output against acceptance criteria, hidden traps, and ground truth.
- Calibration: Measures how well an agent's self-reported confidence aligns with its actual performance.
- Use Case: After generating a set of challenges for your billing-agent using /challenge-gen, you can run /challenge-run billing-agent to see how effectively it handles those specific scenarios and identify areas for improvement.
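The calibration feature above can be illustrated with a small sketch. This is not the Skill's own implementation, which this page does not document. It assumes each challenge yields a self-reported confidence in [0, 1] and a pass/fail outcome, and it scores the gap between the two with expected calibration error (ECE), a common calibration metric. The names `ChallengeResult` and `expected_calibration_error` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChallengeResult:
    # Hypothetical record; the Skill's real output format is not documented here.
    confidence: float  # agent's self-reported confidence, assumed to lie in [0, 1]
    passed: bool       # whether the agent's output met the acceptance criteria


def expected_calibration_error(results, n_bins=5):
    """Weighted average of |mean confidence - pass rate| over confidence bins."""
    if not results:
        return 0.0

    # Group results into equal-width confidence bins; 1.0 falls in the last bin.
    bins = [[] for _ in range(n_bins)]
    for r in results:
        idx = min(int(r.confidence * n_bins), n_bins - 1)
        bins[idx].append(r)

    total = len(results)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(r.confidence for r in bucket) / len(bucket)
        pass_rate = sum(r.passed for r in bucket) / len(bucket)
        # Each bin contributes in proportion to how many results it holds.
        ece += (len(bucket) / total) * abs(avg_conf - pass_rate)
    return ece


if __name__ == "__main__":
    sample = [
        ChallengeResult(confidence=0.9, passed=True),
        ChallengeResult(confidence=0.9, passed=False),
    ]
    print(expected_calibration_error(sample))
```

A score of 0 means confidence matched the observed pass rate in every bin; larger values indicate over- or under-confidence. In the sample, the agent claims 0.9 confidence but passes only half the challenges.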
Quick Start
Use the challenge-run skill to execute challenges for the agent named 'billing-agent'.
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: challenge-run
Download link: https://github.com/tyevans/tackline/archive/main.zip#challenge-run
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.