Name: challenge-run
Author: tyevans

System Documentation

What problem does it solve?

This Skill automates the execution and rigorous evaluation of agent challenges, providing objective performance metrics and insights into agent capabilities.

Core Features & Use Cases

Automated Challenge Execution: Dispatches agents against predefined challenge sets to test their performance.
Performance Evaluation: Assesses agent output against acceptance criteria, hidden traps, and ground truth.
Confidence Calibration: Measures how well an agent's self-reported confidence aligns with its actual performance.
Use Case: After generating a set of challenges for your billing-agent using /challenge-gen, you can run /challenge-run billing-agent to see how effectively it handles those specific scenarios and identify areas for improvement.
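
To make the calibration idea concrete, here is a minimal sketch of how stated confidence could be compared with pass/fail outcomes. The listing does not say which metric the Skill actually uses, so the Brier score and the simple confidence-minus-pass-rate gap are assumptions chosen for illustration. The ChallengeResult type, both function names, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChallengeResult:
    """One hypothetical challenge outcome (not the Skill's real schema)."""
    confidence: float  # agent's self-reported confidence, in [0, 1]
    passed: bool       # whether the output met the acceptance criteria


def brier_score(results):
    """Mean squared gap between confidence and outcome; 0.0 is perfect."""
    return sum((r.confidence - float(r.passed)) ** 2 for r in results) / len(results)


def calibration_gap(results):
    """Mean confidence minus pass rate; a positive value means overconfident."""
    mean_confidence = sum(r.confidence for r in results) / len(results)
    pass_rate = sum(r.passed for r in results) / len(results)
    return mean_confidence - pass_rate


# Illustrative data only: four challenges with made-up confidences.
results = [
    ChallengeResult(confidence=0.9, passed=True),
    ChallengeResult(confidence=0.8, passed=False),
    ChallengeResult(confidence=0.6, passed=True),
    ChallengeResult(confidence=0.7, passed=False),
]

print(f"Brier score: {brier_score(results):.3f}")
print(f"Calibration gap: {calibration_gap(results):+.3f}")
```

A well-calibrated agent would show a gap near zero and a low Brier score. An agent that reports high confidence on challenges it fails would show a large positive gap.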

Quick Start

Use the challenge-run skill to execute challenges for the agent named 'billing-agent'.

Please help me install this Skill:
Name: challenge-run
Download link: https://github.com/tyevans/tackline/archive/main.zip#challenge-run
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
