challenge-run

Community

Execute and evaluate agent challenges.

Author: tyevans
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill automates the execution and rigorous evaluation of agent challenges, providing objective performance metrics and insights into agent capabilities.

Core Features & Use Cases

  • Automated Challenge Execution: Dispatches agents against predefined challenge sets to test their performance.
  • Performance Evaluation: Assesses agent output against acceptance criteria, hidden traps, and ground truth.
  • Confidence Calibration: Measures how well an agent's self-reported confidence aligns with its actual performance.
  • Use Case: After generating a set of challenges for your billing-agent using /challenge-gen, you can run /challenge-run billing-agent to see how effectively it handles those specific scenarios and identify areas for improvement.
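
To make the calibration idea concrete, here is a minimal sketch of two common ways to compare self-reported confidence with actual outcomes: the Brier score and expected calibration error (ECE). The listing does not say which metric the skill uses, so the function names, the (confidence, passed) tuple format, and the bin count are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical result format: (self-reported confidence in [0, 1], challenge passed?)
Result = Tuple[float, bool]


def brier_score(results: List[Result]) -> float:
    """Mean squared gap between confidence and outcome (0.0 is perfect)."""
    return sum(
        (conf - (1.0 if passed else 0.0)) ** 2 for conf, passed in results
    ) / len(results)


def expected_calibration_error(results: List[Result], n_bins: int = 5) -> float:
    """Weighted average of |mean confidence - pass rate| over confidence bins."""
    bins: List[List[Result]] = [[] for _ in range(n_bins)]
    for conf, passed in results:
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, passed))

    total = len(results)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(conf for conf, _ in bucket) / len(bucket)
        pass_rate = sum(1 for _, passed in bucket if passed) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - pass_rate)
    return ece
```

Under these assumptions, an agent that reports 0.9 confidence on challenges it passes only half the time would show a large gap in the top bin, while a well-calibrated agent would score near zero on both metrics.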

Quick Start

Use the challenge-run skill to execute challenges for the agent named 'billing-agent'.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: challenge-run
Download link: https://github.com/tyevans/tackline/archive/main.zip#challenge-run

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.