Name: benchmark-driven-improvement
Author: prime-radiant-inc

System Documentation

What problem does it solve?

Diagnoses and iterates on Serf benchmark failures to improve autonomous agent reliability.

Core Features & Use Cases

Provides a structured workflow for extracting, reproducing, and fixing benchmark failures.
Supports transcript analysis, session interrogation, and iterative code and prompt changes so that fixes generalize across tasks rather than patching a single failure.
Supports local execution, tool usage assessment, and verification through standard evaluation tooling.
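As a rough illustration of the tool usage assessment mentioned above, the sketch below counts tool calls and tool errors in a transcript. The event format (a list of dicts with `type`, `tool`, and `error` keys) and the function name `summarize_tool_usage` are assumptions made for this example; the actual Serf transcript format is not described here and may differ.

```python
from collections import Counter


def summarize_tool_usage(transcript):
    """Count calls and errors per tool in a transcript.

    Assumes a hypothetical event format: each event is a dict with a
    "type" key, and tool-call events carry "tool" and optional "error".
    """
    calls = Counter()
    errors = Counter()
    for event in transcript:
        if event.get("type") != "tool_call":
            continue  # skip messages and other non-tool events
        name = event.get("tool", "unknown")
        calls[name] += 1
        if event.get("error"):
            errors[name] += 1
    return {name: {"calls": calls[name], "errors": errors[name]} for name in calls}


# Hypothetical transcript, invented for illustration only.
example_transcript = [
    {"type": "message", "text": "Starting task"},
    {"type": "tool_call", "tool": "shell", "error": None},
    {"type": "tool_call", "tool": "shell", "error": "exit status 1"},
    {"type": "tool_call", "tool": "read_file"},
]

summary = summarize_tool_usage(example_transcript)
```

A tool with a high error-to-call ratio is a reasonable first place to look when deciding whether a failure stems from the prompt, the tool, or the surrounding code.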

Quick Start

Run a local benchmark investigation workflow against a cached task, then review transcripts and iterate on prompts, tools, and code to improve robustness.

Please help me install this Skill:
Name: benchmark-driven-improvement
Download link: https://github.com/prime-radiant-inc/serf/archive/main.zip#benchmark-driven-improvement
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
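For a manual install, the extract-and-copy step can be sketched as follows. This is a minimal sketch, not an official installer: the helper name `install_skill_from_zip` is hypothetical, and because the location of the skill inside the archive is not stated here, the code simply searches the extracted tree for a directory named after the skill.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_skill_from_zip(zip_path, skill_name, skills_root):
    """Extract an archive and copy the directory named `skill_name` into `skills_root`.

    Assumption: the skill lives in a directory whose name matches the
    skill name somewhere inside the archive.
    """
    skills_root = Path(skills_root)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        matches = [p for p in Path(tmp).rglob(skill_name) if p.is_dir()]
        if not matches:
            raise FileNotFoundError(f"no directory named {skill_name!r} in {zip_path}")
        # Prefer the shallowest match if the name appears more than once.
        source = min(matches, key=lambda p: len(p.parts))
        target = skills_root / skill_name
        if target.exists():
            shutil.rmtree(target)  # replace any previous install
        skills_root.mkdir(parents=True, exist_ok=True)
        shutil.copytree(source, target)
    return target
```

After downloading the archive from the link above (for example with a browser), a call such as `install_skill_from_zip("main.zip", "benchmark-driven-improvement", ".claude/skills")` would place the skill under `.claude/skills/benchmark-driven-improvement`.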
