build-eval (Community)
Build rigorous evals for LLM agents and prompts.
Author: yzavyas
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This skill provides a structured framework for designing, validating, and comparing evaluations of LLM agents, multi-agent systems, skills, MCP servers, and prompts, so that what gets measured and how results are compared is unambiguous.
Core Features & Use Cases
- End-to-end evaluation metrics: task completion, tool correctness, pass@k (see the sketch after this list), and iterative metrics
- Framework integrations: DeepEval, Braintrust, RAGAS, and Promptfoo for flexible scoring
- Use cases: validating agent coordination, benchmarking MCP server reliability, and building reusable eval templates
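Of these, pass@k is the easiest to get subtly wrong. The standard unbiased estimator (from the HumanEval paper) computes, given n sampled attempts of which c succeed, the probability that at least one of k draws succeeds. A minimal sketch in Python:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k attempts,
    drawn without replacement from n samples with c correct, succeeds."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. 10 samples, 3 correct: pass@1 = 0.30, pass@5 ≈ 0.917
print(pass_at_k(n=10, c=3, k=1), pass_at_k(n=10, c=3, k=5))
```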
Quick Start
To establish a baseline, assemble a dataset of test cases, choose a framework, and run the evaluator against your codebase. For example: build a dataset of agent tasks, select DeepEval metrics, and execute the evaluation harness.
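A minimal sketch of that flow with DeepEval, assuming a recent release (`pip install deepeval`); the task text here is invented, and exact class and field names can vary between versions:

```python
from deepeval import evaluate
from deepeval.metrics import ToolCorrectnessMetric
from deepeval.test_case import LLMTestCase, ToolCall

# One agent task as a test case: what the agent was asked, what it
# answered, and which tools it actually vs. should have called.
test_case = LLMTestCase(
    input="Find the cheapest flight from SFO to JFK next Friday.",
    actual_output="The cheapest flight is UA 1523 at $178.",
    tools_called=[ToolCall(name="search_flights")],
    expected_tools=[ToolCall(name="search_flights")],
)

# ToolCorrectnessMetric compares tools_called against expected_tools
# deterministically, so no judge LLM is needed for this metric.
evaluate(test_cases=[test_case], metrics=[ToolCorrectnessMetric()])
```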
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: build-eval
Download link: https://github.com/yzavyas/claude-1337/archive/main.zip#build-eval
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
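If you would rather install by hand, the following Python sketch performs the same steps. The archive layout (a `claude-1337-main/` root containing a `build-eval/` folder) is an assumption inferred from the link above, not confirmed:

```python
import io
import pathlib
import urllib.request
import zipfile

# Download the repo archive and copy only the skill folder
# into .claude/skills/.
url = "https://github.com/yzavyas/claude-1337/archive/main.zip"
skills_dir = pathlib.Path(".claude/skills")
skills_dir.mkdir(parents=True, exist_ok=True)

data = urllib.request.urlopen(url).read()
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    for member in zf.namelist():
        # Archive paths look like "claude-1337-main/<path>" (assumed);
        # keep only files under the build-eval skill folder.
        parts = pathlib.PurePosixPath(member).parts
        if len(parts) > 1 and parts[1] == "build-eval" and not member.endswith("/"):
            target = skills_dir.joinpath(*parts[1:])
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(member))
```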