eval-driven-dev

Community

Automate LLM app QA and iteration.

Author: yiouli
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill streamlines the entire quality assurance process for LLM-powered applications, from initial setup to iterative debugging and performance improvement.

Core Features & Use Cases

  • Automated Instrumentation: Easily add tracing to your Python LLM applications.
  • Dataset Generation: Build golden datasets from real application runs.
  • Eval Test Creation: Automatically generate tests using various evaluators.
  • Full QA Loop: Instrument, build datasets, write tests, run evals, and investigate failures.
  • Use Case: When developing a new AI agent, use this Skill to set up a robust evaluation pipeline that catches regressions and ensures output quality before shipping.
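To make the "build datasets, write tests, run evals" loop concrete, here is a minimal sketch in Python. Everything in it (the `run_app` stand-in, the golden dataset contents, the `exact_match` evaluator) is illustrative and not part of the eval-driven-dev skill's actual API; a real pipeline would capture the dataset from traced application runs and likely use richer evaluators.

```python
# A "golden dataset": inputs paired with expected outputs,
# typically captured from real application runs.
GOLDEN = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def run_app(prompt: str) -> str:
    """Stand-in for the instrumented LLM application under test."""
    canned = {"2 + 2": "4", "capital of France": "Paris"}
    return canned.get(prompt, "")

def exact_match(output: str, expected: str) -> bool:
    """Simplest possible evaluator; real evals often use
    semantic similarity or LLM-as-judge scoring instead."""
    return output.strip() == expected.strip()

def run_evals() -> list:
    """Run every golden case and collect failures for investigation."""
    failures = []
    for case in GOLDEN:
        out = run_app(case["input"])
        if not exact_match(out, case["expected"]):
            failures.append((case["input"], out, case["expected"]))
    passed = len(GOLDEN) - len(failures)
    print(f"{passed}/{len(GOLDEN)} cases passed")
    return failures

if __name__ == "__main__":
    run_evals()
```

The returned failure tuples (input, actual, expected) are what the "investigate failures" step of the loop would examine before the next iteration.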

Quick Start

Use the eval-driven-dev skill to set up QA for my Python AI project.

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: eval-driven-dev
Download link: https://github.com/yiouli/pixie-qa/archive/main.zip#eval-driven-dev

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository
