llm-eval-scaffolder

Community

Automate LLM prompt evaluation.

Authorsaddam-eng-tech
Version1.0.0
Installs0

System Documentation

What problem does it solve?

This Skill automates the process of evaluating LLM prompts, ensuring their quality and preventing regressions before they impact production.

Core Features & Use Cases

  • Structured Test Cases: Generates diverse test cases (golden, edge, adversarial, etc.) for comprehensive evaluation.
  • LLM-as-Judge Prompt: Creates a prompt for an LLM to act as a judge, scoring responses based on a defined rubric.
  • Automated Scoring: Includes a Python script to run evaluations and calculate pass/fail rates.
  • CI Integration: Sets up a GitHub Actions workflow to automatically run evaluations on code changes.
  • Use Case: When you update a prompt for a customer support chatbot, this skill ensures the new prompt doesn't degrade response quality on critical test cases.

Quick Start

Use the llm-eval-scaffolder skill to set up an LLM evaluation pipeline for my new summarization prompt.

Dependency Matrix

Required Modules

anthropic

Components

scriptsreferences

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: llm-eval-scaffolder
Download link: https://github.com/saddam-eng-tech/ai-agent-skills/archive/main.zip#llm-eval-scaffolder

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 223,000+ vetted skills library on demand.