Name: llm-eval-scaffolder
Availability: InStock
Author: saddam-eng-tech

System Documentation

What problem does it solve?

This Skill automates the process of evaluating LLM prompts, ensuring their quality and preventing regressions before they impact production.

Core Features & Use Cases

Structured Test Cases: Generates diverse test cases (golden, edge, adversarial, etc.) for comprehensive evaluation.
LLM-as-Judge Prompt: Creates a prompt for an LLM to act as a judge, scoring responses based on a defined rubric.
Automated Scoring: Includes a Python script to run evaluations and calculate pass/fail rates.
CI Integration: Sets up a GitHub Actions workflow to automatically run evaluations on code changes.
Use Case: When you update a prompt for a customer support chatbot, this skill ensures the new prompt doesn't degrade response quality on critical test cases.

Quick Start

Use the llm-eval-scaffolder skill to set up an LLM evaluation pipeline for my new summarization prompt.

Please help me install this Skill: Name: llm-eval-scaffolder Download link: https://github.com/saddam-eng-tech/ai-agent-skills/archive/main.zip#llm-eval-scaffolder Please download this .zip file, extract it, and install it in the .claude/skills/ directory.

llm-eval-scaffolder

System Documentation

What problem does it solve?

Core Features & Use Cases

Quick Start

Dependency Matrix

Required Modules

Components

💻 Claude Code Installation

Agent Skills Search Helper