llm-as-a-judge

Official

Build LLM evaluators for quality assessment.

Author: maragudk
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill automates the quality assessment of LLM pipeline outputs by using another LLM as a judge, enabling objective evaluation of nuanced or subjective failure modes.

Core Features & Use Cases

  • Automated Evaluator Creation: Design and deploy LLM-as-Judge evaluators for binary (Pass/Fail) assessments.
  • Iterative Prompt Refinement: Improve judge accuracy by measuring alignment with human labels (TPR/TNR) and refining prompts.
  • Success Rate Estimation: Calculate true success rates with bias correction for production data.
  • Use Case: You need to automatically check if customer support responses are empathetic and helpful. This Skill allows you to build a judge prompt that evaluates this nuanced criterion, moving beyond simple keyword matching.
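The alignment and bias-correction steps above can be sketched in a few lines. This is a minimal illustration, not the Skill's actual implementation: the function names (`alignment`, `corrected_success_rate`) are hypothetical, and the correction shown is the standard Rogan-Gladen-style adjustment that follows from modeling the observed pass rate as `theta * TPR + (1 - theta) * (1 - TNR)`.

```python
def alignment(judge_labels, human_labels):
    """Measure judge accuracy against human ground truth.

    Returns (TPR, TNR): true-positive rate on human-Pass items
    and true-negative rate on human-Fail items.
    """
    tp = sum(j and h for j, h in zip(judge_labels, human_labels))
    tn = sum(not j and not h for j, h in zip(judge_labels, human_labels))
    pos = sum(human_labels)
    neg = len(human_labels) - pos
    return tp / pos, tn / neg


def corrected_success_rate(observed_pass_rate, tpr, tnr):
    """Estimate the true success rate from the judge's observed pass rate.

    Assumes: observed = theta * TPR + (1 - theta) * (1 - TNR),
    solved for theta (the true rate).
    """
    return (observed_pass_rate + tnr - 1) / (tpr + tnr - 1)
```

For example, a judge that agrees with humans on 3 of 4 labeled items might yield TPR = 1.0 and TNR = 0.5; an observed 80% pass rate on production data then corrects down to a 60% estimated true success rate.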

Quick Start

Use the llm-as-a-judge skill to create a judge prompt for evaluating response helpfulness.
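A binary judge typically boils down to a prompt template plus a strict verdict parser. The sketch below is an assumption about what such a judge looks like, not the template this Skill ships: `JUDGE_PROMPT`, `build_judge_prompt`, and `parse_verdict` are hypothetical names for illustration.

```python
# Hypothetical Pass/Fail judge prompt for response helpfulness.
JUDGE_PROMPT = """You are an evaluator. Judge whether the response below is helpful.
Answer with exactly one word: PASS or FAIL.

Question: {question}
Response: {response}

Verdict:"""


def build_judge_prompt(question: str, response: str) -> str:
    """Fill the template with the item under evaluation."""
    return JUDGE_PROMPT.format(question=question, response=response)


def parse_verdict(raw: str) -> bool:
    """Map the judge model's raw completion to a binary label.

    Strict parsing surfaces malformed outputs instead of
    silently miscounting them.
    """
    verdict = raw.strip().upper()
    if verdict.startswith("PASS"):
        return True
    if verdict.startswith("FAIL"):
        return False
    raise ValueError(f"Unparseable verdict: {raw!r}")
```

The prompt would be sent to whatever LLM backs the judge; keeping the verdict format to a single constrained token makes parsing reliable and the Pass/Fail labels directly usable for the alignment measurements described above.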

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: llm-as-a-judge
Download link: https://github.com/maragudk/evals-skills/archive/main.zip#llm-as-a-judge

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
