llm-as-a-judge

Official

Build LLM evaluators for quality assessment.

Author: maragudk
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill automates the quality assessment of LLM pipeline outputs by using another LLM as a judge, enabling objective evaluation of nuanced or subjective failure modes.

Core Features & Use Cases

  • Automated Evaluator Creation: Design and deploy LLM-as-Judge evaluators for binary (Pass/Fail) assessments.
  • Iterative Prompt Refinement: Improve judge accuracy by measuring alignment with human labels (TPR/TNR) and refining prompts.
  • Success Rate Estimation: Calculate true success rates with bias correction for production data.
  • Use Case: You need to automatically check if customer support responses are empathetic and helpful. This Skill allows you to build a judge prompt that evaluates this nuanced criterion, moving beyond simple keyword matching.
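The alignment and bias-correction steps above can be sketched in a few lines. This is a minimal illustration, not the Skill's actual implementation: the function names (`alignment`, `corrected_success_rate`) are hypothetical, and the correction shown is the standard Rogan-Gladen-style adjustment that follows from modeling the observed pass rate as `theta * TPR + (1 - theta) * (1 - TNR)`.

```python
def alignment(judge_labels, human_labels):
    """Measure judge accuracy against human ground truth.

    Returns (TPR, TNR): true-positive rate on human-Pass items
    and true-negative rate on human-Fail items.
    """
    tp = sum(j and h for j, h in zip(judge_labels, human_labels))
    tn = sum(not j and not h for j, h in zip(judge_labels, human_labels))
    pos = sum(human_labels)
    neg = len(human_labels) - pos
    return tp / pos, tn / neg


def corrected_success_rate(observed_pass_rate, tpr, tnr):
    """Estimate the true success rate from the judge's observed pass rate.

    Assumes: observed = theta * TPR + (1 - theta) * (1 - TNR),
    solved for theta (the true rate).
    """
    return (observed_pass_rate + tnr - 1) / (tpr + tnr - 1)
```

For example, a judge that agrees with humans on 3 of 4 labeled items might yield TPR = 1.0 and TNR = 0.5; an observed 80% pass rate on production data then corrects down to a 60% estimated true success rate.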

Quick Start

Use the llm-as-a-judge skill to create a judge prompt for evaluating response helpfulness.
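A binary judge typically boils down to a prompt template plus a strict verdict parser. The sketch below is an assumption about what such a judge looks like, not the template this Skill ships: `JUDGE_PROMPT`, `build_judge_prompt`, and `parse_verdict` are hypothetical names for illustration.

```python
# Hypothetical Pass/Fail judge prompt for response helpfulness.
JUDGE_PROMPT = """You are an evaluator. Judge whether the response below is helpful.
Answer with exactly one word: PASS or FAIL.

Question: {question}
Response: {response}

Verdict:"""


def build_judge_prompt(question: str, response: str) -> str:
    """Fill the template with the item under evaluation."""
    return JUDGE_PROMPT.format(question=question, response=response)


def parse_verdict(raw: str) -> bool:
    """Map the judge model's raw completion to a binary label.

    Strict parsing surfaces malformed outputs instead of
    silently miscounting them.
    """
    verdict = raw.strip().upper()
    if verdict.startswith("PASS"):
        return True
    if verdict.startswith("FAIL"):
        return False
    raise ValueError(f"Unparseable verdict: {raw!r}")
```

The prompt would be sent to whatever LLM backs the judge; keeping the verdict format to a single constrained token makes parsing reliable and the Pass/Fail labels directly usable for the alignment measurements described above.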

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: llm-as-a-judge
Download link: https://github.com/maragudk/evals-skills/archive/main.zip#llm-as-a-judge

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
