llm-as-a-judge
Official
Build LLM evaluators for quality assessment.
Author: maragudk
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill automates the quality assessment of LLM pipeline outputs by using another LLM as a judge, enabling scalable, consistent evaluation of nuanced or subjective failure modes.
Core Features & Use Cases
- Automated Evaluator Creation: Design and deploy LLM-as-Judge evaluators for binary (Pass/Fail) assessments.
- Iterative Prompt Refinement: Improve judge accuracy by measuring alignment with human labels (TPR/TNR) and refining prompts.
- Success Rate Estimation: Calculate true success rates with bias correction for production data.
- Use Case: You need to automatically check if customer support responses are empathetic and helpful. This Skill allows you to build a judge prompt that evaluates this nuanced criterion, moving beyond simple keyword matching.
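The alignment-and-correction loop above can be sketched in a few lines: measure the judge's TPR/TNR against a small set of human labels, then use those rates to correct the raw pass rate observed on unlabeled production data. This is a minimal illustration using a Rogan-Gladen-style estimator; the function names and the sample labels are hypothetical, not the skill's own scripts.

```python
# Illustrative sketch of judge alignment and bias-corrected success
# rate estimation. All names and numbers here are hypothetical.

def alignment(judge_verdicts, human_labels):
    """Measure judge agreement with human Pass/Fail labels:
    TPR (recall on human Pass) and TNR (recall on human Fail)."""
    tp = sum(1 for j, h in zip(judge_verdicts, human_labels) if j and h)
    tn = sum(1 for j, h in zip(judge_verdicts, human_labels) if not j and not h)
    pos = sum(human_labels)
    neg = len(human_labels) - pos
    return tp / pos, tn / neg

def corrected_success_rate(observed_pass_rate, tpr, tnr):
    """Recover the true pass rate from the judge's observed pass
    rate, correcting for the judge's known error rates."""
    denom = tpr + tnr - 1.0
    if denom <= 0:
        raise ValueError("judge is no better than random (TPR + TNR <= 1)")
    theta = (observed_pass_rate + tnr - 1.0) / denom
    # Clamp: sampling noise can push the estimate outside [0, 1].
    return min(1.0, max(0.0, theta))

# Judge verdicts vs. 10 human-labeled examples (True = Pass):
judge = [True, True, False, True, False, True, True, False, True, False]
human = [True, True, True,  True, False, True, True, False, False, False]
tpr, tnr = alignment(judge, human)
print(f"TPR={tpr:.2f}  TNR={tnr:.2f}")

# On unlabeled production data the judge passes 70% of outputs:
print(f"corrected success rate: {corrected_success_rate(0.70, tpr, tnr):.2f}")
```

Note the guard on `TPR + TNR <= 1`: if the judge does not beat random guessing, the correction is meaningless, which is why the skill iterates on the judge prompt until alignment is acceptable before estimating production success rates.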
Quick Start
Use the llm-as-a-judge skill to create a judge prompt for evaluating response helpfulness.
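A judge prompt of the kind the skill produces typically pins down the criterion and forces a binary verdict. A minimal sketch, assuming a simple template filled per output; the template wording and helper function are hypothetical, not what the skill actually emits:

```python
# Hypothetical binary judge prompt template. The skill generates and
# iteratively refines prompts of roughly this shape.
JUDGE_PROMPT = """You are judging a customer support response for helpfulness.

<response>
{response}
</response>

The response passes only if it directly addresses the customer's
question and offers a concrete next step.

Answer with exactly one word: Pass or Fail."""

def build_judge_input(response: str) -> str:
    """Fill the template with the output under evaluation."""
    return JUDGE_PROMPT.format(response=response)

print(build_judge_input("Please restart the app, then contact us if it persists."))
```

Constraining the answer to a single `Pass`/`Fail` token keeps parsing trivial and makes the TPR/TNR alignment measurement against human labels straightforward.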
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: llm-as-a-judge
Download link: https://github.com/maragudk/evals-skills/archive/main.zip#llm-as-a-judge
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.