eval-audit


Audit LLM evals for trust.

Author: hamelsmu
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill identifies critical flaws in your LLM evaluation pipelines, ensuring your metrics are trustworthy and your AI product is genuinely improving.

Core Features & Use Cases

  • Diagnostic Checks: Assesses six key areas: Error Analysis, Evaluator Design, Judge Validation, Human Review Process, Labeled Data, and Pipeline Hygiene.
  • Prioritized Findings: Delivers a report of problems ordered by their impact on your evaluation's reliability.
  • Actionable Next Steps: Provides concrete recommendations, often suggesting other skills to fix identified issues.
  • Use Case: You've inherited an LLM evaluation system and are unsure if its results are reliable. Run eval-audit to get a clear picture of potential issues and a roadmap for improvement.

Quick Start

Use the eval-audit skill to audit my current LLM evaluation pipeline.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install the Skill automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: eval-audit
Download link: https://github.com/hamelsmu/evals-skills/archive/main.zip#eval-audit

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
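If you prefer to install manually, the steps above can be sketched as a short shell script. This is a minimal sketch, assuming the download link given above, the default Claude Code skills directory (`~/.claude/skills/`), and GitHub's standard archive layout (the zip unpacks to an `evals-skills-main/` folder containing an `eval-audit/` subdirectory):

```shell
#!/usr/bin/env bash
# Manual install sketch for the eval-audit Skill.
# Assumes: the GitHub archive URL from the prompt above, and that the
# extracted archive contains an eval-audit/ directory (not verified here).
set -euo pipefail

SKILLS_DIR="${HOME}/.claude/skills"
mkdir -p "$SKILLS_DIR"

# Download the repository archive (-L follows GitHub's redirect)
curl -L -o /tmp/evals-skills.zip \
  "https://github.com/hamelsmu/evals-skills/archive/main.zip"

# Extract quietly and copy only the eval-audit skill into place
unzip -q -o /tmp/evals-skills.zip -d /tmp
cp -r /tmp/evals-skills-main/eval-audit "$SKILLS_DIR/"
```

After copying, restarting Claude Code should make the skill discoverable under `.claude/skills/eval-audit`.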
