ai-evaluation-suite
Community
Ensure AI quality, performance, and safety.
Author: doctorduke
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Building reliable AI systems is hard. This Skill provides a comprehensive toolkit for rigorously evaluating LLM outputs, RAG systems, and AI agents. It helps ensure your AI performs as expected, avoids hallucinations, and is free from bias, saving you from costly production failures.
Core Features & Use Cases
- LLM Quality Assessment: Use LLM-as-judge to score outputs on coherence, relevance, factuality, and more.
- Hallucination & Bias Detection: Automatically identify factual inconsistencies and demographic biases in AI-generated content.
- Cost & Performance Optimization: Track token usage, latency, and cost to optimize your AI's efficiency.
- Use Case: You're deploying a new summarization model. Use this Skill to automatically evaluate its output quality against a test set, detect any hallucinations, and compare its cost-effectiveness against a cheaper model before going live.
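To illustrate the LLM-as-judge pattern from the feature list above, here is a minimal sketch of the scoring loop's plumbing. The `build_judge_prompt`, `parse_judge_reply`, and `overall` helpers are hypothetical, not part of the Skill's actual API; the rubric dimensions are taken from the feature list.

```python
import json
import re

# Rubric dimensions drawn from the feature list; the 1-5 scale is an assumption.
RUBRIC = ["coherence", "relevance", "factuality"]

def build_judge_prompt(query: str, response: str) -> str:
    """Construct an LLM-as-judge prompt asking for JSON scores per dimension."""
    dims = ", ".join(RUBRIC)
    return (
        f"Rate the RESPONSE to the QUERY on {dims}, each on a 1-5 scale.\n"
        f"QUERY: {query}\nRESPONSE: {response}\n"
        'Reply with JSON only, e.g. {"coherence": 4, "relevance": 5, "factuality": 3}.'
    )

def parse_judge_reply(reply: str) -> dict:
    """Extract the JSON score object from a judge reply, tolerating extra prose."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in judge reply")
    scores = json.loads(match.group(0))
    missing = [d for d in RUBRIC if d not in scores]
    if missing:
        raise ValueError(f"judge omitted dimensions: {missing}")
    return scores

def overall(scores: dict) -> float:
    """Unweighted mean across rubric dimensions."""
    return sum(scores[d] for d in RUBRIC) / len(RUBRIC)
```

In practice the judge reply would come from a model call (e.g. via the anthropic SDK in the dependency list); the parsing step is defensive because judge models often wrap their JSON in prose.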
Quick Start
Evaluate the quality of the LLM's response "Quantum entanglement is a phenomenon..." to the query "Explain quantum entanglement" using the LLMQualityEvaluator.
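Alongside quality scoring, the Skill tracks token usage, latency, and cost (per the feature list). A minimal tracker sketch follows; the per-token prices are made-up placeholders, not real API rates, and the class names are illustrative, not the Skill's API.

```python
from dataclasses import dataclass, field

# Placeholder prices in USD per 1K tokens -- illustrative only, not real rates.
PRICE_PER_1K = {"input": 0.003, "output": 0.015}

@dataclass
class CallStats:
    """Token counts and wall-clock latency for one model call."""
    input_tokens: int
    output_tokens: int
    latency_s: float

    @property
    def cost_usd(self) -> float:
        return (self.input_tokens * PRICE_PER_1K["input"]
                + self.output_tokens * PRICE_PER_1K["output"]) / 1000

@dataclass
class UsageTracker:
    """Accumulates per-call stats and reports aggregate cost and latency."""
    calls: list = field(default_factory=list)

    def record(self, stats: CallStats) -> None:
        self.calls.append(stats)

    def summary(self) -> dict:
        n = len(self.calls)
        return {
            "calls": n,
            "total_cost_usd": sum(c.cost_usd for c in self.calls),
            "mean_latency_s": sum(c.latency_s for c in self.calls) / n if n else 0.0,
        }
```

A summary like this is what makes the use case above possible: run the same test set through two models, then compare `total_cost_usd` and quality scores side by side.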
Dependency Matrix
Required Modules
- anthropic
- numpy
- pytest
Components
Standard package

💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: ai-evaluation-suite
Download link: https://github.com/doctorduke/claude-config/archive/main.zip#ai-evaluation-suite
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.