eval-rag
Community · Evaluate RAG retrieval and generation precisely
Author: breethomas
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Diagnose and quantify where a retrieval-augmented generation (RAG) pipeline fails by separating retrieval quality from generation faithfulness so teams can prioritize the highest-impact fixes and avoid optimizing the wrong component.
Core Features & Use Cases
- Retrieval metrics: Compute Recall@k, Precision@k, MRR, and NDCG@k to measure whether the system finds the right document chunks for a query.
- Generation evaluation: Assess faithfulness, omissions, misinterpretations, and relevance of model outputs given retrieved context.
- Optimization guidance: Build retrieval evaluation datasets, tune chunking and overlap, run grid searches, diagnose multi-hop failures, and produce prioritized engineering recommendations.
- Use Case: A PM wants to know whether customer-support answers are failing because the vector store misses FAQ paragraphs or because the LLM hallucinates; this skill yields metrics, diagnostic tables, and next-step fixes.
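The retrieval metrics listed above can be sketched as plain functions. This is an illustrative implementation only, not the skill's internal code; it assumes binary relevance judgments, with `retrieved` as a ranked list of chunk IDs and `relevant` as the set of gold chunk IDs for a query.

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant chunks that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for d in top_k if d in relevant) / len(top_k)

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG: DCG of the top-k over the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1)
              if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```

For example, with `retrieved = ["d3", "d1", "d7"]` and `relevant = {"d1", "d2"}`, Recall@3 is 0.5 (one of two relevant chunks found) and MRR is 0.5 (first hit at rank 2).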
Quick Start
Evaluate the RAG pipeline for the customer knowledge base and return retrieval metrics (Recall@k, Precision@k, MRR, NDCG@k), a faithfulness/relevance summary of generation failures, and recommended fixes.
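To make the faithfulness side of this evaluation concrete, here is a rough heuristic sketch, not the skill's actual method: it counts an answer sentence as "supported" when most of its content words appear in the retrieved context. Production evaluators typically use an LLM judge or NLI model instead; the `threshold` value is an arbitrary illustrative choice.

```python
import string

def faithfulness_score(answer_sentences, context, threshold=0.6):
    """Fraction of answer sentences whose content-word overlap with
    the retrieved context meets the threshold (crude lexical proxy)."""
    context_words = {w.strip(string.punctuation) for w in context.lower().split()}
    if not answer_sentences:
        return 0.0
    supported = 0
    for sent in answer_sentences:
        # Keep only content-ish words (longer than 3 chars, punctuation stripped).
        words = [w.strip(string.punctuation) for w in sent.lower().split()]
        words = [w for w in words if len(w) > 3]
        if not words:
            continue
        overlap = sum(1 for w in words if w in context_words) / len(words)
        if overlap >= threshold:
            supported += 1
    return supported / len(answer_sentences)
```

A score well below 1.0 flags sentences the context does not support, which is exactly the hallucination-versus-retrieval-miss distinction the Quick Start report is meant to surface.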
Dependency Matrix
Required Modules: None required
Components: Standard package

💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill: Name: eval-rag Download link: https://github.com/breethomas/bette-think/archive/main.zip#eval-rag Please download this .zip file, extract it, and install it in the .claude/skills/ directory.