eval-rag

Community

Evaluate RAG retrieval and generation precisely

Author: breethomas
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Diagnose and quantify where a retrieval-augmented generation (RAG) pipeline fails by separating retrieval quality from generation faithfulness, so teams can prioritize the highest-impact fixes instead of optimizing the wrong component.

Core Features & Use Cases

  • Retrieval metrics: Compute Recall@k, Precision@k, MRR, and NDCG@k to measure whether the system finds the right document chunks for a query.
  • Generation evaluation: Assess faithfulness, omissions, misinterpretations, and relevance of model outputs given retrieved context.
  • Optimization guidance: Build retrieval evaluation datasets, tune chunking and overlap, run grid searches, diagnose multi-hop failures, and produce prioritized engineering recommendations.
  • Use Case: A PM wants to know whether customer-support answers are failing because the vector store misses FAQ paragraphs or because the LLM hallucinates; this skill yields metrics, diagnostic tables, and next-step fixes.
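The retrieval metrics listed above can be sketched in plain Python, assuming binary relevance labels (a chunk is either relevant to the query or not). The function names and sample data are illustrative, not part of this skill's interface:

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant chunks that appear in the top-k results."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / k

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevant, k):
    """Normalized discounted cumulative gain with binary gains."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

# Hypothetical ranked retrieval for one query, with two labeled-relevant chunks.
retrieved = ["faq_3", "pricing_1", "faq_7", "blog_2"]
relevant = {"faq_3", "faq_7"}
```

In practice these are averaged over a labeled evaluation set of (query, relevant-chunks) pairs; a low Recall@k with high faithfulness points at the retriever, while the reverse points at the generator.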

Quick Start

Evaluate the RAG pipeline for the customer knowledge base and return retrieval metrics (Recall@k, Precision@k, MRR, NDCG@k), a faithfulness/relevance summary of generation failures, and recommended fixes.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: eval-rag
Download link: https://github.com/breethomas/bette-think/archive/main.zip#eval-rag

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
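For a manual install, the steps above translate to roughly the following shell commands. The URL comes from the listing; the archive's top-level folder name (`bette-think-main`) and the skill's subdirectory are assumptions about the repository layout:

```shell
# Sketch of a manual install; folder names inside the archive are assumed.
REPO_ZIP="https://github.com/breethomas/bette-think/archive/main.zip"
SKILL_DIR=".claude/skills/eval-rag"

mkdir -p "$(dirname "$SKILL_DIR")"
curl -fsSL -o /tmp/eval-rag.zip "$REPO_ZIP" \
  && unzip -qo /tmp/eval-rag.zip -d /tmp/eval-rag-src \
  && cp -r /tmp/eval-rag-src/bette-think-main/eval-rag "$SKILL_DIR"
```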
