rag-evaluation

Official

Elevate RAG performance with metrics.

Author: latestaiagents
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill addresses the need to rigorously evaluate and improve Retrieval-Augmented Generation (RAG) systems by providing comprehensive metrics for retrieval quality, generation quality, and end-to-end performance.

Core Features & Use Cases

  • Retrieval Metrics: Calculate Mean Reciprocal Rank (MRR), Recall@k, Precision@k, NDCG@k, and Hit Rate to assess the quality of retrieved documents.
  • Generation Metrics: Utilize RAGAS or an LLM-as-judge to evaluate faithfulness, answer relevancy, context precision, context recall, and answer correctness.
  • End-to-End Testing: Simulate real-world usage to measure latency, cost, and overall system effectiveness.
  • Use Case: A RAG engineer needs to compare two different retrieval strategies. They use this skill to run both strategies against a golden dataset, generating detailed reports on which strategy yields better retrieval accuracy and more faithful, relevant answers.
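The retrieval metrics listed above have standard definitions that can be sketched in plain Python. Note this is an illustrative sketch, not the skill's actual implementation; the function names and the assumption of binary relevance (a document is simply relevant or not) are mine:

```python
import math

def reciprocal_rank(retrieved, relevant):
    """1/rank of the first relevant document, or 0.0 if none appears."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents found in the top-k results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k

def hit_rate_at_k(retrieved, relevant, k):
    """1.0 if any relevant document appears in the top-k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in retrieved[:k]) else 0.0

def ndcg_at_k(retrieved, relevant, k):
    """NDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1)
              if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```

For example, with `retrieved = ["d3", "d1", "d7"]` and `relevant = {"d1", "d9"}`, the first relevant document sits at rank 2, so MRR contributes 0.5; Recall@3 is 0.5 (one of two relevant documents found) and Precision@3 is 1/3. Averaging each metric over every query in a golden dataset gives the per-strategy report described in the use case.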

Quick Start

Use the rag-evaluation skill to evaluate the RAG system's performance against the test dataset located at /path/to/test_dataset.json.
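The listing does not document the test dataset schema. A golden dataset for RAG evaluation typically pairs each question with a reference answer and the IDs of the documents that should be retrieved; something along these lines, where every field name is an illustrative assumption rather than the skill's required format:

```json
[
  {
    "question": "What does Recall@k measure in a RAG pipeline?",
    "ground_truth": "The fraction of relevant documents that appear in the top-k retrieved results.",
    "relevant_doc_ids": ["doc-12", "doc-45"]
  }
]
```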

Dependency Matrix

Required Modules

  • ragas
  • langchain-openai
  • datasets
  • numpy
  • json

Components

  • scripts
  • references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: rag-evaluation
Download link: https://github.com/latestaiagents/agent-skills/archive/main.zip#rag-evaluation

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
