rag-evaluation
Official · Elevate RAG performance with metrics.
Author: latestaiagents
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides a rigorous way to evaluate and improve Retrieval Augmented Generation (RAG) systems, with metrics covering retrieval quality, generation quality, and end-to-end performance.
Core Features & Use Cases
- Retrieval Metrics: Calculate Mean Reciprocal Rank (MRR), Recall@k, Precision@k, NDCG@k, and Hit Rate to assess the quality of retrieved documents (see the first sketch after this list).
- Generation Metrics: Use RAGAS or LLM-as-Judge scoring to evaluate faithfulness, answer relevancy, context precision, context recall, and answer correctness (second sketch below).
- End-to-End Testing: Simulate real-world usage to measure latency, cost, and overall system effectiveness (third sketch below).
- Use Case: A RAG engineer needs to compare two different retrieval strategies. They use this skill to run both strategies against a golden dataset, generating detailed reports on which strategy yields better retrieval accuracy and more faithful, relevant answers.
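The retrieval metrics above are standard information-retrieval formulas and can be computed directly. A minimal sketch using numpy; the function names and the binary-relevance treatment of NDCG are my choices here, not the skill's documented API:

```python
import numpy as np

def reciprocal_rank(retrieved_ids, relevant_ids):
    """1/rank of the first relevant document; 0 if none is retrieved.
    MRR is the mean of this value over all queries in the test set."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents found in the top-k results."""
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids) if relevant_ids else 0.0

def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    return len(set(retrieved_ids[:k]) & set(relevant_ids)) / k

def ndcg_at_k(retrieved_ids, relevant_ids, k):
    """Binary-relevance NDCG@k: DCG of the ranking over the ideal DCG."""
    gains = [1.0 if d in relevant_ids else 0.0 for d in retrieved_ids[:k]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant_ids), k)))
    return dcg / ideal if ideal > 0 else 0.0

def hit_rate_at_k(retrieved_ids, relevant_ids, k):
    """1.0 if any relevant document appears in the top-k, else 0.0."""
    return 1.0 if set(retrieved_ids[:k]) & set(relevant_ids) else 0.0

# Example: one query with known relevant docs and a ranked retrieval result.
retrieved = ["d3", "d7", "d1", "d9", "d2"]
relevant = {"d1", "d2"}
print(reciprocal_rank(retrieved, relevant))  # 0.333... (first hit at rank 3)
print(recall_at_k(retrieved, relevant, 5))   # 1.0 (both relevant docs found)
print(ndcg_at_k(retrieved, relevant, 5))
```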
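Generation quality can be scored with RAGAS directly, which matches the dependency list below (ragas, datasets, langchain-openai). A minimal sketch: note that RAGAS has changed its column names and dataset wrapper across releases, so check the version you have installed. This layout follows the classic question/answer/contexts/ground_truth schema and assumes OPENAI_API_KEY is set, since RAGAS uses an LLM judge under the hood:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
    answer_correctness,
)

# One evaluation row per query: the question, the RAG system's answer,
# the retrieved context chunks, and a human-written reference answer.
rows = {
    "question": ["What is the refund policy?"],
    "answer": ["Refunds are issued within 30 days of purchase."],
    "contexts": [["Policy doc: refunds are issued within 30 days."]],
    "ground_truth": ["Refunds are issued within 30 days of purchase."],
}

result = evaluate(
    Dataset.from_dict(rows),
    metrics=[
        faithfulness,
        answer_relevancy,
        context_precision,
        context_recall,
        answer_correctness,
    ],
)
print(result)  # per-metric scores in [0, 1]
```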
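For end-to-end testing, the simplest useful harness times the full question-to-answer path per query and aggregates percentiles; cost can be layered on by reading token usage from your LLM client. A sketch where rag_pipeline is a hypothetical stand-in for your system, not a function the skill provides:

```python
import time

def run_end_to_end(rag_pipeline, questions):
    """Time the full question -> answer path for each query.

    `rag_pipeline` is a hypothetical stand-in: any callable that takes a
    question string and returns an answer string.
    """
    latencies = []
    for q in questions:
        start = time.perf_counter()
        _answer = rag_pipeline(q)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    n = len(latencies)
    return {
        "queries": n,
        "mean_latency_s": sum(latencies) / n,
        "p50_latency_s": latencies[n // 2],
        "p95_latency_s": latencies[min(n - 1, int(n * 0.95))],
    }
```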
Quick Start
Use the rag-evaluation skill to evaluate the RAG system's performance against the test dataset located at /path/to/test_dataset.json.
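The expected structure of test_dataset.json is not documented on this page; a plausible minimal schema (every field name here is an assumption, adjust to what your pipeline expects) pairs each question with a reference answer and the IDs of the documents the retriever should return:

```python
import json

# Hypothetical golden-dataset schema -- illustrative, not the skill's
# documented format. Gold doc IDs drive the retrieval metrics; the
# ground-truth answers drive the RAGAS generation metrics.
test_dataset = [
    {
        "question": "What is the refund policy?",
        "ground_truth": "Refunds are issued within 30 days of purchase.",
        "relevant_doc_ids": ["policy_04", "faq_12"],
    },
]

with open("/path/to/test_dataset.json", "w") as f:
    json.dump(test_dataset, f, indent=2)
```

Keeping gold document IDs alongside reference answers lets one file drive both the retrieval metrics and the generation scores described above.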
Dependency Matrix
Required Modules
ragas, langchain-openai, datasets, numpy, json
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: rag-evaluation
Download link: https://github.com/latestaiagents/agent-skills/archive/main.zip#rag-evaluation
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.