rag-evaluation

Official

Elevate RAG performance with metrics.

Author: latestaiagents
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill addresses the need to rigorously evaluate and improve Retrieval-Augmented Generation (RAG) systems by providing comprehensive metrics for retrieval quality, generation quality, and end-to-end performance.

Core Features & Use Cases

  • Retrieval Metrics: Calculate Mean Reciprocal Rank (MRR), Recall@k, Precision@k, NDCG@k, and Hit Rate to assess the quality of retrieved documents.
  • Generation Metrics: Utilize RAGAS or an LLM-as-judge to evaluate faithfulness, answer relevancy, context precision, context recall, and answer correctness.
  • End-to-End Testing: Simulate real-world usage to measure latency, cost, and overall system effectiveness.
  • Use Case: A RAG engineer needs to compare two different retrieval strategies. They use this skill to run both strategies against a golden dataset, generating detailed reports on which strategy yields better retrieval accuracy and more faithful, relevant answers.
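The retrieval metrics listed above have standard definitions that can be sketched in plain Python. Note this is an illustrative sketch, not the skill's actual implementation; the function names and the assumption of binary relevance (a document is simply relevant or not) are mine:

```python
import math

def reciprocal_rank(retrieved, relevant):
    """1/rank of the first relevant document, or 0.0 if none appears."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents found in the top-k results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k

def hit_rate_at_k(retrieved, relevant, k):
    """1.0 if any relevant document appears in the top-k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in retrieved[:k]) else 0.0

def ndcg_at_k(retrieved, relevant, k):
    """NDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1)
              if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```

For example, with `retrieved = ["d3", "d1", "d7"]` and `relevant = {"d1", "d9"}`, the first relevant document sits at rank 2, so MRR contributes 0.5; Recall@3 is 0.5 (one of two relevant documents found) and Precision@3 is 1/3. Averaging each metric over every query in a golden dataset gives the per-strategy report described in the use case.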

Quick Start

Use the rag-evaluation skill to evaluate the RAG system's performance against the test dataset located at /path/to/test_dataset.json.
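The listing does not document the test dataset schema. A golden dataset for RAG evaluation typically pairs each question with a reference answer and the IDs of the documents that should be retrieved; something along these lines, where every field name is an illustrative assumption rather than the skill's required format:

```json
[
  {
    "question": "What does Recall@k measure in a RAG pipeline?",
    "ground_truth": "The fraction of relevant documents that appear in the top-k retrieved results.",
    "relevant_doc_ids": ["doc-12", "doc-45"]
  }
]
```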

Dependency Matrix

Required Modules

  • ragas
  • langchain-openai
  • datasets
  • numpy
  • json

Components

  • scripts
  • references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: rag-evaluation
Download link: https://github.com/latestaiagents/agent-skills/archive/main.zip#rag-evaluation

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
