deepeval

Community

Evaluate LLM applications with precision and scale.

Author: sammcj
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides comprehensive evaluation capabilities for LLM applications, ensuring reliability and performance across RAG systems, conversational AI, and agent workflows.

Core Features & Use Cases

  • 50+ Evaluation Metrics: Covering RAG pipelines, conversational AI, agents, safety, and custom criteria.
  • Component-Level Tracing: Use the @observe decorator to evaluate individual components of your LLM system.
  • Use Case: Imagine you've built a customer support chatbot. Use this Skill to automatically test its response quality, role adherence, and safety across diverse customer scenarios.

Quick Start

Use the deepeval skill to evaluate the response quality of your customer support chatbot against common queries and edge cases.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: deepeval
Download link: https://github.com/sammcj/agentic-coding/archive/main.zip#deepeval

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
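If you prefer to perform the steps yourself, they amount to downloading the repository archive, extracting it, and copying the skill into `.claude/skills/`. A rough sketch, assuming the skill lives in a `deepeval` directory inside the archive (the exact extracted path is not confirmed by this listing):

```shell
# Manual install sketch: download, extract, and place the skill.
# The extracted directory name is an assumption based on GitHub's
# usual <repo>-<branch> archive layout.
mkdir -p .claude/skills
curl -L -o /tmp/agentic-coding.zip \
  "https://github.com/sammcj/agentic-coding/archive/main.zip"
unzip -q /tmp/agentic-coding.zip -d /tmp
cp -r /tmp/agentic-coding-main/deepeval .claude/skills/deepeval
```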