deepeval
Community
Evaluate LLM applications with precision and scale.
Category: Software Engineering
Tags: llm evaluation, conversational ai, pytest integration, ai safety, async performance, rag testing
Author: sammcj
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides comprehensive evaluation capabilities for LLM applications, helping you verify reliability and performance across RAG systems, conversational AI, and agent workflows.
Core Features & Use Cases
- 50+ Evaluation Metrics: Covering RAG pipelines, conversational AI, agents, safety, and custom criteria.
- Component-Level Tracing: Use the @observe decorator to evaluate individual components of your LLM system (see the sketch after this list).
- Use Case: Imagine you've built a customer support chatbot. Use this Skill to automatically test its response quality, role adherence, and safety across diverse customer scenarios.
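For illustration, here is a minimal sketch of component-level tracing with @observe. The retriever and generator bodies are placeholders, and the decorator's metrics parameter and update_current_span follow recent deepeval documentation, so confirm them against your installed version:

```python
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
from deepeval.tracing import observe, update_current_span

@observe()
def retrieve(query: str) -> list[str]:
    # Placeholder: swap in your real vector-store lookup.
    return ["Refunds are processed within 5 business days."]

@observe(metrics=[AnswerRelevancyMetric()])
def generate(query: str, context: list[str]) -> str:
    # Placeholder: swap in your real LLM call.
    answer = "Refunds usually arrive within 5 business days."
    # Attach a test case so the metric on this component can score it.
    update_current_span(
        test_case=LLMTestCase(input=query, actual_output=answer)
    )
    return answer

@observe()
def support_bot(query: str) -> str:
    return generate(query, retrieve(query))
```

Invoking support_bot inside a deepeval evaluation run then scores each decorated component separately, rather than only the end-to-end output.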
Quick Start
Use the deepeval skill to evaluate the response quality of your customer support chatbot against common queries and edge cases.
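As a concrete starting point, a minimal sketch using deepeval's standard Python API is below; the query, reply, and 0.7 threshold are illustrative values, and LLM-judged metrics such as AnswerRelevancyMetric need a model API key (e.g. OPENAI_API_KEY) configured:

```python
from deepeval import assert_test, evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# One test case: the chatbot's actual reply to a common support query.
test_case = LLMTestCase(
    input="How do I reset my password?",
    actual_output="Click 'Forgot password' on the login page and follow the emailed link.",
)

metric = AnswerRelevancyMetric(threshold=0.7)

# Run standalone...
evaluate(test_cases=[test_case], metrics=[metric])

# ...or as a pytest test, where a score below the threshold fails the test.
def test_password_reset_reply():
    assert_test(test_case, [metric])
```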
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: deepeval
Download link: https://github.com/sammcj/agentic-coding/archive/main.zip#deepeval
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.