agent-evaluation
Community
Evaluate and improve LLM agents.
Author: Aradhya0510
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides a systematic framework for evaluating and improving the output quality of Large Language Model (LLM) agents, addressing issues such as incorrect tool selection, poor or inaccurate answers, and high cost.
Core Features & Use Cases
- End-to-End Evaluation: Covers the complete workflow from tracing setup to evaluation execution.
- MLflow Integration: Uses MLflow's native dataset, scorer, and evaluation APIs for robust tracking and observability.
- Systematic Improvement: Helps optimize tool selection accuracy, reduce costs, and fix agent errors.
- Use Case: You have an LLM agent that is supposed to book flights but sometimes suggests incorrect dates or uses the wrong API. This skill helps you evaluate its performance against a set of test cases, identify the root cause of errors, and implement improvements.
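As a rough sketch of the evaluation step, the example below assumes MLflow 2.x's LLM evaluation API (mlflow.evaluate with the built-in question-answering evaluators); the test questions, expected answers, and agent function are hypothetical placeholders, not part of this Skill.

```python
import mlflow
import pandas as pd

# Hypothetical evaluation dataset for a flight-booking agent.
eval_data = pd.DataFrame({
    "inputs": [
        "Book a one-way flight from SFO to JFK on 2024-05-01.",
        "What is the cheapest fare from LAX to ORD next Friday?",
    ],
    "ground_truth": [
        "Booked one-way SFO -> JFK departing 2024-05-01.",
        "The cheapest LAX -> ORD fare next Friday is $128 on UA 512.",
    ],
})

def agent_predict(df: pd.DataFrame) -> list:
    # Placeholder: call your real agent here and return one answer per input row.
    return ["(agent answer)" for _ in df["inputs"]]

with mlflow.start_run(run_name="flight-agent-eval"):
    results = mlflow.evaluate(
        model=agent_predict,              # a callable or a logged model URI
        data=eval_data,
        targets="ground_truth",
        model_type="question-answering",  # enables MLflow's built-in QA metrics
        evaluators="default",
    )
    print(results.metrics)                        # aggregate scores
    print(results.tables["eval_results_table"])   # per-row results for debugging
```

The per-row results table shows which test cases fail, which is the starting point for deciding whether the root cause is a wrong tool call, a wrong date, or a wording problem.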
Quick Start
Use the agent-evaluation skill to evaluate the agent's output quality using MLflow.
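Behind that prompt, the workflow starts with tracing, so that every tool call and LLM call the agent makes is recorded and can be inspected when an evaluation fails. Below is a minimal sketch of manual tracing, assuming MLflow 2.14+ and its @mlflow.trace decorator; the experiment name, tool, and agent functions are hypothetical placeholders.

```python
import mlflow

mlflow.set_experiment("flight-agent-eval")  # traces are logged to this experiment

@mlflow.trace(span_type="TOOL")
def search_flights(origin: str, destination: str, date: str) -> list:
    # Placeholder tool: replace with a real flight-search API call.
    return [{"flight": "UA 512", "price": 128}]

@mlflow.trace(span_type="AGENT")
def run_agent(question: str) -> str:
    # Placeholder agent loop: a real agent would choose tools and compose an answer.
    flights = search_flights("LAX", "ORD", "2024-05-03")
    return f"Cheapest option: {flights[0]['flight']} at ${flights[0]['price']}."

print(run_agent("What is the cheapest fare from LAX to ORD next Friday?"))
```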
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: let Claude install it automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: agent-evaluation
Download link: https://github.com/Aradhya0510/databricks-cv-accelerator/archive/main.zip#agent-evaluation
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.