agent-evaluation

Community

Evaluate and improve LLM agents.

Author: Aradhya0510
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides a systematic framework for evaluating and improving the output quality of Large Language Model (LLM) agents, addressing issues like incorrect tool selection, poor answer quality, high costs, and inaccurate responses.

Core Features & Use Cases

  • End-to-End Evaluation: Covers the complete workflow from tracing setup to evaluation execution.
  • MLflow Integration: Leverages MLflow's native APIs for datasets, scorers, and evaluation for robust tracking and observability.
  • Systematic Improvement: Helps optimize tool selection accuracy, reduce costs, and fix agent errors.
  • Use Case: You have an LLM agent that is supposed to book flights but sometimes suggests incorrect dates or uses the wrong API. This skill helps you evaluate its performance against a set of test cases, identify the root cause of errors, and implement improvements.
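To make the tool-selection use case above concrete, here is a minimal sketch of the kind of scorer such an evaluation relies on, written in plain Python. The function name and record fields (`expected_tool`, `selected_tool`) are illustrative assumptions, not this Skill's actual API; in practice the Skill registers scorers through MLflow.

```python
# Hypothetical scorer: fraction of traced test cases where the agent
# picked the expected tool. The field names ("expected_tool",
# "selected_tool") are assumptions for illustration only.

def tool_selection_accuracy(records: list[dict]) -> float:
    """Return the fraction of records where the agent chose the right tool."""
    if not records:
        return 0.0
    correct = sum(
        1 for r in records if r["selected_tool"] == r["expected_tool"]
    )
    return correct / len(records)


if __name__ == "__main__":
    traces = [
        {"expected_tool": "search_flights", "selected_tool": "search_flights"},
        {"expected_tool": "book_flight", "selected_tool": "search_flights"},
        {"expected_tool": "book_flight", "selected_tool": "book_flight"},
        {"expected_tool": "cancel_booking", "selected_tool": "cancel_booking"},
    ]
    print(f"tool selection accuracy: {tool_selection_accuracy(traces):.2f}")
    # → tool selection accuracy: 0.75
```

A score like this, computed over a fixed test set, gives a baseline to compare against after each change to the agent's prompts or tool definitions.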

Quick Start

Ask Claude to use the agent-evaluation Skill to evaluate your agent's output quality with MLflow.
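Since the Skill's own scripts are not shown here, the sketch below illustrates the general shape of such an evaluation run in plain Python: iterate over a small test dataset, call the agent, score each answer, and aggregate a pass rate. The `fake_agent` stub and the `exact_match` scorer are hypothetical stand-ins; the actual Skill drives this workflow through MLflow's dataset, scorer, and evaluation APIs.

```python
# Minimal evaluation-loop sketch. "fake_agent" is a hypothetical stub
# standing in for a real LLM agent; in the actual Skill this loop is
# handled via MLflow rather than written by hand.

def fake_agent(question: str) -> str:
    # Stub: a real agent would call an LLM (and possibly tools) here.
    canned = {
        "What is the capital of France?": "Paris",
        "What is 2 + 2?": "4",
    }
    return canned.get(question, "I don't know")


def exact_match(expected: str, actual: str) -> bool:
    """Simplest possible scorer: case-insensitive exact match."""
    return expected.strip().lower() == actual.strip().lower()


def evaluate(dataset: list[dict]) -> float:
    """Run the agent on each test case and return the pass rate."""
    passed = sum(
        1 for case in dataset
        if exact_match(case["expected"], fake_agent(case["question"]))
    )
    return passed / len(dataset)


if __name__ == "__main__":
    dataset = [
        {"question": "What is the capital of France?", "expected": "Paris"},
        {"question": "What is 2 + 2?", "expected": "4"},
        {"question": "Who wrote Hamlet?", "expected": "Shakespeare"},
    ]
    print(f"pass rate: {evaluate(dataset):.2f}")  # 2 of the 3 cases pass
```

Exact-match scoring is deliberately crude; real evaluations typically combine several scorers (correctness, cost, tool choice), which is what MLflow's scorer APIs are for.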

Dependency Matrix

Required Modules

None required

Components

  • scripts
  • references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: agent-evaluation
Download link: https://github.com/Aradhya0510/databricks-cv-accelerator/archive/main.zip#agent-evaluation

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
