agent-evaluation

Community

Evaluate and improve LLM agents.

Author: Aradhya0510
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides a systematic framework for evaluating and improving the output quality of Large Language Model (LLM) agents, addressing issues like incorrect tool selection, poor answer quality, high costs, and inaccurate responses.

Core Features & Use Cases

  • End-to-End Evaluation: Covers the complete workflow from tracing setup to evaluation execution.
  • MLflow Integration: Leverages MLflow's native APIs for datasets, scorers, and evaluation for robust tracking and observability.
  • Systematic Improvement: Helps optimize tool selection accuracy, reduce costs, and fix agent errors.
  • Use Case: You have an LLM agent that is supposed to book flights but sometimes suggests incorrect dates or uses the wrong API. This skill helps you evaluate its performance against a set of test cases, identify the root cause of errors, and implement improvements.
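To make the tool-selection use case above concrete, here is a minimal sketch of the kind of scorer such an evaluation relies on, written in plain Python. The function name and record fields (`expected_tool`, `selected_tool`) are illustrative assumptions, not this Skill's actual API; in practice the Skill registers scorers through MLflow.

```python
# Hypothetical scorer: fraction of traced test cases where the agent
# picked the expected tool. The field names ("expected_tool",
# "selected_tool") are assumptions for illustration only.

def tool_selection_accuracy(records: list[dict]) -> float:
    """Return the fraction of records where the agent chose the right tool."""
    if not records:
        return 0.0
    correct = sum(
        1 for r in records if r["selected_tool"] == r["expected_tool"]
    )
    return correct / len(records)


if __name__ == "__main__":
    traces = [
        {"expected_tool": "search_flights", "selected_tool": "search_flights"},
        {"expected_tool": "book_flight", "selected_tool": "search_flights"},
        {"expected_tool": "book_flight", "selected_tool": "book_flight"},
        {"expected_tool": "cancel_booking", "selected_tool": "cancel_booking"},
    ]
    print(f"tool selection accuracy: {tool_selection_accuracy(traces):.2f}")
    # → tool selection accuracy: 0.75
```

A score like this, computed over a fixed test set, gives a baseline to compare against after each change to the agent's prompts or tool definitions.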

Quick Start

Ask Claude to use the agent-evaluation Skill to evaluate your agent's output quality with MLflow.
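Since the Skill's own scripts are not shown here, the sketch below illustrates the general shape of such an evaluation run in plain Python: iterate over a small test dataset, call the agent, score each answer, and aggregate a pass rate. The `fake_agent` stub and the `exact_match` scorer are hypothetical stand-ins; the actual Skill drives this workflow through MLflow's dataset, scorer, and evaluation APIs.

```python
# Minimal evaluation-loop sketch. "fake_agent" is a hypothetical stub
# standing in for a real LLM agent; in the actual Skill this loop is
# handled via MLflow rather than written by hand.

def fake_agent(question: str) -> str:
    # Stub: a real agent would call an LLM (and possibly tools) here.
    canned = {
        "What is the capital of France?": "Paris",
        "What is 2 + 2?": "4",
    }
    return canned.get(question, "I don't know")


def exact_match(expected: str, actual: str) -> bool:
    """Simplest possible scorer: case-insensitive exact match."""
    return expected.strip().lower() == actual.strip().lower()


def evaluate(dataset: list[dict]) -> float:
    """Run the agent on each test case and return the pass rate."""
    passed = sum(
        1 for case in dataset
        if exact_match(case["expected"], fake_agent(case["question"]))
    )
    return passed / len(dataset)


if __name__ == "__main__":
    dataset = [
        {"question": "What is the capital of France?", "expected": "Paris"},
        {"question": "What is 2 + 2?", "expected": "4"},
        {"question": "Who wrote Hamlet?", "expected": "Shakespeare"},
    ]
    print(f"pass rate: {evaluate(dataset):.2f}")  # 2 of the 3 cases pass
```

Exact-match scoring is deliberately crude; real evaluations typically combine several scorers (correctness, cost, tool choice), which is what MLflow's scorer APIs are for.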

Dependency Matrix

Required Modules

None required

Components

  • scripts
  • references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: agent-evaluation
Download link: https://github.com/Aradhya0510/databricks-cv-accelerator/archive/main.zip#agent-evaluation

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
