langfuse-eval-infrastructure
Community
Bootstrap your agent evaluation infrastructure.
Author: mberto10
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill automates the setup and maintenance of evaluation infrastructure for agent optimization loops, providing a standardized, repeatable process for measuring agent performance.
Core Features & Use Cases
- Define Eval Dimensions: Specify key metrics and thresholds for evaluating agent performance.
- Manage Langfuse Integration: Store datasets, judge prompts, and baseline metrics in Langfuse for a single source of truth.
- Generate Local Snapshots: Create local contract files (.json, .yaml) for the optimization loop to consume.
- Bootstrap Modes: Supports both dataset-backed and live-trace evaluation setups.
- Use Case: When starting a new agent development cycle, use this Skill to define accuracy and relevance dimensions, set up the necessary Langfuse prompts, and generate the evaluation contract file that the agent optimization loop will use to start its iterations.
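To make the workflow above concrete, here is a minimal sketch of what a generated local contract snapshot might look like. The file name (`eval-contract.json`), field names, dimension names, and threshold values are all illustrative assumptions, not the Skill's actual schema.

```python
import json

# Hypothetical contract snapshot for the optimization loop to consume.
# All field names and values below are illustrative, not the real schema.
contract = {
    "agent": "my-agent",
    "mode": "dataset",            # or "live-trace" for the live-trace setup
    "dataset": "my-agent-eval",   # Langfuse dataset acting as source of truth
    "dimensions": [
        {"name": "accuracy",  "threshold": 0.85, "judge_prompt": "accuracy-judge"},
        {"name": "relevance", "threshold": 0.80, "judge_prompt": "relevance-judge"},
    ],
    "baseline": {"accuracy": 0.78, "relevance": 0.74},
}

# Write the snapshot where the loop can pick it up.
with open("eval-contract.json", "w") as f:
    json.dump(contract, f, indent=2)
```

In this sketch, judge prompts and the dataset live in Langfuse, while the snapshot pins the dimension names, thresholds, and baseline metrics the loop compares against.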
Quick Start
Use the langfuse-eval-infrastructure skill to bootstrap the evaluation infrastructure for an agent named 'my-agent' using the dataset 'my-agent-eval'.
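Once bootstrapped, the optimization loop reads the contract and gates each iteration against the defined thresholds. A minimal sketch of that check, assuming a hypothetical contract shape with per-dimension thresholds (the `passes` helper and the field names are illustrative, not part of the Skill):

```python
# Hypothetical contract fragment; dimension names and thresholds are examples.
contract = {
    "dimensions": [
        {"name": "accuracy", "threshold": 0.85},
        {"name": "relevance", "threshold": 0.80},
    ]
}

def passes(scores: dict, contract: dict) -> bool:
    """Return True only if every eval dimension meets its threshold."""
    return all(
        scores.get(d["name"], 0.0) >= d["threshold"]
        for d in contract["dimensions"]
    )

print(passes({"accuracy": 0.90, "relevance": 0.82}, contract))  # True
print(passes({"accuracy": 0.90, "relevance": 0.70}, contract))  # False
```

Missing scores default to 0.0 here, so a run that skips a dimension fails the gate rather than silently passing.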
Dependency Matrix
Required Modules: None required
Components: scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: langfuse-eval-infrastructure
Download link: https://github.com/mberto10/mberto-compound/archive/main.zip#langfuse-eval-infrastructure
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.