langfuse-eval-infrastructure


Bootstrap your agent evaluation infrastructure.

Author: mberto10
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill automates the setup and maintenance of an evaluation infrastructure for agent optimization loops, ensuring a standardized and robust process for measuring agent performance.

Core Features & Use Cases

  • Define Eval Dimensions: Specify key metrics and thresholds for evaluating agent performance.
  • Manage Langfuse Integration: Store datasets, judge prompts, and baseline metrics in Langfuse for a single source of truth.
  • Generate Local Snapshots: Create local contract files (.json, .yaml) for the optimization loop to consume.
  • Bootstrap Modes: Supports both dataset-backed and live-trace evaluation setups.
  • Use Case: When starting a new agent development cycle, use this Skill to define accuracy and relevance dimensions, set up the necessary Langfuse prompts, and generate the evaluation contract file that the agent optimization loop will use to start its iterations.
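A local contract snapshot of this kind could be generated with a few lines of Python. Note that the schema below, including the dimension names, thresholds, judge prompt name, and baseline values, is an illustrative assumption for this sketch, not the Skill's actual contract format:

```python
import json

# Illustrative eval contract; the field names and values are assumptions
# invented for this sketch, not the Skill's actual schema.
contract = {
    "agent": "my-agent",
    "dataset": "my-agent-eval",
    "dimensions": [
        {"name": "accuracy", "threshold": 0.85},
        {"name": "relevance", "threshold": 0.80},
    ],
    "judge_prompt": "eval-judge-v1",          # hypothetical Langfuse prompt name
    "baseline": {"accuracy": 0.72, "relevance": 0.78},
}

# Write the snapshot the optimization loop would consume.
with open("eval-contract.json", "w") as f:
    json.dump(contract, f, indent=2)
```

The optimization loop can then read `eval-contract.json` on each iteration and compare fresh scores against the recorded baseline and thresholds.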

Quick Start

Use the langfuse-eval-infrastructure skill to bootstrap the evaluation infrastructure for an agent named 'my-agent' using the dataset 'my-agent-eval'.
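Behind a prompt like the one above, a dataset-backed setup would seed the `my-agent-eval` dataset in Langfuse. The following is a minimal sketch using the Langfuse Python SDK's `create_dataset` and `create_dataset_item` calls; the seed items themselves are placeholders invented for illustration:

```python
import os

# Placeholder seed items; real datasets would hold your agent's eval cases.
seed_items = [
    {
        "input": {"question": "What is Langfuse?"},
        "expected_output": {"answer": "An open-source LLM engineering platform."},
    },
]

# Upload only when Langfuse credentials are configured in the environment,
# so the sketch stays runnable without a live account.
if os.getenv("LANGFUSE_PUBLIC_KEY") and os.getenv("LANGFUSE_SECRET_KEY"):
    from langfuse import Langfuse

    langfuse = Langfuse()  # reads keys and host from environment variables
    langfuse.create_dataset(name="my-agent-eval")
    for item in seed_items:
        langfuse.create_dataset_item(
            dataset_name="my-agent-eval",
            input=item["input"],
            expected_output=item["expected_output"],
        )
```

Once seeded, the dataset serves as the single source of truth that the Skill's local snapshots are derived from.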

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install the Skill automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: langfuse-eval-infrastructure
Download link: https://github.com/mberto10/mberto-compound/archive/main.zip#langfuse-eval-infrastructure

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
