langfuse-eval-infrastructure
Community
Bootstrap your agent evaluation infrastructure.
Author: mberto10
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill automates the setup and maintenance of evaluation infrastructure for agent optimization loops, providing a standardized, repeatable process for measuring agent performance.
Core Features & Use Cases
- Define Eval Dimensions: Specify key metrics and thresholds for evaluating agent performance.
- Manage Langfuse Integration: Store datasets, judge prompts, and baseline metrics in Langfuse for a single source of truth.
- Generate Local Snapshots: Create local contract files (.json, .yaml) for the optimization loop to consume.
- Bootstrap Modes: Supports both dataset-backed and live-trace evaluation setups.
- Use Case: When starting a new agent development cycle, use this Skill to define accuracy and relevance dimensions, set up the necessary Langfuse prompts, and generate the evaluation contract file that the agent optimization loop will use to start its iterations.
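To make the workflow above concrete, here is a minimal sketch of what a generated local contract snapshot might look like. The file name (`eval-contract.json`), field names, dimension names, and threshold values are all illustrative assumptions, not the Skill's actual schema.

```python
import json

# Hypothetical contract snapshot for the optimization loop to consume.
# All field names and values below are illustrative, not the real schema.
contract = {
    "agent": "my-agent",
    "mode": "dataset",            # or "live-trace" for the live-trace setup
    "dataset": "my-agent-eval",   # Langfuse dataset acting as source of truth
    "dimensions": [
        {"name": "accuracy",  "threshold": 0.85, "judge_prompt": "accuracy-judge"},
        {"name": "relevance", "threshold": 0.80, "judge_prompt": "relevance-judge"},
    ],
    "baseline": {"accuracy": 0.78, "relevance": 0.74},
}

# Write the snapshot where the loop can pick it up.
with open("eval-contract.json", "w") as f:
    json.dump(contract, f, indent=2)
```

In this sketch, judge prompts and the dataset live in Langfuse, while the snapshot pins the dimension names, thresholds, and baseline metrics the loop compares against.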
Quick Start
Use the langfuse-eval-infrastructure skill to bootstrap the evaluation infrastructure for an agent named 'my-agent' using the dataset 'my-agent-eval'.
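Once bootstrapped, the optimization loop reads the contract and gates each iteration against the defined thresholds. A minimal sketch of that check, assuming a hypothetical contract shape with per-dimension thresholds (the `passes` helper and the field names are illustrative, not part of the Skill):

```python
# Hypothetical contract fragment; dimension names and thresholds are examples.
contract = {
    "dimensions": [
        {"name": "accuracy", "threshold": 0.85},
        {"name": "relevance", "threshold": 0.80},
    ]
}

def passes(scores: dict, contract: dict) -> bool:
    """Return True only if every eval dimension meets its threshold."""
    return all(
        scores.get(d["name"], 0.0) >= d["threshold"]
        for d in contract["dimensions"]
    )

print(passes({"accuracy": 0.90, "relevance": 0.82}, contract))  # True
print(passes({"accuracy": 0.90, "relevance": 0.70}, contract))  # False
```

Missing scores default to 0.0 here, so a run that skips a dimension fails the gate rather than silently passing.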
Dependency Matrix
Required Modules: None required
Components: scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: langfuse-eval-infrastructure
Download link: https://github.com/mberto10/mberto-compound/archive/main.zip#langfuse-eval-infrastructure
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.