Name: langfuse-experiment-runner
Author: mberto10

System Documentation

What problem does it solve?

This Skill automates running experiments on datasets, evaluating LLM outputs, and analyzing the results, which streamlines the development and testing of AI models and prompts.

Core Features & Use Cases

Experiment Execution: Run tasks on datasets, using either local scripts or Langfuse-defined prompts as judges.
Result Analysis: Compare experiment runs, analyze score distributions, and identify failures.
Use Case: You've developed a new prompt for your chatbot. Use this Skill to run it against a dataset of user queries, evaluate its responses using Langfuse judges, and compare its performance against the previous prompt version to ensure it's an improvement.
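The workflow described above (run a task over a dataset, score each output with a judge, then compare two runs) can be sketched in plain Python. This is a minimal illustration only: it does not call the Langfuse SDK or the Skill's own scripts, and every name in it (DATASET, exact_match_judge, run_experiment, compare_runs, task_v1, task_v2) is hypothetical. The exact-match judge stands in for a Langfuse-defined prompt judge.

```python
from __future__ import annotations

from statistics import mean
from typing import Callable

# Hypothetical in-memory dataset; a real run would load items from Langfuse.
DATASET = [
    {"id": "q1", "input": "2 + 2", "expected": "4"},
    {"id": "q2", "input": "capital of France", "expected": "Paris"},
    {"id": "q3", "input": "3 * 3", "expected": "9"},
]


def exact_match_judge(output: str, expected: str) -> float:
    """Stand-in judge: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_experiment(
    task: Callable[[str], str],
    dataset: list[dict[str, str]],
    judge: Callable[[str, str], float],
) -> dict[str, float]:
    """Run the task on every item and return a score per item id."""
    return {
        item["id"]: judge(task(item["input"]), item["expected"])
        for item in dataset
    }


def compare_runs(
    baseline: dict[str, float],
    candidate: dict[str, float],
    threshold: float = 0.5,
) -> dict[str, object]:
    """Summarize two runs: mean scores, failing items, and per-item regressions."""
    return {
        "baseline_mean": mean(baseline.values()),
        "candidate_mean": mean(candidate.values()),
        # Items the candidate scores below the pass threshold.
        "failures": sorted(i for i, s in candidate.items() if s < threshold),
        # Items where the candidate scores lower than the baseline did.
        "regressions": sorted(
            i for i, s in candidate.items() if s < baseline.get(i, 0.0)
        ),
    }


# Two toy "prompt versions", modeled as lookup tables for the sake of the sketch.
def task_v1(question: str) -> str:
    return {"2 + 2": "4", "capital of France": "paris"}.get(question, "unknown")


def task_v2(question: str) -> str:
    return {"2 + 2": "4", "capital of France": "Lyon", "3 * 3": "9"}.get(
        question, "unknown"
    )


baseline = run_experiment(task_v1, DATASET, exact_match_judge)
candidate = run_experiment(task_v2, DATASET, exact_match_judge)
report = compare_runs(baseline, candidate)
print(report)
```

In this toy data both runs have the same mean score, yet the per-item comparison flags q2 as a regression. That is why comparing runs item by item, rather than by average alone, is useful before adopting a new prompt version.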

Quick Start

Run an experiment using the 'my-regression-tests' dataset with the 'v2.1-test' run name and the specified task script.

Claude Code Installation

Please help me install this Skill:
Name: langfuse-experiment-runner
Download link: https://github.com/mberto10/mberto-compound/archive/main.zip#langfuse-experiment-runner
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
