langfuse-experiment-runner

Community

Run and analyze LLM experiments.

Author: mberto10
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill automates running experiments on datasets, evaluating LLM outputs, and analyzing the results, which streamlines the development and testing of AI models and prompts.

Core Features & Use Cases

  • Experiment Execution: Run tasks on datasets and evaluate the outputs with either local scripts or Langfuse-defined prompts acting as judges.
  • Result Analysis: Compare experiment runs, analyze score distributions, and identify failures.
  • Use Case: You've developed a new prompt for your chatbot. Use this Skill to run it against a dataset of user queries, evaluate its responses using Langfuse judges, and compare its performance against the previous prompt version to ensure it's an improvement.
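
The analysis steps listed above (comparing runs, looking at score distributions, finding failures) can be sketched in plain Python. This is only an illustration, not the Skill's own code: the score dictionaries, the `summarize` and `compare` helpers, and the 0.5 and 0.8 thresholds are all hypothetical, and no Langfuse API is called.

```python
from statistics import mean

# Hypothetical per-item scores (0.0 to 1.0) for two runs over the same dataset items.
baseline_scores = {"item-1": 0.9, "item-2": 0.4, "item-3": 0.8}
candidate_scores = {"item-1": 0.95, "item-2": 0.7, "item-3": 0.3}


def summarize(scores, fail_below=0.5):
    """Return the mean score, a coarse score distribution, and failing item ids."""
    buckets = {"low": 0, "mid": 0, "high": 0}
    for value in scores.values():
        if value < 0.5:
            buckets["low"] += 1
        elif value < 0.8:
            buckets["mid"] += 1
        else:
            buckets["high"] += 1
    # An item counts as a failure when its score is below the chosen threshold.
    failures = sorted(k for k, v in scores.items() if v < fail_below)
    return {"mean": mean(scores.values()), "buckets": buckets, "failures": failures}


def compare(baseline, candidate):
    """Per-item score delta (candidate minus baseline) for items in both runs."""
    shared = baseline.keys() & candidate.keys()
    return {k: round(candidate[k] - baseline[k], 4) for k in sorted(shared)}


baseline_summary = summarize(baseline_scores)
candidate_summary = summarize(candidate_scores)
deltas = compare(baseline_scores, candidate_scores)

# Items whose score dropped in the candidate run.
regressions = [k for k, d in deltas.items() if d < 0]

print(baseline_summary)
print(candidate_summary)
print("regressions:", regressions)
```

A per-item delta is used here rather than only comparing means, because a run can improve on average while still regressing on individual items that matter.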

Quick Start

Run an experiment on the 'my-regression-tests' dataset, using the run name 'v2.1-test' and the specified task script.
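
For orientation, a task script of the kind the prompt above refers to might look roughly like the following. The Skill's actual script interface is not described on this page, so everything here is an assumption: the `task`, `exact_match`, and `run_experiment` names, the item fields (`id`, `input`, `expected`), and the in-memory dataset are hypothetical stand-ins, and the task does simple string normalization instead of calling an LLM.

```python
# Hypothetical task script; the Skill's real interface may differ.

def task(item):
    """Produce an output for one dataset item (stand-in for an LLM call)."""
    return item["input"].strip().lower()


def exact_match(output, expected):
    """Local evaluator: 1.0 when the output equals the expected value, else 0.0."""
    return 1.0 if output == expected else 0.0


def run_experiment(run_name, dataset, task_fn, evaluator):
    """Apply the task to every item and record one scored result per item."""
    results = []
    for item in dataset:
        output = task_fn(item)
        results.append({
            "run": run_name,
            "id": item["id"],
            "output": output,
            "score": evaluator(output, item["expected"]),
        })
    return results


# Tiny in-memory dataset standing in for 'my-regression-tests'.
dataset = [
    {"id": "q1", "input": "  Hello ", "expected": "hello"},
    {"id": "q2", "input": "World", "expected": "world!"},
]

results = run_experiment("v2.1-test", dataset, task, exact_match)
for row in results:
    print(row["id"], row["score"])
```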

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: langfuse-experiment-runner
Download link: https://github.com/mberto10/mberto-compound/archive/main.zip#langfuse-experiment-runner

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.