langfuse-experiment-runner
Community
Run and analyze LLM experiments.
Author: mberto10
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill automates the process of running experiments on datasets, evaluating LLM outputs, and analyzing the results, streamlining the development and testing of AI models and prompts.
Core Features & Use Cases
- Experiment Execution: Run tasks on datasets, using either local scripts or Langfuse-defined prompts as judges.
- Result Analysis: Compare experiment runs, analyze score distributions, and identify failures.
- Use Case: You've developed a new prompt for your chatbot. Use this Skill to run it against a dataset of user queries, evaluate its responses using Langfuse judges, and compare its performance against the previous prompt version to ensure it's an improvement.
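The run comparison described above can be illustrated with a small, library-free sketch. It does not call Langfuse or the Skill's own scripts. The function name `compare_runs`, the `fail_below` threshold, and the score dictionaries are all hypothetical, and scores are assumed to be numeric values keyed by dataset item ID.

```python
from statistics import mean


def compare_runs(baseline, candidate, fail_below=0.5):
    """Compare per-item scores of two experiment runs.

    Both arguments map a dataset item ID to a numeric score.
    Only items present in both runs are compared.
    """
    shared = sorted(set(baseline) & set(candidate))
    if not shared:
        raise ValueError("the two runs have no dataset items in common")

    return {
        "items": len(shared),
        "baseline_mean": mean(baseline[i] for i in shared),
        "candidate_mean": mean(candidate[i] for i in shared),
        # Items where the candidate run scored lower than the baseline.
        "regressions": [i for i in shared if candidate[i] < baseline[i]],
        # Items where the candidate run fell below the (assumed) pass threshold.
        "failures": [i for i in shared if candidate[i] < fail_below],
    }


# Made-up scores for illustration only; not real experiment results.
baseline = {"q1": 0.9, "q2": 0.4, "q3": 0.7}
candidate = {"q1": 0.8, "q2": 0.6, "q3": 0.9}

report = compare_runs(baseline, candidate)
print(report)
```

Reporting per-item regressions alongside the means matters because a new prompt can raise the average score while still degrading individual queries.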
Quick Start
Run an experiment using the 'my-regression-tests' dataset with the 'v2.1-test' run name and the specified task script.
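The interface the Skill expects from a task script is not documented on this page, so the following is only an assumed shape: a function that maps one dataset item's input to an output, plus a local loop that applies it to every item. The names `run_task` and `run_experiment`, the item layout (`id`, `input`), and the echo placeholder are hypothetical. Only the run name 'v2.1-test' comes from the example above.

```python
def run_task(item_input: dict) -> str:
    """Hypothetical task: produce an output for one dataset item.

    A real task would call the model or prompt under test here.
    """
    question = item_input.get("question", "")
    return f"Echo: {question}"


def run_experiment(dataset, task, run_name):
    """Apply `task` to every dataset item and tag each result with the run name."""
    results = []
    for item in dataset:
        results.append(
            {
                "run": run_name,
                "item_id": item["id"],
                "output": task(item["input"]),
            }
        )
    return results


# Stand-in for a dataset such as 'my-regression-tests' (contents are made up).
dataset = [
    {"id": "q1", "input": {"question": "How do I reset my password?"}},
    {"id": "q2", "input": {"question": "Where is my invoice?"}},
]

results = run_experiment(dataset, run_task, run_name="v2.1-test")
for row in results:
    print(row)
```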
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: langfuse-experiment-runner
Download link: https://github.com/mberto10/mberto-compound/archive/main.zip#langfuse-experiment-runner
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.