Skill Explorer

Searching protocol for "evaluation dataset"

langfuse-dataset-setup

Community

Set up Langfuse datasets & evaluations.

Few Config

by mberto10

fiftyone-model-evaluation

Official

Evaluate model predictions against ground truth.

Few Config

by voxel51

evaluation-metrics

Community

Rigorous, reproducible LLM evaluation.

Advanced

by ricardoroche

databricks-mlflow-evaluation

Community

End-to-end GenAI evaluation with MLflow.

Advanced

by andregit2026

eval-engine

Community

LLM evaluation pipeline.

Advanced

by mqzkim

huggingface-evaluate

Official

Evaluate ML models & datasets.

Few Config

by DTMC-marketplace

golden-dataset

Community

Manage AI evaluation datasets with confidence.

Advanced

by yonatangross

LangSmith Datasets

Official

Create and manage LangSmith evaluation datasets.

Few Config

by Diploma-pending

LangSmith Dataset

Official

Create evaluation datasets from traces.

Few Config

by langchain-ai

trulens-dataset-curation

Official

Create and curate ground-truth evaluation data.

Few Config

by truera

langsmith-dataset

Community

Manage LangSmith evaluation datasets.

Few Config

by dhar174

mlflow-evaluation

Community

MLflow GenAI evaluation for quality.

Advanced

by datasciencemonkey