Searching protocol for "llm-judge"
Calibrate LLM judges against human labels.
Define RL rewards for ReinforceNow training.
Build and manage AgentV evaluation files.
Design binary LLM judges for single failures.
Design and refine AI voice agent metrics.
Auto-generate Fair-Forge metrics scaffolds.
Set up Langfuse datasets & evaluations.
Design LLM judges for subjective criteria.
Build robust LLM evaluation systems.
Evaluate and optimize GenAI agents.
Make LLM judgments reliable with proven methods.