Searching protocol for "judge-prompts"
Design LLM-as-Judge evaluators.
Set up Langfuse datasets & evaluations.
Design LLM judges for subjective criteria.
Build LLM evaluators for quality assessment.
Route LLM evaluation tasks.
Calibrate LLM judges against human labels.
Automate LLM prompt evaluation.
Craft, review, and improve LLM prompts.
Set up agent evaluation pipeline.
Bootstrap your agent evaluation infrastructure.
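The entries above only name the steps; as a rough illustration of the "calibrate LLM judges against human labels" item, here is a minimal sketch of comparing judge verdicts to human labels on the same items and reporting agreement. The paired labels and the use of scikit-learn's cohen_kappa_score are assumptions for illustration, not part of any protocol listed here.

```python
# Minimal sketch: measure how well an LLM judge agrees with human labels.
# The verdicts below are hypothetical (1 = pass, 0 = fail); scikit-learn
# is assumed to be available for the chance-corrected agreement score.
from sklearn.metrics import cohen_kappa_score

human_labels = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]  # hypothetical human verdicts
judge_labels = [1, 1, 0, 0, 0, 1, 1, 1, 0, 1]  # hypothetical judge verdicts

# Raw agreement rate: fraction of items where judge and human match.
agreement = sum(h == j for h, j in zip(human_labels, judge_labels)) / len(human_labels)

# Cohen's kappa: agreement corrected for what chance alone would produce.
kappa = cohen_kappa_score(human_labels, judge_labels)

print(f"raw agreement: {agreement:.2f}")
print(f"cohen's kappa: {kappa:.2f}")
```

A low kappa despite high raw agreement usually means the judge is mostly echoing the majority class; in that case the judge prompt or rubric is the first thing to revisit.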