Searching protocol for "judging"
Automate quality evaluation of plans and code.
Trace judgments on the PoJ chain.
Dual-judge evaluation for task commits.
Calibrate LLM judges against human labels.
Design LLM-as-Judge evaluators.
Design LLM judges for subjective criteria.
Evaluate AI Configs with built-in judges.
25-dimension judgments for content quality.
Orchestrate tasks with judge verification.
Evaluate work with an AI judge.
Build LLM evaluators for quality assessment.