Searching protocol for "evaluation-report"
Generate CSB evaluation reports.
Turn evaluation findings into tracked GitHub issues.
Evaluate, benchmark, and report across domains.
Audit Skill designs with expert scoring.
Rank benchmark attempts with a live leaderboard.
Automate evaluation dashboards from metrics.
Audit Skills with expert-quality scoring.
Merge MMI and DDD results to guide improvements.
Assess and score Skill quality rigorously.
Fix evaluation failures and re-evaluate.
Ensure model quality and fairness.
Evaluate IDD suitability and master its approach.