openjudge
Official
Build LLM evaluation pipelines.
Author: agentscope-ai
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill streamlines the evaluation of AI application outputs, helping users build robust quality-assessment pipelines and drive continuous optimization.
Core Features & Use Cases
- Customizable Evaluation: Design and run evaluation pipelines using a variety of pre-built or custom graders.
- Automated Grading: Automate the assessment of LLM outputs for correctness, relevance, hallucination, and more.
- Data-Driven Rubrics: Generate evaluation rubrics automatically from data.
- Use Case: You have developed a new chatbot and want to rigorously evaluate its responses against a set of test queries. Use this Skill to define grading criteria, run evaluations, and analyze the results to identify areas for improvement; a minimal grading sketch follows this list.
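To make the grader idea concrete, here is a minimal reference-based correctness check in plain Python. The names (GradeResult, exact_match_grader, run_evaluation) are illustrative placeholders, not OpenJudge's actual API; the Skill's bundled graders cover the LLM-judged criteria (relevance, hallucination, etc.) that a toy exact-match check cannot.

```python
# Illustrative sketch only: a minimal reference-based correctness grader.
# These class/function names are hypothetical, not OpenJudge's real interfaces.
from dataclasses import dataclass


@dataclass
class GradeResult:
    score: float   # 1.0 = correct, 0.0 = incorrect
    reason: str    # short explanation for the score


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparison."""
    return " ".join(text.lower().split())


def exact_match_grader(response: str, reference: str) -> GradeResult:
    """Grade a model response by normalized exact match against a reference answer."""
    correct = _normalize(response) == _normalize(reference)
    return GradeResult(
        score=1.0 if correct else 0.0,
        reason="matches reference" if correct else "differs from reference",
    )


def run_evaluation(cases: list[dict]) -> float:
    """Run the grader over test cases and return the mean score."""
    results = [exact_match_grader(c["response"], c["reference"]) for c in cases]
    return sum(r.score for r in results) / len(results)


if __name__ == "__main__":
    test_cases = [
        {"response": "Paris", "reference": "Paris"},
        {"response": "Berlin", "reference": "Paris"},
    ]
    print(f"Mean correctness: {run_evaluation(test_cases):.2f}")
```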
Quick Start
Use the openjudge skill to evaluate LLM responses for correctness using a provided reference.
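A request to Claude Code might look like the following (the phrasing is illustrative, not a required format): "Using the openjudge skill, grade this response for correctness. Question: What is the capital of France? Response: Berlin. Reference answer: Paris."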
Dependency Matrix
Required Modules: None required
Components: scripts, references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: openjudge
Download link: https://github.com/agentscope-ai/OpenJudge/archive/main.zip#openjudge
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
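If you prefer to install manually, the sketch below mirrors those steps in Python. The temporary path and the extracted folder name (OpenJudge-main) are assumptions about the GitHub archive layout; verify them against the downloaded zip before running.

```python
# Manual install sketch: download the archive and place it under .claude/skills/.
# The extracted folder name and temp path are assumptions, not verified specifics.
import io
import shutil
import urllib.request
import zipfile
from pathlib import Path

ARCHIVE_URL = "https://github.com/agentscope-ai/OpenJudge/archive/main.zip"
SKILLS_DIR = Path(".claude/skills")

# Download the archive into memory and extract it to a temporary location.
data = urllib.request.urlopen(ARCHIVE_URL).read()
tmp_dir = Path("/tmp/openjudge_skill")
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    zf.extractall(tmp_dir)

# Copy the extracted repository into the skills directory as "openjudge".
SKILLS_DIR.mkdir(parents=True, exist_ok=True)
extracted = tmp_dir / "OpenJudge-main"  # assumed folder name inside the zip
shutil.copytree(extracted, SKILLS_DIR / "openjudge", dirs_exist_ok=True)
print("Installed openjudge to", SKILLS_DIR / "openjudge")
```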