advanced-evaluation
Community
Build reliable LLM evaluation systems.
Category: Software Engineering
Tags: llm evaluation, llm-as-a-judge, quality assessment, bias mitigation, pairwise comparison, rubric generation
Author: 466852675
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill addresses the challenge of reliably evaluating LLM outputs: it helps produce consistent quality judgments while mitigating the biases inherent in automated assessment.
Core Features & Use Cases
- LLM-as-a-Judge: Implement advanced techniques for using LLMs to evaluate other LLM responses.
- Bias Mitigation: Actively counter position bias, length bias, and other systematic errors in evaluation.
- Rubric Generation: Create structured, domain-specific rubrics for consistent scoring.
- Use Case: You need to compare two AI-generated summaries of a news article. This Skill helps you set up a robust evaluation process to determine which summary is superior based on criteria like accuracy, conciseness, and clarity, while accounting for potential biases.
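The pairwise comparison and bias-mitigation features above can be sketched in a few lines. A common position-bias countermeasure is to judge both orderings of the candidate pair and only accept a winner when the verdicts agree. The `Judge` callable below is a hypothetical interface (not part of this Skill's actual code): in practice it would wrap an LLM API call.

```python
from typing import Callable

# Hypothetical judge interface: takes (question, answer_a, answer_b) and
# returns "A", "B", or "tie". In practice this wraps an LLM API call.
Judge = Callable[[str, str, str], str]

def pairwise_compare(judge: Judge, question: str, resp_1: str, resp_2: str) -> str:
    """Compare two responses, mitigating position bias by judging both orders."""
    first = judge(question, resp_1, resp_2)    # resp_1 shown in position A
    swapped = judge(question, resp_2, resp_1)  # resp_2 shown in position A
    # Map the swapped verdict back to the original labels.
    swapped = {"A": "B", "B": "A", "tie": "tie"}[swapped]
    # Only accept a winner when both orderings agree; otherwise call it a tie.
    return first if first == swapped else "tie"
```

Note how a judge that always favors position A (pure position bias) is neutralized to a "tie", while a judge with a genuine, order-independent preference still produces a winner.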
Quick Start
Use the advanced-evaluation skill to compare two model responses based on accuracy and clarity.
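A rubric for a comparison like the one above can be represented as weighted criteria scored on a fixed scale. This is a minimal sketch of that idea; the criteria names, weights, and 1-5 scale are illustrative assumptions, not the Skill's actual rubric format.

```python
# Hypothetical rubric: weighted criteria, each scored on a 1-5 scale.
RUBRIC = {
    "accuracy":    {"weight": 0.5, "description": "Factual claims match the source."},
    "conciseness": {"weight": 0.3, "description": "No redundant or filler content."},
    "clarity":     {"weight": 0.2, "description": "Readable, well-organized prose."},
}

def weighted_score(scores: dict[str, int]) -> float:
    """Combine per-criterion scores (1-5) into a single weighted total."""
    return sum(RUBRIC[name]["weight"] * score for name, score in scores.items())
```

Keeping the rubric as data rather than prose makes scoring reproducible: the same criteria and weights apply to every response being compared.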
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: advanced-evaluation
Download link: https://github.com/466852675/TISHICIKU-2025/archive/main.zip#advanced-evaluation
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.