evaluation-design

Official

Design rigorous AI safety evals with a rubric.

Author: EquiStamp
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Phase 3 evaluation design for AI safety often requires structured, rubric-aligned design artifacts. This Skill helps users generate rigorous evaluation designs that align with the 7-dimension rubric and build on Phase 2 outputs, so that the resulting assessments are consistent and defensible.

Core Features & Use Cases

  • Guides users through four stages: Evaluation Question, Instrument Design, Score Against Rubric, and Operational Checklist.
  • Supports both from-scratch design and review of existing evaluation designs for risk-grounded decisions.
  • Produces a complete Evaluation Design Document including tier justification, inputs, scoring, and exclusion criteria, plus a final scorecard.
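
As a rough illustration of the artifact described above, the Evaluation Design Document can be pictured as a record with one field per listed section. This is only a sketch: the class name, field names, and the `missing_sections` and `scorecard_complete` helpers are hypothetical and not taken from the Skill itself. The listing mentions a 7-dimension rubric but does not name the dimensions, so the scorecard is keyed by placeholder labels.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List

# The listing mentions a 7-dimension rubric; the dimension names are not given.
RUBRIC_DIMENSIONS = 7


@dataclass
class EvaluationDesignDocument:
    """Hypothetical container for the sections named in the listing."""
    evaluation_question: str
    instrument_design: str
    tier_justification: str
    inputs: List[str]
    scoring: str
    exclusion_criteria: List[str]
    operational_checklist: List[str]
    # Assumption: one integer score per rubric dimension, keyed by a label.
    scorecard: Dict[str, int] = field(default_factory=dict)

    def missing_sections(self) -> List[str]:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def scorecard_complete(self) -> bool:
        """True once every rubric dimension has a score."""
        return len(self.scorecard) == RUBRIC_DIMENSIONS
```

A reviewer could call `missing_sections()` on a draft to see which parts still need work before the final scorecard is filled in.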

Quick Start

Provide your Phase 2 targets and request a Phase 3 design or review; the Skill returns a complete Evaluation Design Document.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: evaluation-design
Download link: https://github.com/EquiStamp/evaluating-evaluations/archive/main.zip#evaluation-design

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
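
For anyone who prefers to install by hand, the same steps (download, extract, copy into `.claude/skills/`) might be scripted roughly as follows. This is a sketch under assumptions: the archive's internal layout is not documented here, so the code simply searches the extracted tree for a directory named after the Skill. The function names `download_archive` and `install_skill_from_zip` are hypothetical helpers, not part of Claude Code.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

# URL from the listing, without the "#evaluation-design" fragment.
ARCHIVE_URL = "https://github.com/EquiStamp/evaluating-evaluations/archive/main.zip"


def download_archive(url: str, target: Path) -> Path:
    """Download the zip archive at `url` to `target` and return the path."""
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


def install_skill_from_zip(zip_path: Path, skill_name: str, skills_dir: Path) -> Path:
    """Extract the archive, find the skill folder, and copy it into `skills_dir`."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        # Assumption: the skill lives in a directory named exactly `skill_name`.
        # If several match, prefer the one closest to the archive root.
        candidates = sorted(
            (p for p in Path(tmp).rglob(skill_name) if p.is_dir()),
            key=lambda p: len(p.parts),
        )
        if not candidates:
            raise FileNotFoundError(f"no '{skill_name}' directory in {zip_path}")
        destination = skills_dir / skill_name
        skills_dir.mkdir(parents=True, exist_ok=True)
        shutil.copytree(candidates[0], destination, dirs_exist_ok=True)
    return destination
```

A typical call would first run `download_archive(ARCHIVE_URL, Path("main.zip"))` and then `install_skill_from_zip(Path("main.zip"), "evaluation-design", Path(".claude/skills"))`. Inspect the extracted contents before relying on the result, since the directory search is a guess about the repository layout.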