model-evaluation-benchmark

Community

Automated evaluation benchmarks for models

Author: rysweet
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Teams need a consistent way to evaluate model performance across efficiency, quality, and workflow adherence so that agents can be compared and improvements validated. This Skill automates that evaluation end to end.

Core Features & Use Cases

  • End-to-end benchmark orchestration following Benchmark Suite v3
  • Score aggregation, test coverage checks, and artifact creation
  • GitHub issues/PRs and documentation generation as outputs
  • Reproducible results for regression testing
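The listing names score aggregation across the efficiency, quality, and workflow-adherence dimensions but does not document the formula. The sketch below is purely illustrative: the dimension names come from this page, while the function name, weights, and weighted-mean approach are assumptions, not the suite's actual logic.

```python
# Illustrative sketch only. Dimension names are from the listing above;
# the weights and the weighted-mean aggregation are made-up assumptions.
def aggregate_score(scores, weights=None):
    """Return a weighted mean of per-dimension scores in [0, 1]."""
    weights = weights or {
        "efficiency": 0.3,
        "quality": 0.5,
        "workflow_adherence": 0.2,
    }
    total = sum(weights.values())
    # Normalize by the weight total so partial weight sets still land in [0, 1].
    return sum(scores[k] * w for k, w in weights.items()) / total

overall = aggregate_score(
    {"efficiency": 0.8, "quality": 0.9, "workflow_adherence": 1.0}
)
# 0.8*0.3 + 0.9*0.5 + 1.0*0.2 = 0.89
```

Keeping the weights as an explicit, overridable dictionary makes it easy to re-weight dimensions per benchmark run without changing the aggregation code.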

Quick Start

Run the benchmark suite against a selected model and task set; the suite aggregates scores and produces a report along with artifacts such as GitHub issues/PRs and generated documentation.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: model-evaluation-benchmark
Download link: https://github.com/rysweet/AzureHayMaker/archive/main.zip#model-evaluation-benchmark

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
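If you prefer to install manually, the steps above can be sketched as a short shell script. The URL and the .claude/skills/ destination come from this listing; the temporary paths and the assumption that the GitHub archive unpacks into a single top-level directory (and that the skill's files may sit in a subdirectory of it) are mine, so adjust after inspecting the extracted contents.

```shell
#!/bin/sh
# Manual-install sketch (assumed layout; verify after extracting).
SKILL_NAME="model-evaluation-benchmark"
ZIP_URL="https://github.com/rysweet/AzureHayMaker/archive/main.zip"
DEST="$HOME/.claude/skills/$SKILL_NAME"

mkdir -p "$DEST"   # ensure the skills directory exists

# Dry run: print the planned steps rather than hitting the network here.
echo "Would download: $ZIP_URL"
echo "Would install into: $DEST"

# The real steps would be roughly:
#   curl -L "$ZIP_URL" -o /tmp/skill.zip
#   unzip -q /tmp/skill.zip -d /tmp/skill-src
#   cp -r /tmp/skill-src/*/ "$DEST"/    # archive unpacks to <repo>-main/
```

After copying, restart Claude Code (or reload skills) so the new skill is picked up.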