model-evaluation-benchmark

Community

Automated evaluation benchmarks for models

Author: rysweet
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Teams need a consistent way to evaluate model performance across efficiency, quality, and workflow adherence so that agents can be compared and improvements validated. This Skill automates that evaluation end to end.

Core Features & Use Cases

  • End-to-end benchmark orchestration following Benchmark Suite v3
  • Score aggregation, test coverage checks, and artifact creation
  • GitHub issues/PRs and documentation generation as outputs
  • Reproducible results for regression testing
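The listing names score aggregation across the efficiency, quality, and workflow-adherence dimensions but does not document the formula. The sketch below is purely illustrative: the dimension names come from this page, while the function name, weights, and weighted-mean approach are assumptions, not the suite's actual logic.

```python
# Illustrative sketch only. Dimension names are from the listing above;
# the weights and the weighted-mean aggregation are made-up assumptions.
def aggregate_score(scores, weights=None):
    """Return a weighted mean of per-dimension scores in [0, 1]."""
    weights = weights or {
        "efficiency": 0.3,
        "quality": 0.5,
        "workflow_adherence": 0.2,
    }
    total = sum(weights.values())
    # Normalize by the weight total so partial weight sets still land in [0, 1].
    return sum(scores[k] * w for k, w in weights.items()) / total

overall = aggregate_score(
    {"efficiency": 0.8, "quality": 0.9, "workflow_adherence": 1.0}
)
# 0.8*0.3 + 0.9*0.5 + 1.0*0.2 = 0.89
```

Keeping the weights as an explicit, overridable dictionary makes it easy to re-weight dimensions per benchmark run without changing the aggregation code.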

Quick Start

Run the benchmark suite against a selected model and task set; the suite aggregates scores and produces a report along with artifacts such as GitHub issues/PRs and generated documentation.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: model-evaluation-benchmark
Download link: https://github.com/rysweet/AzureHayMaker/archive/main.zip#model-evaluation-benchmark

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
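If you prefer to install manually, the steps above can be sketched as a short shell script. The URL and the .claude/skills/ destination come from this listing; the temporary paths and the assumption that the GitHub archive unpacks into a single top-level directory (and that the skill's files may sit in a subdirectory of it) are mine, so adjust after inspecting the extracted contents.

```shell
#!/bin/sh
# Manual-install sketch (assumed layout; verify after extracting).
SKILL_NAME="model-evaluation-benchmark"
ZIP_URL="https://github.com/rysweet/AzureHayMaker/archive/main.zip"
DEST="$HOME/.claude/skills/$SKILL_NAME"

mkdir -p "$DEST"   # ensure the skills directory exists

# Dry run: print the planned steps rather than hitting the network here.
echo "Would download: $ZIP_URL"
echo "Would install into: $DEST"

# The real steps would be roughly:
#   curl -L "$ZIP_URL" -o /tmp/skill.zip
#   unzip -q /tmp/skill.zip -d /tmp/skill-src
#   cp -r /tmp/skill-src/*/ "$DEST"/    # archive unpacks to <repo>-main/
```

After copying, restart Claude Code (or reload skills) so the new skill is picked up.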