model-evaluation-benchmark
Community
Automated evaluation benchmarks for models
Author: rysweet
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Teams need a consistent way to evaluate model performance across efficiency, quality, and workflow adherence when comparing agents and validating improvements. This skill automates that evaluation so results are repeatable across runs.
Core Features & Use Cases
- End-to-end benchmark orchestration following Benchmark Suite v3
- Score aggregation, test coverage checks, and artifact creation
- GitHub issues/PRs and documentation generation as outputs
- Reproducible results for regression testing
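The score-aggregation step listed above could be sketched roughly as follows. This is an illustrative sketch only: the dimension names (taken from the three evaluation axes mentioned in this page), the weights, and the result structure are assumptions, not the skill's actual schema.

```python
# Illustrative sketch of benchmark score aggregation.
# The dimension names, weights, and data layout are assumptions,
# not the skill's documented schema.
from dataclasses import dataclass

# Hypothetical weights for the three evaluation dimensions.
WEIGHTS = {"efficiency": 0.3, "quality": 0.5, "workflow_adherence": 0.2}

@dataclass
class TaskResult:
    task_id: str
    scores: dict  # dimension name -> score in [0, 1]

def aggregate(results):
    """Return the weighted mean score across all benchmark tasks."""
    if not results:
        return 0.0
    per_task = [
        sum(WEIGHTS[dim] * r.scores.get(dim, 0.0) for dim in WEIGHTS)
        for r in results
    ]
    return sum(per_task) / len(per_task)

results = [
    TaskResult("t1", {"efficiency": 0.8, "quality": 0.9, "workflow_adherence": 1.0}),
    TaskResult("t2", {"efficiency": 0.6, "quality": 0.7, "workflow_adherence": 0.9}),
]
print(round(aggregate(results), 3))  # -> 0.8
```

Weighting quality highest here is an arbitrary choice for the sketch; a real suite would define its own weights per task set.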
Quick Start
Run the benchmark suite against a selected model and task set to produce a report.
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: let Claude install the skill automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill: Name: model-evaluation-benchmark Download link: https://github.com/rysweet/AzureHayMaker/archive/main.zip#model-evaluation-benchmark Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
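If the automatic route is unavailable, the same steps (download the .zip, extract it, and place it in the skills directory) can be done manually. The helper below is a hedged sketch of that flow; the destination path and archive layout are assumptions based on the instructions above, not a documented API.

```python
# Manual-install sketch for a skill shipped as a .zip archive.
# The destination directory (.claude/skills/) and the archive
# layout are assumptions drawn from the install prompt above.
import io
import zipfile
import urllib.request
from pathlib import Path

def install_skill(zip_url: str, dest: Path) -> list[str]:
    """Download a skill archive and extract it into dest.

    Returns the list of extracted member names so the caller
    can verify what was installed.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(zip_url) as resp:
        data = resp.read()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest)
        return zf.namelist()

# Example usage (paths are assumptions):
# install_skill(
#     "https://github.com/rysweet/AzureHayMaker/archive/main.zip",
#     Path.home() / ".claude" / "skills",
# )
```

Extracting directly from an in-memory buffer avoids leaving a stray .zip on disk; verify the extracted directory name matches the skill name before use.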