Benchmark Manager

Community

Create and debug AILANG evaluation benchmarks with precision.

Author: sunholo-data
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps prevent benchmark creation errors and reduces debugging frustration by providing expert guidance on AILANG's evaluation system, particularly the critical distinction between prompt types.

Core Features & Use Cases

  • Benchmark Validation: Automatically check YAML files for common issues like incorrect prompt usage.
  • Debugging Tools: Show exactly what prompts models receive and test benchmarks efficiently.
  • Use Case: When your benchmark shows a 0% pass rate even though AILANG supports the feature being tested, use this Skill to identify and fix the underlying prompt configuration problem.

Quick Start

Use the Benchmark Manager skill to debug the failing json_parse benchmark by showing the full prompt and testing with a cheap model.

Dependency Matrix

Required Modules

ailang, jq, yq, python3

Components

scripts, references
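Before installing, you may want to confirm that the required modules are available. The following is a minimal Python sketch, not part of the Skill. It assumes that each module listed above is an executable invoked by that exact name on your PATH. The helper name `missing_tools` is hypothetical.

```python
import shutil

# Executables from "Required Modules"; assumes each is invoked by this exact name on PATH.
REQUIRED_TOOLS = ("ailang", "jq", "yq", "python3")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that shutil.which cannot locate on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

An empty list means every tool resolved on PATH. It does not check versions.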

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: Benchmark Manager
Download link: https://github.com/sunholo-data/ailang/archive/main.zip#benchmark-manager

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
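Installing manually amounts to extracting the Skill's folder from the repository archive into `.claude/skills/`. The Python sketch below assumes that the archive has already been downloaded. It also assumes that the Skill lives in a directory named `benchmark-manager` somewhere inside the archive. That name is inferred from the `#benchmark-manager` fragment of the link and is not confirmed. `install_skill` is a hypothetical helper, not part of the Skill.

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill(zip_path, skill_name, skills_dir):
    """Extract the `skill_name` folder from a repo archive into skills_dir/skill_name.

    Assumes (unverified) that the archive contains one directory named
    `skill_name`; only files below the first such path component are copied.
    """
    dest_root = Path(skills_dir).resolve() / skill_name
    extracted = 0
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the portion of the path below the skill folder.
            rel = parts[parts.index(skill_name) + 1:]
            if not rel:
                continue
            target = dest_root.joinpath(*rel).resolve()
            # Refuse entries that would escape dest_root ("zip slip").
            if dest_root not in target.parents:
                raise ValueError(f"unsafe path in archive: {info.filename}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            extracted += 1
    if extracted == 0:
        raise FileNotFoundError(f"no '{skill_name}' folder found in {zip_path}")
    return dest_root
```

For example, if the download was saved as `ailang-main.zip` (an assumed filename), calling `install_skill("ailang-main.zip", "benchmark-manager", Path(".claude") / "skills")` would place the files in `.claude/skills/benchmark-manager`. This sketch does not restore Unix permission bits, so files under `scripts/` may need `chmod +x` afterwards.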