Benchmark Manager
Community
Create and debug AILANG evaluation benchmarks with precision.
Author: sunholo-data
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill helps prevent benchmark creation errors and shortens debugging by providing expert guidance on AILANG's evaluation system, particularly the critical distinction between its prompt types.
Core Features & Use Cases
- Benchmark Validation: Automatically check benchmark YAML files for common issues, such as incorrect prompt usage.
- Debugging Tools: Show exactly which prompts models receive, and test benchmarks efficiently.
- Use Case: When a benchmark shows a 0% pass rate even though AILANG supports the feature under test, use this Skill to identify and fix the underlying prompt configuration problem.
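The validation check described above might look roughly like the following Python sketch. It is hypothetical and is not the Skill's bundled script. It assumes a benchmark has already been parsed from YAML into a dictionary, and it assumes AILANG benchmarks distinguish a `prompt` field (taken here to replace the language teaching prompt) from a `task_prompt` field (taken here to be appended to it). The function name `check_prompt_fields` and both field names are illustrative assumptions; consult the AILANG eval documentation for the real schema.

```python
"""Hypothetical benchmark prompt-field check (not the Skill's actual script).

Assumes the benchmark YAML has already been parsed into a dict, and that
`prompt` and `task_prompt` are the two prompt fields. Both field names are
assumptions for illustration.
"""


def check_prompt_fields(benchmark: dict) -> list[str]:
    """Return human-readable warnings about the prompt configuration."""
    warnings: list[str] = []
    has_prompt = "prompt" in benchmark
    has_task_prompt = "task_prompt" in benchmark

    if has_prompt and has_task_prompt:
        warnings.append("both 'prompt' and 'task_prompt' are set; pick one")
    elif has_prompt:
        # Assumption: a bare `prompt` hides the teaching prompt, so models
        # never see the language's syntax and may fail every case.
        warnings.append("'prompt' may override the teaching prompt; consider 'task_prompt'")
    elif not has_task_prompt:
        warnings.append("no prompt field found")
    return warnings


if __name__ == "__main__":
    # Example dict, shaped as a parsed json_parse benchmark might be (hypothetical).
    example = {"id": "json_parse", "prompt": "Parse the JSON input..."}
    for warning in check_prompt_fields(example):
        print("WARN:", warning)
```

A check like this only flags suspicious configurations; whether a flagged benchmark is actually wrong still depends on how AILANG's eval harness treats each field.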
Quick Start
Use the Benchmark Manager skill to debug the failing json_parse benchmark by showing the full prompt and testing with a cheap model.
Dependency Matrix
Required Modules
ailang, jq, yq, python3
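Before using the Skill, it may help to confirm that the required tools are installed. The sketch below assumes each entry in the dependency list is a command-line executable expected on `PATH`; `REQUIRED_TOOLS` and `find_missing` are illustrative names, and the check only tests whether each tool can be found, not which version it is.

```python
"""Check that the command-line tools listed for this Skill are on PATH.

A minimal standard-library sketch; it assumes each listed module is an
executable and only reports presence, not version compatibility.
"""
import shutil

# Assumption: these names match the executables' names on PATH.
REQUIRED_TOOLS = ["ailang", "jq", "yq", "python3"]


def find_missing(tools: list[str]) -> list[str]:
    """Return the tools that shutil.which cannot locate on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required tools found.")
```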
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill: Name: Benchmark Manager Download link: https://github.com/sunholo-data/ailang/archive/main.zip#benchmark-manager Please download this .zip file, extract it, and install it in the .claude/skills/ directory.