score-tasks
Official
Evaluate benchmark task quality.
Software Engineering #quality assurance #benchmark #reproducibility #code evaluation #task scoring #verifier quality
Author: sourcegraph
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Benchmark tasks need consistent, objective evaluation. This Skill assesses each task for clarity, verifiability, and reproducibility.
Core Features & Use Cases
- Automated Quality Scoring: Assigns scores based on instruction clarity, verifier quality, and reproducibility.
- Identification of Weaknesses: Flags tasks that fall below a specified quality threshold, highlighting areas for improvement.
- Use Case: A benchmark curator can use this Skill to automatically assess a new set of tasks, ensuring they meet the required standards before being added to the benchmark suite.
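The scoring and flagging described above can be sketched roughly as follows. This is a minimal illustration, not the Skill's actual implementation: the `TaskScore` class, the 0 to 1 scale, the `WEIGHTS` values, and the default threshold of 0.7 are all assumptions made for the example. Only the three scoring dimensions come from the description above.

```python
from dataclasses import dataclass

# Hypothetical weights for the three documented dimensions.
# The Skill's real weighting is not stated in this listing.
WEIGHTS = {
    "instruction_clarity": 0.4,
    "verifier_quality": 0.4,
    "reproducibility": 0.2,
}


@dataclass
class TaskScore:
    """Per-task scores, each assumed to lie in the range 0.0 to 1.0."""
    task_id: str
    instruction_clarity: float
    verifier_quality: float
    reproducibility: float

    def overall(self) -> float:
        # Weighted sum of the three dimension scores.
        return sum(getattr(self, name) * w for name, w in WEIGHTS.items())


def flag_weak_tasks(scores, threshold=0.7):
    """Return the tasks whose overall score falls below the threshold."""
    return [s for s in scores if s.overall() < threshold]


def weakest_dimension(score):
    """Name the lowest-scoring dimension, i.e. the area to improve first."""
    return min(WEIGHTS, key=lambda name: getattr(score, name))


if __name__ == "__main__":
    # Illustrative task IDs and scores, not real benchmark data.
    scores = [
        TaskScore("task-a", 0.9, 0.8, 1.0),
        TaskScore("task-b", 0.5, 0.6, 0.9),
    ]
    print(f"{'task':<10} {'overall':>7}  weakest dimension")
    for s in flag_weak_tasks(scores):
        print(f"{s.task_id:<10} {s.overall():>7.2f}  {weakest_dimension(s)}")
```

Keeping the per-dimension scores alongside the overall score lets a curator see not only which tasks fall below the threshold but also which dimension is dragging each one down.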
Quick Start
Use the score-tasks skill to score all tasks in the csb_sdlc_pytorch suite and display the results in a table.
Dependency Matrix
Required Modules
None required
Components
scripts
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: score-tasks
Download link: https://github.com/sourcegraph/CodeScaleBench/archive/main.zip#score-tasks
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.