score-tasks

Official

Evaluate benchmark task quality.

Author: sourcegraph
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill gives benchmark tasks a consistent, objective evaluation by checking each task for clarity, verifiability, and reproducibility.

Core Features & Use Cases

  • Automated Quality Scoring: Assigns scores based on instruction clarity, verifier quality, and reproducibility.
  • Identification of Weaknesses: Flags tasks that fall below a specified quality threshold, highlighting areas for improvement.
  • Use Case: A benchmark curator can use this Skill to automatically assess a new set of tasks, ensuring they meet the required standards before being added to the benchmark suite.
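
The scoring and flagging behavior described above can be illustrated with a minimal sketch. The Skill's actual scoring formula, score range, and default threshold are not documented on this page, so everything here is an assumption made for illustration: the TaskScore class, the 0.0 to 1.0 scale, the unweighted mean, and the 0.7 threshold are all hypothetical.

```python
from dataclasses import dataclass

# The three criteria named in the feature list above.
CRITERIA = ("instruction_clarity", "verifier_quality", "reproducibility")


@dataclass
class TaskScore:
    """Per-task scores on a hypothetical 0.0 to 1.0 scale."""
    task_id: str
    instruction_clarity: float
    verifier_quality: float
    reproducibility: float

    def overall(self) -> float:
        # Assumption: an unweighted mean; the real Skill may weight criteria.
        return sum(getattr(self, name) for name in CRITERIA) / len(CRITERIA)

    def weak_criteria(self, threshold: float) -> list[str]:
        # Names of the individual criteria that fall below the threshold.
        return [name for name in CRITERIA if getattr(self, name) < threshold]


def flag_weak_tasks(scores: list[TaskScore], threshold: float = 0.7) -> list[TaskScore]:
    """Return the tasks whose overall score is below the threshold."""
    return [score for score in scores if score.overall() < threshold]


if __name__ == "__main__":
    scores = [
        TaskScore("task-a", 0.9, 0.9, 0.9),
        TaskScore("task-b", 0.5, 0.8, 0.6),
    ]
    for task in flag_weak_tasks(scores):
        print(task.task_id, round(task.overall(), 2), task.weak_criteria(0.7))
```

Reporting the weak criteria alongside the overall score is one way to surface the "areas for improvement" the feature list mentions; a curator could then fix the instruction text or the verifier rather than discard the task.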

Quick Start

Use the score-tasks skill to score all tasks in the csb_sdlc_pytorch suite and display the results in a table.

Dependency Matrix

Required Modules

None required

Components

scripts

💻 Claude Code Installation

Recommended: let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: score-tasks
Download link: https://github.com/sourcegraph/CodeScaleBench/archive/main.zip#score-tasks

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
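
For readers who prefer to do the extract-and-install step by hand, the following is a rough sketch of what it could look like once the .zip has been downloaded. It assumes the archive contains a folder named after the Skill somewhere in its tree; the exact location of that folder inside the archive is not stated on this page, so the helper install_skill_from_zip (a hypothetical name) simply searches for it.

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill_from_zip(zip_path, skill_name, skills_dir):
    """Copy the first folder named `skill_name` from the archive into skills_dir.

    Assumption: the Skill lives in a folder with that exact name; where that
    folder sits inside the archive is unknown, so it is located by search.
    """
    target_root = Path(skills_dir) / skill_name
    chosen_prefix = None
    copied = 0
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            # Skip directory entries, unrelated files, and unsafe paths.
            if info.is_dir() or skill_name not in parts or ".." in parts:
                continue
            index = parts.index(skill_name)
            if index == len(parts) - 1:
                continue  # a file that merely shares the Skill's name
            prefix = parts[: index + 1]
            if chosen_prefix is None:
                chosen_prefix = prefix
            if prefix != chosen_prefix:
                continue  # ignore other folders with the same name
            destination = target_root.joinpath(*parts[index + 1:])
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
            copied += 1
    if copied == 0:
        raise FileNotFoundError(f"no '{skill_name}' folder found in {zip_path}")
    return target_root
```

With the archive saved locally as main.zip, a call such as install_skill_from_zip("main.zip", "score-tasks", ".claude/skills") would place the files under .claude/skills/score-tasks/. Whether Claude Code needs any further registration step is not covered here.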