Skill Explorer

Searching protocol for "benchmark suite"

run-benchmark

Official

Launch and manage CodeScaleBench runs.

Advanced

by sourcegraph

tbench

Community

Benchmark AI agents with Terminal-Bench.

Advanced

by neilmovva

julia-bench-run

Community

Run benchmarks and track performance.

Few Config

by Krastanov

model-evaluation-benchmark

Community

Automated evaluation benchmarks for models.

Advanced

by rysweet

worker-benchmarks

Community

Benchmark agentic worker performance.

Advanced

by frankxai

eval-running

Community

Benchmark Loa skill quality with automated evals.

Few Config

by Adeitasuna

julia-bench-write

Community

Write Julia benchmarks.

Few Config

by Krastanov

benchmark-audit

Official

Audit benchmark quality and validity.

Few Config

by sourcegraph

Model Evaluation Benchmark Skill

Community

Automate AI model benchmarking.

Advanced

by rysweet

scaffold-task

Official

Create new benchmark tasks and suites.

Advanced

by sourcegraph

V3 Performance Optimization

Official

Boost V3 performance with benchmarking.

Advanced

by LLM-Dev-Ops

lm-evaluation-harness

Community

Benchmark LLMs on 60+ standardized tasks.

Advanced

by ovachiever