Searching protocol for "evaluation-harness"
Test fixture skill for eval harness.
Test fixture skill.
Benchmark LLMs with standard tasks and backends.
Orchestrate CNS with Tinker for narratives.
Build rigorous evals for LLM agents and prompts.
Quantify and boost LLM performance.
Design and run robust AI agent evaluations.
Build robust MCP servers with clear tooling.
Benchmark LLMs against academic standards.
Create resilient MCP servers in TS or Python.
Benchmark Loa skill quality with eval suites.
Build, evaluate, and deploy production-grade LLMs.