Skill Explorer

Searching protocol for "baseline evaluation"

running-skill-edd-cycle

Community

Run evaluation-driven development for skills.

Advanced

by taisukeoe

neutral-target-baseline

Community

Establish neutral baseline metrics for unbiased assessment.

No Config

by starwreckntx

model-evaluation-framework

Community

Quantify model performance with robust metrics.

Advanced

by ilyasibrahim

eval-harness

Community

Formal evaluation framework for Claude Code.

Advanced

by linnefromice

eval-running

Community

Benchmark Loa skill quality with automated evals.

Few Config

by Adeitasuna

stable-baselines3

Community

Train and evaluate RL agents with SB3.

Advanced

by jackspace

stable-baselines3

Community

Master Reinforcement Learning with SB3.

Advanced

by youyinnn

stable-baselines3

Community

Master Reinforcement Learning with Stable Baselines3.

Advanced

by jacketlong23

regression

Official

Ensure evaluation quality and consistency.

Advanced

by Ankh-Studio

stable-baselines3

Community

Train RL agents fast with SB3 and vectorized environments.

Advanced

by ovachiever

AILANG Post-Release Tasks

Official

Automate post-release benchmarks and dashboard updates.

Few Config

by sunholo-data

delay-model-gate-evaluator

Community

Validate delay-model gates against HPWL baselines.

Advanced

by Mr-Fang-VLSI