Search results for "human evaluation"
Automated and human evaluation for LLMs.
LLM evaluation with automated benchmarks.
Calibrate LLM judges against human labels.
Benchmark code generation models.
LLM evaluation with metrics and benchmarks.
Quantify and boost LLM performance.
Design for real human cognition.
Measure and improve LLM performance.
LLM evaluation with automated metrics.