hugging-face-evaluation-manager


Add and manage evaluation results in model cards

Author: Nymbo
Version: 1.0.0
Installs: 1

System Documentation

What problem does it solve?

Helps update model cards with evaluation results by extracting results from README content, importing benchmark scores from Artificial Analysis, and running custom model evaluations with vLLM or lighteval. Works with the model-index metadata format.

Core Features & Use Cases

  • Extract from README: Parse and convert evaluation tables in READMEs to model-index YAML.
  • Import AA scores: Pull benchmark data from Artificial Analysis API and merge with existing results.
  • Run evaluations: Execute vLLM or lighteval evaluations via local GPU or HF Jobs, with PR handling and validation.
  • Model-index updates: Merge results into model cards using the model-index format and ensure Papers with Code compatibility.
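
For context, the model-index format mentioned above is YAML metadata embedded in a model card. A minimal illustrative fragment (model name, benchmark, and score values below are made up for demonstration):

```yaml
model-index:
- name: your-model
  results:
  - task:
      type: text-generation
    dataset:
      name: MMLU          # benchmark display name
      type: mmlu          # benchmark identifier
    metrics:
    - name: accuracy
      type: accuracy
      value: 71.3         # illustrative score, not a real result
```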

Quick Start

  • Preview extraction: uv run scripts/evaluation_manager.py extract-readme --repo-id "your-username/your-model" --dry-run
  • Apply extraction: uv run scripts/evaluation_manager.py extract-readme --repo-id "your-username/your-model"
  • Import AA scores: python scripts/evaluation_manager.py import-aa --creator-slug "anthropic" --model-name "claude-sonnet-4" --repo-id "username/model"
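
The extract-readme command above parses evaluation tables out of README text. As a rough sketch of that idea (not the tool's actual code; the table contents and the `accuracy` metric type are hypothetical), a pipe-delimited benchmark table can be converted into model-index-shaped result entries like this:

```python
# Sketch: convert a simple README benchmark table into model-index-style
# result entries. Benchmark names and scores are made up for illustration.

def table_to_results(markdown_table: str) -> list[dict]:
    """Parse a pipe-delimited benchmark table into metric entries."""
    lines = [line.strip() for line in markdown_table.strip().splitlines()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    results = []
    for row in lines[2:]:  # skip the header and separator rows
        cells = [c.strip() for c in row.strip("|").split("|")]
        entry = dict(zip(header, cells))
        results.append({
            "dataset": {"name": entry["Benchmark"],
                        "type": entry["Benchmark"].lower()},
            "metrics": [{"type": "accuracy",
                         "value": float(entry["Score"])}],
        })
    return results

readme_table = """
| Benchmark | Score |
|-----------|-------|
| MMLU      | 71.3  |
| GSM8K     | 84.2  |
"""

results = table_to_results(readme_table)
```

The real script also merges these entries with any existing model-index data and writes the result back to the card, which this sketch does not attempt.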

Dependency Matrix

Required Modules

  • huggingface_hub
  • markdown-it-py
  • python-dotenv
  • pyyaml
  • requests

Components

scripts

💻 Claude Code Installation

Recommended: let Claude install it automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: hugging-face-evaluation-manager
Download link: https://github.com/Nymbo/Skills/archive/main.zip#hugging-face-evaluation-manager

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.