hugging-face-evaluation-manager
Category: Community
Description: Add and manage evaluation results in model cards
Author: Nymbo
Version: 1.0.0
Installs: 1
System Documentation
What problem does it solve?
Helps update model cards with evaluation results by extracting results from README content, importing benchmark scores from Artificial Analysis, and running custom model evaluations with vLLM or lighteval. Works with the model-index metadata format.
Core Features & Use Cases
- Extract from README: Parse and convert evaluation tables in READMEs to model-index YAML.
- Import AA scores: Pull benchmark data from Artificial Analysis API and merge with existing results.
- Run evaluations: Execute vLLM or lighteval evaluations via local GPU or HF Jobs, with PR handling and validation.
- Model-index updates: Merge results into model cards using the model-index format and ensure Papers with Code compatibility.
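The README-extraction step above can be sketched in plain Python. This is a minimal illustration only, assuming a simple two-column "Benchmark | Score" table; the task and dataset names used here are placeholders, not what the skill's actual parser emits:

```python
# Sketch: convert a markdown evaluation table into model-index-style entries.
# The table shape, task type, and metric type are assumptions for illustration.

def parse_eval_table(markdown: str) -> list[dict]:
    """Parse a simple '| Benchmark | Score |' markdown table."""
    rows = [ln.strip() for ln in markdown.splitlines() if ln.strip().startswith("|")]
    results = []
    for row in rows[2:]:  # skip the header and separator rows
        cells = [c.strip() for c in row.strip("|").split("|")]
        if len(cells) < 2:
            continue
        name, score = cells[0], cells[1]
        results.append({
            "task": {"type": "text-generation"},          # assumed task type
            "dataset": {"name": name, "type": name.lower()},
            "metrics": [{"type": "accuracy", "value": float(score.rstrip("%"))}],
        })
    return results

table = """
| Benchmark | Score |
|-----------|-------|
| MMLU      | 71.3  |
| GSM8K     | 84.0  |
"""
print(parse_eval_table(table))
```

Entries in this shape map directly onto the `results` list of a model-index block once serialized to YAML.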
Quick Start
- Preview extraction: uv run scripts/evaluation_manager.py extract-readme --repo-id "your-username/your-model" --dry-run
- Apply extraction: uv run scripts/evaluation_manager.py extract-readme --repo-id "your-username/your-model"
- Import AA scores: uv run scripts/evaluation_manager.py import-aa --creator-slug "anthropic" --model-name "claude-sonnet-4" --repo-id "username/model"
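When importing Artificial Analysis scores on top of existing results, the merge step needs to update matching entries rather than duplicate them. A rough sketch of that dedup logic, keyed by task/dataset pair; the entry shape is an assumption for illustration, not the script's actual implementation:

```python
# Sketch: merge incoming model-index results into existing ones, keyed by
# (task type, dataset type) so a re-import overwrites instead of duplicating.

def merge_results(existing: list[dict], incoming: list[dict]) -> list[dict]:
    merged = {(r["task"]["type"], r["dataset"]["type"]): r for r in existing}
    for r in incoming:
        merged[(r["task"]["type"], r["dataset"]["type"])] = r  # overwrite on conflict
    return list(merged.values())

old = [{"task": {"type": "text-generation"}, "dataset": {"type": "mmlu"},
        "metrics": [{"type": "accuracy", "value": 68.0}]}]
new = [{"task": {"type": "text-generation"}, "dataset": {"type": "mmlu"},
        "metrics": [{"type": "accuracy", "value": 71.3}]}]
print(merge_results(old, new))  # one MMLU entry, updated in place
```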
Dependency Matrix
Required Modules
huggingface_hub, markdown-it-py, python-dotenv, pyyaml, requests
Components
scripts
💻 Claude Code Installation
Recommended: let Claude install it automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill: Name: hugging-face-evaluation-manager Download link: https://github.com/Nymbo/Skills/archive/main.zip#hugging-face-evaluation-manager Please download this .zip file, extract it, and install it in the .claude/skills/ directory.