hugging-face-evaluation-manager


Add and manage evaluation results in model cards

Author: Nymbo
Version: 1.0.0
Installs: 1

System Documentation

What problem does it solve?

Helps update model cards with evaluation results by extracting results from README content, importing benchmark scores from Artificial Analysis, and running custom model evaluations with vLLM or lighteval. Works with the model-index metadata format.

Core Features & Use Cases

  • Extract from README: Parse and convert evaluation tables in READMEs to model-index YAML.
  • Import AA scores: Pull benchmark data from Artificial Analysis API and merge with existing results.
  • Run evaluations: Execute vLLM or lighteval evaluations via local GPU or HF Jobs, with PR handling and validation.
  • Model-index updates: Merge results into model cards using the model-index format and ensure Papers with Code compatibility.
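
For context, the model-index format mentioned above is YAML metadata embedded in a model card. A minimal illustrative fragment (model name, benchmark, and score values below are made up for demonstration):

```yaml
model-index:
- name: your-model
  results:
  - task:
      type: text-generation
    dataset:
      name: MMLU          # benchmark display name
      type: mmlu          # benchmark identifier
    metrics:
    - name: accuracy
      type: accuracy
      value: 71.3         # illustrative score, not a real result
```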

Quick Start

  • Preview extraction: uv run scripts/evaluation_manager.py extract-readme --repo-id "your-username/your-model" --dry-run
  • Apply extraction: uv run scripts/evaluation_manager.py extract-readme --repo-id "your-username/your-model"
  • Import AA scores: python scripts/evaluation_manager.py import-aa --creator-slug "anthropic" --model-name "claude-sonnet-4" --repo-id "username/model"
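
The extract-readme command above parses evaluation tables out of README text. As a rough sketch of that idea (not the tool's actual code; the table contents and the `accuracy` metric type are hypothetical), a pipe-delimited benchmark table can be converted into model-index-shaped result entries like this:

```python
# Sketch: convert a simple README benchmark table into model-index-style
# result entries. Benchmark names and scores are made up for illustration.

def table_to_results(markdown_table: str) -> list[dict]:
    """Parse a pipe-delimited benchmark table into metric entries."""
    lines = [line.strip() for line in markdown_table.strip().splitlines()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    results = []
    for row in lines[2:]:  # skip the header and separator rows
        cells = [c.strip() for c in row.strip("|").split("|")]
        entry = dict(zip(header, cells))
        results.append({
            "dataset": {"name": entry["Benchmark"],
                        "type": entry["Benchmark"].lower()},
            "metrics": [{"type": "accuracy",
                         "value": float(entry["Score"])}],
        })
    return results

readme_table = """
| Benchmark | Score |
|-----------|-------|
| MMLU      | 71.3  |
| GSM8K     | 84.2  |
"""

results = table_to_results(readme_table)
```

The real script also merges these entries with any existing model-index data and writes the result back to the card, which this sketch does not attempt.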

Dependency Matrix

Required Modules

  • huggingface_hub
  • markdown-it-py
  • python-dotenv
  • pyyaml
  • requests

Components

scripts

💻 Claude Code Installation

Recommended: let Claude install it automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: hugging-face-evaluation-manager
Download link: https://github.com/Nymbo/Skills/archive/main.zip#hugging-face-evaluation-manager

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.