nemo-evaluator

Community

Benchmark LLMs with NeMo Evaluator.

Author: eyadsibai
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill streamlines the evaluation of Large Language Models (LLMs) by providing a framework for running industry-standard benchmarks and configuring complex evaluation pipelines.

Core Features & Use Cases

  • Comprehensive Benchmarking: Supports over 100 benchmarks across 18+ harnesses, including MMLU, HumanEval, and GSM8K.
  • Reproducible Evaluation: Utilizes containerization for consistent results across different environments.
  • Flexible Deployment: Enables evaluation on local Docker, Slurm HPC clusters, or cloud platforms.
  • Use Case: You need to compare the performance of two new LLMs on coding tasks and general knowledge. Use this Skill to configure and run HumanEval and MMLU benchmarks for both models, generating a comparative report.

Quick Start

Install the NeMo Evaluator launcher by running pip install nemo-evaluator-launcher.
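As a rough sketch of what a first run might look like: the subcommands, flags, example config name, and output path below are assumptions based on the launcher's Hydra-style configuration, not guaranteed defaults, so check nemo-evaluator-launcher --help and the example configs shipped with your installed version before copying them.

```bash
# Install the launcher CLI.
pip install nemo-evaluator-launcher

# List the benchmarks available in this installation.
# (Subcommand name is an assumption; confirm with --help.)
nemo-evaluator-launcher ls tasks

# Run an evaluation from an example config.
# The config name, config directory, and output directory here are
# placeholders for illustration, not shipped defaults.
nemo-evaluator-launcher run \
  --config-dir examples \
  --config-name local_llama_3_1_8b_instruct \
  -o execution.output_dir=./results
```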

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: nemo-evaluator
Download link: https://github.com/eyadsibai/ltk/archive/main.zip#nemo-evaluator

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
