nemo-evaluator
Community · Benchmark LLMs with NeMo Evaluator.
Author: eyadsibai
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill streamlines the process of evaluating Large Language Models (LLMs) by providing a robust framework for running industry-standard benchmarks and setting up complex evaluation pipelines.
Core Features & Use Cases
- Comprehensive Benchmarking: Supports over 100 benchmarks across 18+ harnesses, including MMLU, HumanEval, and GSM8K.
- Reproducible Evaluation: Utilizes containerization for consistent results across different environments.
- Flexible Deployment: Enables evaluation on local Docker, Slurm HPC clusters, or cloud platforms.
- Use Case: You need to compare the performance of two new LLMs on coding tasks and general knowledge. Use this Skill to configure and run HumanEval and MMLU benchmarks for both models, generating a comparative report.
Quick Start
Install the NeMo Evaluator launcher by running `pip install nemo-evaluator-launcher`.
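The install step above can be sketched as a short shell session. Only the `pip install` command comes from this listing; the `--help` invocation is an assumption about the installed CLI, so confirm the available subcommands on your version before scripting anything.

```shell
# Install the NeMo Evaluator launcher (from the Quick Start above)
pip install nemo-evaluator-launcher

# Assumption: the package installs a CLI entry point of the same name.
# Print its usage to discover the actual subcommands and flags.
nemo-evaluator-launcher --help
```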
Dependency Matrix
Required Modules: none
Components: references
💻 Claude Code Installation
Recommended: let Claude install it automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill: Name: nemo-evaluator Download link: https://github.com/eyadsibai/ltk/archive/main.zip#nemo-evaluator Please download this .zip file, extract it, and install it in the .claude/skills/ directory.