benchmark-datasets

Community

Standard AI security benchmarks for robust evaluation

Author: pluginagentmarketplace
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides standardized AI security benchmarks and datasets to evaluate safety, robustness, and compliance across AI systems.

Core Features & Use Cases

  • Curated benchmark catalog covering safety, robustness, jailbreak, privacy, and bias evaluation to enable comprehensive security assessments.
  • Reproducible evaluation workflows with provided scripts and catalog references for consistent results.
  • Use Case: Compare model A and model B on a unified benchmark suite and generate a security assessment report.
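The comparison use case can be sketched in a few lines of Python. This is a hypothetical illustration, not the skill's bundled script: `compare_models` and `render_report` are invented names, each model is assumed to have one score per benchmark on a common scale, and the numbers in the example are placeholders rather than real results.

```python
# Hypothetical sketch: compare two models on one shared benchmark suite.
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    benchmark: str
    score_a: float
    score_b: float

    @property
    def delta(self) -> float:
        # Positive delta means model B's score is higher than model A's.
        return self.score_b - self.score_a


def compare_models(scores_a: dict[str, float], scores_b: dict[str, float]) -> list[Comparison]:
    """Pair up scores for benchmarks that both models were run on.

    Benchmarks scored for only one model are skipped, so every row
    compares the two models on the same test.
    """
    shared = sorted(scores_a.keys() & scores_b.keys())
    return [Comparison(name, scores_a[name], scores_b[name]) for name in shared]


def render_report(rows: list[Comparison]) -> str:
    """Format comparison rows as a plain-text table."""
    lines = [f"{'benchmark':<12}{'model_a':>9}{'model_b':>9}{'delta':>9}"]
    for row in rows:
        lines.append(
            f"{row.benchmark:<12}{row.score_a:>9.2f}{row.score_b:>9.2f}{row.delta:>+9.2f}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder scores for illustration only, not real benchmark results.
    model_a = {"jailbreak": 0.70, "privacy": 0.90, "bias": 0.80}
    model_b = {"jailbreak": 0.85, "privacy": 0.88, "robustness": 0.75}
    print(render_report(compare_models(model_a, model_b)))
```

With these placeholders only `jailbreak` and `privacy` are scored for both models, so `bias` and `robustness` are left out of the table; a unified suite would run every benchmark against both models.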

Quick Start

  1. Run the provided Python evaluation script to execute the benchmarks and generate a report.
  2. Open the generated benchmark_report.json to review results.
  3. Update assets/benchmarks-catalog.yaml and references/BENCHMARK-CATALOG.md to add new benchmarks.
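When reviewing the report, a per-category summary gives a quick overview. The JSON layout assumed below (a `results` list whose entries carry `benchmark`, `category`, and `score` keys) is a guess for illustration, and `summarize_report` and `load_report` are hypothetical helpers; check the file the script actually writes and adjust the field names to match.

```python
# Hypothetical sketch: summarize benchmark_report.json by category.
import json
from collections import defaultdict
from pathlib import Path
from statistics import mean


def summarize_report(report: dict) -> dict[str, float]:
    """Return the mean score per category, sorted by category name.

    ASSUMPTION: the report looks like
    {"results": [{"benchmark": str, "category": str, "score": float}, ...]}.
    """
    by_category: dict[str, list[float]] = defaultdict(list)
    for result in report.get("results", []):
        by_category[result["category"]].append(float(result["score"]))
    return {category: mean(scores) for category, scores in sorted(by_category.items())}


def load_report(path: Path = Path("benchmark_report.json")) -> dict:
    """Read the generated report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    report_path = Path("benchmark_report.json")
    if report_path.exists():
        for category, average in summarize_report(load_report(report_path)).items():
            print(f"{category:<12} {average:.3f}")
    else:
        print(f"{report_path} not found; run the evaluation script first.")
```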

Dependency Matrix

Required Modules

None required

Components

scripts, references, assets

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: benchmark-datasets
Download link: https://github.com/pluginagentmarketplace/custom-plugin-ai-red-teaming/archive/main.zip#benchmark-datasets

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
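For a manual install, note that the `#benchmark-datasets` fragment in the download link is not sent to the server, so `main.zip` contains the whole repository and the skill folder has to be located inside it. The sketch below assumes the archive was already saved as `main.zip` in the current directory, that the skill lives in a folder named `benchmark-datasets` at some depth inside it, and that `.claude/skills` is resolved relative to the current project; `find_skill_prefix` and `extract_skill` are hypothetical helpers, not part of the skill.

```python
# Hypothetical sketch: copy the skill folder out of the repository archive.
import zipfile
from pathlib import Path, PurePosixPath
from typing import Optional

SKILL_NAME = "benchmark-datasets"


def find_skill_prefix(names: list[str], skill_name: str) -> Optional[tuple[str, ...]]:
    """Return the path components leading to the first `skill_name` directory."""
    for name in names:
        parts = PurePosixPath(name).parts
        # Only directory components count; a file named skill_name is ignored.
        dir_parts = parts if name.endswith("/") else parts[:-1]
        if skill_name in dir_parts:
            return dir_parts[: dir_parts.index(skill_name) + 1]
    return None


def extract_skill(archive: zipfile.ZipFile, skill_name: str, dest_root: Path) -> Path:
    """Extract the skill folder's files into dest_root/skill_name."""
    prefix = find_skill_prefix(archive.namelist(), skill_name)
    if prefix is None:
        raise FileNotFoundError(f"no '{skill_name}' directory in archive")
    target = dest_root / skill_name
    for info in archive.infolist():
        parts = PurePosixPath(info.filename).parts
        if info.is_dir() or parts[: len(prefix)] != prefix:
            continue
        relative = parts[len(prefix):]
        if not relative or ".." in relative:  # refuse path-traversal entries
            continue
        out_path = target.joinpath(*relative)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_bytes(archive.read(info))
    return target


if __name__ == "__main__":
    archive_path = Path("main.zip")  # assumed download location
    if archive_path.exists():
        with zipfile.ZipFile(archive_path) as archive:
            installed = extract_skill(archive, SKILL_NAME, Path(".claude/skills"))
        print(f"installed to {installed}")
    else:
        print(f"{archive_path} not found; download it from the link above first.")
```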
