Name: benchmark-datasets
Availability: InStock
Author: pluginagentmarketplace

System Documentation

What problem does it solve?

This Skill provides standardized AI security benchmarks and datasets to evaluate safety, robustness, and compliance across AI systems.

Core Features & Use Cases

Curated benchmark catalog covering safety, robustness, jailbreak, privacy, and bias evaluation to enable comprehensive security assessments.
Reproducible evaluation workflows with provided scripts and catalog references for consistent results.
Use Case: Compare model A and model B on a unified benchmark suite and generate a security assessment report.
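
The comparison use case above can be sketched in a few lines of Python. This is a minimal illustration, not the Skill's own tooling: the function name `compare_models`, the category keys, and every score below are hypothetical placeholders, and the sketch assumes each model has one score per category in the range 0 to 1, with higher meaning safer.

```python
from typing import Dict

# Assumption: one score per category in [0, 1], higher is safer.
Scores = Dict[str, float]


def compare_models(model_a: Scores, model_b: Scores) -> Dict[str, Dict[str, float]]:
    """Return both scores and the signed difference (B minus A) per shared category."""
    # Only compare categories that both models were actually evaluated on.
    shared = sorted(set(model_a) & set(model_b))
    return {
        category: {
            "model_a": model_a[category],
            "model_b": model_b[category],
            "delta": round(model_b[category] - model_a[category], 4),
        }
        for category in shared
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only; these are not real results.
    scores_a = {"safety": 0.80, "robustness": 0.70, "jailbreak": 0.60}
    scores_b = {"safety": 0.85, "robustness": 0.65, "jailbreak": 0.60}
    for category, row in compare_models(scores_a, scores_b).items():
        print(f"{category:12s} A={row['model_a']:.2f} B={row['model_b']:.2f} delta={row['delta']:+.2f}")
```

Restricting the comparison to shared categories avoids reporting a difference for a benchmark that only one of the two models was run on.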

Quick Start

Run the evaluation workflow with the provided Python script to execute the benchmarks and generate a report.
Open the generated benchmark_report.json to review results.
To add new benchmarks, update both assets/benchmarks-catalog.yaml and references/BENCHMARK-CATALOG.md so the catalog and its reference documentation stay in sync.
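
As a rough sketch of the review step, the snippet below loads benchmark_report.json and averages scores per category. The report layout is an assumption made for illustration: this page does not document the schema, so the top-level `results` list and the `category` and `score` fields are hypothetical, as are the helper names `load_report` and `mean_score_by_category`. Adjust them to match the file the script actually produces.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def load_report(path: str = "benchmark_report.json") -> dict:
    """Read the generated report from disk and parse it as JSON."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def mean_score_by_category(report: dict) -> Dict[str, float]:
    """Average the per-benchmark scores within each category.

    Assumed (hypothetical) layout:
    {"results": [{"benchmark": "...", "category": "...", "score": 0.5}, ...]}
    """
    buckets: Dict[str, List[float]] = defaultdict(list)
    for entry in report.get("results", []):
        buckets[entry["category"]].append(float(entry["score"]))
    return {category: sum(values) / len(values) for category, values in buckets.items()}


if __name__ == "__main__":
    report_path = Path("benchmark_report.json")
    if report_path.exists():
        for category, mean in sorted(mean_score_by_category(load_report(str(report_path))).items()):
            print(f"{category}: {mean:.3f}")
    else:
        print("benchmark_report.json not found; run the evaluation workflow first.")
```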

Please help me install this Skill:
Name: benchmark-datasets
Download link: https://github.com/pluginagentmarketplace/custom-plugin-ai-red-teaming/archive/main.zip#benchmark-datasets
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
