benchmark-datasets
Community
Standard AI security benchmarks for robust eval
Author: pluginagentmarketplace
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides standardized AI security benchmarks and datasets to evaluate safety, robustness, and compliance across AI systems.
Core Features & Use Cases
- Curated benchmark catalog covering safety, robustness, jailbreak, privacy, and bias evaluation to enable comprehensive security assessments.
- Reproducible evaluation workflows with provided scripts and catalog references for consistent results.
- Use Case: Compare model A and model B on a unified benchmark suite and generate a security assessment report.
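The model A versus model B use case above can be sketched as a per-category score comparison. This is only an illustration: the function name `compare_models`, the category keys, and the scores are hypothetical placeholders, and the Skill's own scripts may structure the comparison differently.

```python
from typing import Dict


def compare_models(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> Dict[str, float]:
    """Return (score_b - score_a) for every category present in both inputs."""
    shared = sorted(set(scores_a) & set(scores_b))
    return {cat: round(scores_b[cat] - scores_a[cat], 4) for cat in shared}


# Hypothetical placeholder scores (assumed: higher means safer); not real results.
model_a = {"safety": 0.80, "jailbreak": 0.60, "privacy": 0.90}
model_b = {"safety": 0.85, "jailbreak": 0.50, "bias": 0.70}

deltas = compare_models(model_a, model_b)
for category, delta in deltas.items():
    # A positive delta means model B scored higher in that category.
    print(f"{category}: {delta:+.2f}")
```

Categories that only one model was evaluated on are skipped rather than treated as zero, so a missing benchmark is not mistaken for a failure.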
Quick Start
- Run the evaluation workflow with the provided Python script to execute the benchmarks and generate a report.
- Open the generated benchmark_report.json to review results.
- Update assets/benchmarks-catalog.yaml and references/BENCHMARK-CATALOG.md to add new benchmarks.
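Reviewing the generated report could look roughly like the sketch below. The layout of benchmark_report.json is not documented here, so the `results` list and its `benchmark`, `category`, and `score` fields are assumptions made for illustration, as are the helper names `load_report` and `mean_score_by_category`.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict


def load_report(path: str = "benchmark_report.json") -> dict:
    """Read the report file produced by the evaluation workflow."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def mean_score_by_category(report: dict) -> Dict[str, float]:
    """Average the scores per category (assumed schema, see lead-in)."""
    buckets = defaultdict(list)
    for entry in report.get("results", []):
        buckets[entry["category"]].append(float(entry["score"]))
    return {cat: sum(vals) / len(vals) for cat, vals in buckets.items()}


# Hypothetical in-memory report standing in for benchmark_report.json.
sample_report = {
    "results": [
        {"benchmark": "example-a", "category": "jailbreak", "score": 0.5},
        {"benchmark": "example-b", "category": "jailbreak", "score": 1.0},
        {"benchmark": "example-c", "category": "privacy", "score": 0.75},
    ]
}

summary = mean_score_by_category(sample_report)
for category, mean in summary.items():
    print(f"{category}: {mean:.2f}")
```

With a real report on disk, `mean_score_by_category(load_report())` would apply the same summary, provided the file actually follows the assumed schema.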
Dependency Matrix
Required Modules
None required
Components
- scripts
- references
- assets
💻 Claude Code Installation
Recommended: Let Claude install automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: benchmark-datasets
Download link: https://github.com/pluginagentmarketplace/custom-plugin-ai-red-teaming/archive/main.zip#benchmark-datasets
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.