dataset-redaction

Community

Redact PHI and synthesize benchmark data.

Author: JustinChaney2023
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill creates safe evaluation datasets by redacting protected health information (PHI) and, optionally, generating synthetic equivalents, while preserving the document structure needed for OCR, STT, and LLM benchmarking.

Core Features & Use Cases

  • Redaction: apply deterministic pseudonymization to patient identifiers across visits.
  • Synthetic generation: produce realistic, test-ready data with controlled deltas for benchmarking.
  • Deliverables: provide ready-to-use artifacts such as redaction policies, schemas, and tooling specs for reproducibility.
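As an illustration of the deterministic pseudonymization mentioned in the Redaction bullet, here is a minimal sketch. It assumes a keyed hash (HMAC-SHA256) is an acceptable mapping; the Skill's actual method is not documented here. The names `SECRET_KEY`, `pseudonymize`, the `PT-` prefix, and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; a real pipeline would load this from a secure store.
SECRET_KEY = b"example-key-do-not-use"


def pseudonymize(patient_id: str, key: bytes = SECRET_KEY, length: int = 12) -> str:
    """Map a patient identifier to a stable pseudonym via HMAC-SHA256.

    The same input and key always produce the same output, so a patient
    keeps one pseudonym across all of their visits.
    """
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"PT-{digest[:length]}"


# Illustrative records only; not taken from any real dataset.
visits = [
    {"patient_id": "MRN-001", "visit": 1},
    {"patient_id": "MRN-002", "visit": 1},
    {"patient_id": "MRN-001", "visit": 2},
]

# Replace the identifier in each record while leaving other fields intact.
redacted = [{**v, "patient_id": pseudonymize(v["patient_id"])} for v in visits]
```

Because the mapping depends on the key, rotating the key changes every pseudonym; keeping the key fixed per dataset release is one way to keep visits linkable within that release.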

Quick Start

Run the redaction pipeline on a sample dataset to generate redacted_documents.json and gold_facts.json, then validate the dataset with the provided manifests.
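The validation step could look something like the sketch below. The manifest format is not specified in this listing, so the sketch assumes a hypothetical JSON manifest of the form `{"files": {"<name>": "<sha256 hex digest>"}}` stored next to the output files; `validate_against_manifest` and `sha256_of` are illustrative names, not part of the Skill.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def validate_against_manifest(manifest_path: Path) -> list[str]:
    """Check each file listed in the (assumed) manifest; return any problems found."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    base = manifest_path.parent
    problems = []
    for name, expected in manifest["files"].items():
        target = base / name
        if not target.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(target) != expected:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

An empty list means every listed file, for example redacted_documents.json and gold_facts.json, is present and matches its recorded digest.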

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: dataset-redaction
Download link: https://github.com/JustinChaney2023/orate/archive/main.zip#dataset-redaction

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.