dataset-redaction
Community
Redact PHI and synthesize benchmark data.
Author: JustinChaney2023
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill creates safe evaluation datasets by redacting PHI and optionally generating synthetic equivalents while preserving document structure necessary for OCR, STT, and LLM benchmarking.
Core Features & Use Cases
- Redaction: apply deterministic pseudonymization so that each patient identifier maps to the same pseudonym across all visits.
- Synthetic generation: produce realistic, test-ready data with controlled deltas for benchmarking.
- Deliverables: provide ready-to-use artifacts such as redaction policies, schemas, and tooling specs for reproducibility.
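Deterministic pseudonymization means the same input identifier always yields the same replacement, so records from different visits stay linkable after redaction. The sketch below shows one common way to do this, a keyed hash (HMAC-SHA256). It is an illustration under assumptions, not this Skill's documented implementation: the MRN format, the `PT-` prefix, `SECRET_KEY`, and the helper names are all hypothetical.

```python
import hashlib
import hmac
import re

# Hypothetical secret key; in practice load it from a secure store, never ship it with the dataset.
SECRET_KEY = b"replace-with-a-project-secret"

# Hypothetical identifier format: "MRN" optionally followed by ":" or spaces, then 6-10 digits.
MRN_PATTERN = re.compile(r"\bMRN[:\s]*(\d{6,10})\b")


def pseudonym(identifier: str, prefix: str = "PT") -> str:
    """Map an identifier to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # Keep a short, readable slice of the digest; same key + same input -> same pseudonym.
    return f"{prefix}-{digest[:10].upper()}"


def redact_mrns(text: str) -> str:
    """Replace every MRN in the text with its deterministic pseudonym."""
    return MRN_PATTERN.sub(lambda m: f"MRN {pseudonym(m.group(1))}", text)


# Two visits for the same (made-up) patient redact to the same pseudonym.
visit_1 = "Patient MRN: 00123456 seen for follow-up."
visit_2 = "MRN 00123456 returned; labs reviewed."
print(redact_mrns(visit_1))
print(redact_mrns(visit_2))
```

Using a keyed hash rather than a plain SHA-256 matters here: identifiers such as MRNs come from a small, guessable space, so an unkeyed hash could be reversed by hashing every candidate value.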
Quick Start
Run the redaction pipeline on a sample dataset to generate redacted_documents.json and gold_facts.json, then validate the dataset with the provided manifests.
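The validation step can be pictured with a small sketch. The two file names come from the Quick Start above, but their JSON layout (a list of `doc_id`/`text` records and a list of `doc_id`/`field`/`value` facts) is an assumption made for illustration, not the Skill's documented schema. The hypothetical `validate` helper checks two properties a benchmark dataset plausibly needs: no raw identifier survives redaction, and every gold fact points at an existing document whose text still contains the expected value.

```python
import json
import tempfile
from pathlib import Path


def write_json(path: Path, data) -> None:
    """Write data as pretty-printed UTF-8 JSON."""
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")


def validate(out_dir: Path, raw_identifiers: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the dataset passed (hypothetical schema)."""
    docs = json.loads((out_dir / "redacted_documents.json").read_text(encoding="utf-8"))
    facts = json.loads((out_dir / "gold_facts.json").read_text(encoding="utf-8"))
    text_by_id = {d["doc_id"]: d["text"] for d in docs}
    problems = []
    # 1. No raw identifier may appear anywhere in the redacted text.
    for doc_id, text in text_by_id.items():
        for raw in raw_identifiers:
            if raw in text:
                problems.append(f"{doc_id}: raw identifier {raw!r} leaked")
    # 2. Every gold fact must reference a known document and still be present in its text.
    for fact in facts:
        text = text_by_id.get(fact["doc_id"])
        if text is None:
            problems.append(f"fact references unknown document {fact['doc_id']!r}")
        elif fact["value"] not in text:
            problems.append(f"{fact['doc_id']}: gold value {fact['value']!r} not found in text")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    write_json(out / "redacted_documents.json", [
        {"doc_id": "visit-1", "text": "MRN PT-3F9A1C2B7D prescribed metformin 500 mg."},
    ])
    write_json(out / "gold_facts.json", [
        {"doc_id": "visit-1", "field": "medication", "value": "metformin 500 mg"},
    ])
    issues = validate(out, raw_identifiers=["00123456"])

print(issues)  # prints []
```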
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: dataset-redaction Download link: https://github.com/JustinChaney2023/orate/archive/main.zip#dataset-redaction Please download this .zip file, extract it, and install it in the .claude/skills/ directory.