lucy-ng:sanitize

Community

Sanitize NMR datasets for blind CASE analyses.

Author: steinbeck
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill strips compound identity information from the metadata of Bruker NMR datasets while leaving the spectroscopic data untouched, so that blind CASE (computer-assisted structure elucidation) evaluations remain valid.

Core Features & Use Cases

  • Automated metadata sanitization: Redacts compound names, CAS numbers, and other identifiers across titles, logs, peak lists, and dataset metadata without touching spectral data.
  • Manifest-driven redaction: Generates or consumes a redaction manifest to ensure reproducible, auditable sanitization steps.
  • Safety-first workflow: Requires a fresh AI session after sanitization so that identities seen during sanitization cannot leak into the analysis, and supports verification by re-running the extractor on the sanitized dataset.
  • Use Case: Prepare public Bruker NMR datasets for blind CASE studies by removing identity information before analysis.
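
The manifest-driven redaction described above can be illustrated with a minimal sketch. It is not the skill's actual implementation: the manifest format (one identifier per line, with blank lines and `#` comments skipped), the `[REDACTED]` placeholder, and the function names `load_manifest` and `redact_text` are all assumptions made for illustration.

```python
import re

# Hypothetical placeholder; the real sanitizer may substitute something else.
PLACEHOLDER = "[REDACTED]"


def load_manifest(text):
    """Parse manifest text, assuming one identifier per line.

    Blank lines and lines starting with '#' are skipped (an assumed format).
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def redact_text(text, identifiers, placeholder=PLACEHOLDER):
    """Replace every case-insensitive occurrence of each identifier."""
    if not identifiers:
        return text
    # Longest identifiers first, so "caffeine citrate" is matched as a whole
    # before the shorter "caffeine" can match inside it.
    ordered = sorted(set(identifiers), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(i) for i in ordered), re.IGNORECASE)
    return pattern.sub(placeholder, text)
```

Applied to a title line such as "Title: Caffeine (CAS 58-08-2)" with a manifest listing `caffeine` and `58-08-2`, both identifiers are replaced by the placeholder. Because the sketch operates only on text, binary spectral files would never pass through it.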

Quick Start

Steps:

  1. Run the text extractor to review dataset content: python lucy_text_extractor.py <dataset_path>
  2. Have the AI list every compound identifier found in the extractor output and save the list as a manifest file (identifiers.txt)
  3. Run the bulk sanitizer: python lucy_bulk_sanitize.py <dataset_path> --manifest identifiers.txt
  4. Re-run the text extractor to verify sanitization: python lucy_text_extractor.py <dataset_path>
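
The verification in step 4 can also be approximated independently of the extractor. The sketch below is an assumption-laden illustration, not part of lucy-ng: the function name `find_leaks` is hypothetical, and the set of file names treated as binary spectral data is an assumed list of common Bruker data files that the scan skips.

```python
from pathlib import Path

# Assumed names of Bruker binary data files; these are skipped, never read as text.
BINARY_NAMES = {"fid", "ser", "1r", "1i", "2rr", "2ri", "2ir", "2ii"}


def find_leaks(dataset_path, identifiers):
    """Return (relative_path, identifier) pairs for identifiers still present.

    Every non-binary file under dataset_path is searched case-insensitively.
    """
    needles = [i.lower() for i in identifiers if i.strip()]
    hits = []
    for path in sorted(Path(dataset_path).rglob("*")):
        if not path.is_file() or path.name in BINARY_NAMES:
            continue
        # latin-1 maps every byte to a character, so decoding cannot fail.
        content = path.read_bytes().decode("latin-1").lower()
        for needle in needles:
            if needle in content:
                hits.append((str(path.relative_to(dataset_path)), needle))
    return hits
```

An empty result suggests that no manifest entry survives in the text files. It does not prove that identifiers missing from the manifest were removed, so a manual review of the extractor output is still worthwhile.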

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install the skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: lucy-ng:sanitize
Download link: https://github.com/steinbeck/lucy-ng/archive/main.zip#lucy-ng-sanitize

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
