lucy-ng:sanitize

Name: lucy-ng:sanitize
Author: steinbeck

Community

Sanitize NMR datasets for blind CASE analyses.

Education & Research #workflow #sanitization #NMR #data-integrity #Bruker #CASE #metadata-redaction

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

This skill removes compound-identifying information from the metadata of Bruker NMR datasets while leaving the spectroscopic data untouched, so that blind CASE (computer-assisted structure elucidation) evaluations remain valid.

Core Features & Use Cases

Automated metadata sanitization: Redacts compound names, CAS numbers, and other identifiers across titles, logs, peak lists, and dataset metadata without touching spectral data.
Manifest-driven redaction: Generates or consumes a redaction manifest to ensure reproducible, auditable sanitization steps.
Safety-first workflow: Requires a fresh AI session after sanitization so that identities seen during review cannot carry over into the blind analysis, and supports verification by re-running the extractor on the sanitized dataset.
Use Case: Prepare public Bruker NMR datasets for blind CASE studies by removing identity information before analysis.
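To make the manifest-driven idea concrete, here is a minimal Python sketch of how a list of identifiers could be applied to a piece of metadata text. It is not the skill's actual implementation: the manifest format (one identifier per line, with blank lines and # comments ignored), the function names load_manifest and redact_text, and the [REDACTED] placeholder are all assumptions made for illustration.

```python
import re


def load_manifest(manifest_text):
    """Parse manifest text into a list of identifiers.

    Assumed format (hypothetical): one identifier per line;
    blank lines and lines starting with '#' are ignored.
    """
    identifiers = []
    for line in manifest_text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):
            identifiers.append(entry)
    return identifiers


def redact_text(text, identifiers, placeholder="[REDACTED]"):
    """Replace every case-insensitive occurrence of an identifier."""
    if not identifiers:
        return text
    # Try longer identifiers first so that a short one cannot
    # clip a longer one that starts at the same position.
    ordered = sorted(set(identifiers), key=len, reverse=True)
    pattern = re.compile(
        "|".join(re.escape(item) for item in ordered),
        re.IGNORECASE,
    )
    return pattern.sub(placeholder, text)
```

Because the same manifest always produces the same substitutions, the manifest doubles as an audit record of what was removed. Only text is touched in this sketch; binary spectral files would be left alone.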

Quick Start

Steps:

1. Run the text extractor to review the dataset content: python lucy_text_extractor.py <dataset_path>
2. Have the AI identify compound identifiers in the extractor output and write them to a manifest file (identifiers.txt).
3. Run the bulk sanitizer: python lucy_bulk_sanitize.py <dataset_path> --manifest identifiers.txt
4. Re-run the text extractor to verify that no identifiers remain: python lucy_text_extractor.py <dataset_path>
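The sanitize-then-verify part of these steps could be scripted roughly as follows. This is a sketch under assumptions: the two script names and the --manifest flag are taken from the steps above, but the helper names are hypothetical, and the code assumes that the extractor prints the dataset text to standard output, which the documentation does not confirm.

```python
import subprocess
import sys


def extractor_cmd(dataset_path):
    """Command line for the text extractor, as given in Quick Start."""
    return [sys.executable, "lucy_text_extractor.py", dataset_path]


def sanitizer_cmd(dataset_path, manifest="identifiers.txt"):
    """Command line for the bulk sanitizer, as given in Quick Start."""
    return [sys.executable, "lucy_bulk_sanitize.py",
            dataset_path, "--manifest", manifest]


def remaining_identifiers(extracted_text, identifiers):
    """Return the identifiers that still occur in the extracted text."""
    lowered = extracted_text.lower()
    return [item for item in identifiers if item.lower() in lowered]


def sanitize_and_verify(dataset_path, identifiers, manifest="identifiers.txt"):
    """Run the sanitizer, then re-run the extractor and report leftovers.

    Assumption: the extractor writes the dataset text to stdout.
    """
    subprocess.run(sanitizer_cmd(dataset_path, manifest), check=True)
    result = subprocess.run(extractor_cmd(dataset_path),
                            capture_output=True, text=True, check=True)
    return remaining_identifiers(result.stdout, identifiers)
```

An empty list from sanitize_and_verify would indicate that none of the manifest entries survived in the extracted text. A simple substring check like this cannot catch identifiers that were never listed in the manifest, so a manual read of the extractor output is still worthwhile.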
