data-normalizer

Community

Automate data extraction from archaeology reports.

Author: younga1234
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Archaeological research often involves manually extracting data from numerous documents (PDF, HWP, DOCX, TXT), a process that is both time-consuming and prone to errors. This Skill automates the collection and standardization of this critical information.

Core Features & Use Cases

  • Automated Data Extraction: Scans diverse document types (PDF, HWP, DOCX, TXT) to extract raw text and key metadata.
  • Metadata Normalization: Standardizes extracted information such as title, author, publication year, location, coordinates, historical period, and artifact/feature types.
  • Use Case: Process hundreds of archaeological reports and academic papers from your local folders, automatically creating a unified, structured dataset (JSONL, CSV) ready for in-depth analysis, saving days of manual data entry.
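The normalization and export steps described above could look roughly like the sketch below. It is not the Skill's actual code: the field list, the function names (`normalize_record`, `to_jsonl`, `to_csv`), and the year-parsing rule are all assumptions made for illustration, and only the standard library is used.

```python
import csv
import io
import json
import re

# Hypothetical subset of the metadata fields named in the feature list.
FIELDS = ["title", "author", "year", "location", "period"]

# A four-digit year from 1800-2099 that is not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)(1[89]\d{2}|20\d{2})(?!\d)")


def normalize_record(raw: dict) -> dict:
    """Return a record with a fixed key set and tidied values."""
    record = {}
    for field in FIELDS:
        value = raw.get(field)
        if isinstance(value, str):
            value = " ".join(value.split())  # collapse runs of whitespace
        record[field] = value or None  # empty strings become None
    # Reduce the year to an integer, or None if no year is found.
    match = YEAR_RE.search(str(raw.get("year") or ""))
    record["year"] = int(match.group(1)) if match else None
    return record


def to_jsonl(records: list) -> str:
    """Serialize records as JSON Lines, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def to_csv(records: list) -> str:
    """Serialize records as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Keeping every record on the same key set, with `None` for missing values, is what lets the same list feed both the JSONL and the CSV output without extra handling.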

Quick Start

Use the data-normalizer skill to collect and normalize all documents in the '논문/', '발굴조사보고서/', and '주변유적/' folders.
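For orientation, the collection step behind that prompt might resemble the following sketch. The function name `collect_documents` and the recursive search are assumptions, not the Skill's documented behavior; the folder names and file types are taken from this page.

```python
from pathlib import Path

# File types listed on this page; matching is case-insensitive.
SUPPORTED = {".pdf", ".hwp", ".docx", ".txt"}

# Folder names from the Quick Start prompt.
FOLDERS = ["논문", "발굴조사보고서", "주변유적"]


def collect_documents(root, folders=FOLDERS):
    """Return supported files found under the given folders of `root`."""
    root = Path(root)
    found = []
    for name in folders:
        folder = root / name
        if not folder.is_dir():
            continue  # skip folders that do not exist
        for path in sorted(folder.rglob("*")):
            if path.is_file() and path.suffix.lower() in SUPPORTED:
                found.append(path)
    return found
```

Calling `collect_documents(".")` from the project directory would return a list of `Path` objects that an extraction step could then read one by one.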

Dependency Matrix

Required Modules

PyPDF2, pdfplumber, olefile
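Before running the Skill, it can help to confirm that these modules are importable. The check below is a small sketch using the standard library; the helper name `missing_modules` is hypothetical, and the import names are assumed to match the package names listed above.

```python
import importlib.util

# Import names assumed to match the required modules listed on this page.
REQUIRED = ["PyPDF2", "pdfplumber", "olefile"]


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are available.")
```

Any module reported as missing can usually be installed with `pip install` followed by the module name.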

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install the Skill automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: data-normalizer
Download link: https://github.com/younga1234/20251112-3/archive/main.zip#data-normalizer

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.