data-normalizer
Community
Automate data extraction from archaeology reports.
Author: younga1234
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Archaeological research often involves manually extracting data from numerous documents (PDF, HWP, DOCX, TXT), a process that is both time-consuming and prone to errors. This Skill automates the collection and standardization of this critical information.
Core Features & Use Cases
- Automated Data Extraction: Scans diverse document types (PDF, HWP, DOCX, TXT) to extract raw text and key metadata.
- Metadata Normalization: Standardizes extracted information such as title, author, publication year, location, coordinates, historical period, and artifact/feature types.
- Use Case: Process hundreds of archaeological reports and academic papers from your local folders, automatically creating a unified, structured dataset (JSONL, CSV) ready for in-depth analysis, saving days of manual data entry.
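The normalization and export step described above can be sketched roughly as follows. This is a minimal illustration, not the Skill's actual implementation: the field names in `FIELDS`, the year-matching rule, and the helper names `normalize_record`, `to_jsonl`, and `to_csv` are all assumptions made for the example.

```python
import csv
import io
import json
import re

# Hypothetical field names, loosely based on the metadata listed above.
FIELDS = ["title", "author", "year", "location", "period"]


def normalize_record(raw):
    """Collapse whitespace in every field and pull a four-digit year out of free text."""
    record = {field: " ".join(str(raw.get(field, "")).split()) for field in FIELDS}
    # Assumption: publication years fall between 1500 and 2099.
    match = re.search(r"(1[5-9]\d{2}|20\d{2})", record["year"])
    record["year"] = int(match.group(1)) if match else None
    return record


def to_jsonl(records):
    """Serialize records as JSON Lines, keeping non-ASCII text (e.g. Korean) readable."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def to_csv(records):
    """Serialize records as CSV with a header row in FIELDS order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Missing fields come out as empty strings and an unparseable year as `None`, so every record has the same keys regardless of which source document it came from.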
Quick Start
Use the data-normalizer skill to collect and normalize all documents in the '논문/', '발굴조사보고서/', and '주변유적/' folders.
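For orientation, the collection step that this prompt triggers might look something like the sketch below. It assumes the Skill walks each named folder recursively and filters by file extension; the function name `collect_documents` and the recursive behavior are hypothetical and not confirmed by the Skill's documentation.

```python
from pathlib import Path

# Extensions taken from the document types listed above.
SUPPORTED_EXTENSIONS = {".pdf", ".hwp", ".docx", ".txt"}


def collect_documents(root, folders):
    """Return supported files found under each folder, skipping folders that do not exist."""
    found = []
    for folder in folders:
        base = Path(root) / folder
        if not base.is_dir():
            continue  # assumption: a missing folder is ignored rather than treated as an error
        for path in sorted(base.rglob("*")):
            # Compare extensions case-insensitively so "REPORT.PDF" is also picked up.
            if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
                found.append(path)
    return found
```

A call such as `collect_documents(".", ["논문", "발굴조사보고서", "주변유적"])` would then return the candidate files from the three folders named in the prompt.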
Dependency Matrix
Required Modules
- PyPDF2
- pdfplumber
- olefile
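Before running the Skill, it can help to confirm that the three required modules are importable. The check below is a small sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of the Skill.

```python
from importlib.util import find_spec

# Import names of the required modules listed above.
REQUIRED_MODULES = ["PyPDF2", "pdfplumber", "olefile"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Any module reported as missing can typically be installed with `pip install PyPDF2 pdfplumber olefile`.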
Components
Standard package
Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: data-normalizer
Download link: https://github.com/younga1234/20251112-3/archive/main.zip#data-normalizer
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.