pdf-harvester
Community
Turn PDFs into searchable text and tables.
Author: mindmorass
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill streamlines the extraction of text, tables, and metadata from PDF documents, enabling fast ingestion into RAG pipelines and searchable archives.
Core Features & Use Cases
- Layout-preserving text extraction from PDFs, including tables, with conversion to Markdown.
- OCR with pytesseract to recover content from scanned or image-based documents.
- Academic paper parsing with structure detection for abstracts, sections, and references, plus metadata extraction.
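To illustrate the table-to-Markdown conversion mentioned in the list above, here is a minimal sketch that renders a table, given as a list of rows, as a Markdown table. The function name `table_to_markdown` and the list-of-rows input shape are assumptions made for this example; they are not taken from the Skill's documented interface.

```python
from typing import Optional, Sequence


def table_to_markdown(rows: Sequence[Sequence[Optional[str]]]) -> str:
    """Render rows (first row is the header) as a Markdown table.

    Hypothetical helper for illustration; not pdf-harvester's own API.
    """
    if not rows:
        return ""

    # Pad short rows so every line has the same number of columns.
    width = max(len(row) for row in rows)

    def clean(cell: Optional[str]) -> str:
        # Extractors often yield None for empty cells. Pipes and newlines
        # would break a Markdown row, so escape or flatten them.
        text = "" if cell is None else str(cell)
        return text.replace("|", "\\|").replace("\n", " ").strip()

    def render(row: Sequence[Optional[str]]) -> str:
        cells = [clean(c) for c in row] + [""] * (width - len(row))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [render(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(render(row) for row in body)
    return "\n".join(lines)
```

Treating the first row as the header is a simplification; tables extracted from real PDFs may have no header row or may span several pages.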
Quick Start
Run a sample PDF through the harvest process to extract text, tables, and metadata, then inspect the resulting data structure.
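The Skill's actual entry point is not shown on this page, so the following is only a sketch of what a harvest step and its result might look like. It assumes the third-party `pdfplumber` library for extraction; the names `HarvestResult` and `harvest` and the shape of the result are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HarvestResult:
    """Hypothetical container for one harvested PDF (not the Skill's schema)."""

    text: str = ""
    # Each table is a list of rows; each row is a list of cell strings.
    tables: List[List[List[Optional[str]]]] = field(default_factory=list)
    metadata: Dict[str, object] = field(default_factory=dict)

    def summary(self) -> str:
        return (
            f"{len(self.text)} chars, {len(self.tables)} tables, "
            f"{len(self.metadata)} metadata fields"
        )


def harvest(path: str) -> HarvestResult:
    """Extract text, tables, and metadata from the PDF at `path`."""
    # Assumed dependency, imported lazily so the data structure above
    # can be used without pdfplumber installed.
    import pdfplumber

    result = HarvestResult()
    page_texts = []
    with pdfplumber.open(path) as pdf:
        result.metadata = dict(pdf.metadata or {})
        for page in pdf.pages:
            # extract_text() can yield nothing for image-only pages.
            page_texts.append(page.extract_text() or "")
            result.tables.extend(page.extract_tables())
    result.text = "\n\n".join(page_texts)
    return result
```

With a local file, `result = harvest("sample.pdf")` followed by `print(result.summary())` gives a quick view of what was recovered. Pages that return no text are candidates for the OCR path.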
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: pdf-harvester
Download link: https://github.com/mindmorass/reflex/archive/main.zip#pdf-harvester
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.