pdf-harvester

Community

Turn PDFs into searchable text and tables.

Author: mindmorass
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill extracts text, tables, and metadata from PDF documents so they can be ingested quickly into RAG pipelines and searchable archives.

Core Features & Use Cases

  • Layout-preserving text extraction from PDFs, including table extraction and conversion to Markdown.
  • OCR for image-based or scanned documents, using pytesseract to recover their content.
  • Academic paper parsing with structure detection for abstracts, sections, and references, plus metadata extraction.
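The Skill's own Markdown converter is not shown on this page. As a minimal sketch of the table-to-Markdown step, the block below assumes each extracted table arrives as a list of rows of cell strings, with None for empty cells (the shape pdfplumber's `Page.extract_tables()` returns; whether this Skill actually uses pdfplumber is not stated), and treats the first row as the header. The function name `rows_to_markdown` is hypothetical.

```python
from typing import Optional, Sequence

# Hypothetical helper: not part of the Skill's documented interface.
# Assumes a table is a list of rows, first row = header, cells are str or None.
def rows_to_markdown(rows: Sequence[Sequence[Optional[str]]]) -> str:
    """Render a list-of-rows table as a GitHub-flavored Markdown table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def clean(cell: Optional[str]) -> str:
        # None marks an empty cell; newlines and pipes would break the row.
        text = "" if cell is None else str(cell)
        return text.replace("\n", " ").replace("|", "\\|").strip()

    def fmt(row: Sequence[Optional[str]]) -> str:
        # Pad short rows so every line has the same number of columns.
        cells = [clean(c) for c in row] + [""] * (width - len(row))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_markdown([["Name", "Qty"], ["Apple", "3"], ["Pear", None]]))
```

Escaping `|` and flattening embedded newlines keeps multi-line cells from breaking the Markdown row structure, and padding short rows keeps the column count consistent.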

Quick Start

Run a sample PDF through the harvest process to extract text, tables, and metadata, then inspect the resulting data structure.
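The Skill's entry point and output format are not documented here. As a rough illustration of the academic-paper structure detection listed under Core Features, the sketch below assumes the PDF text has already been extracted into a plain string and splits it into an abstract, numbered sections, and references using simple heading heuristics. The function name `split_paper`, the heading patterns, and the returned dictionary layout are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical heading conventions; real papers vary, so tune per corpus.
SPECIAL = {"abstract": "abstract", "references": "references", "bibliography": "references"}
NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+([A-Z][^.!?]{0,80})$")  # "1 Introduction", "2.1 Method"
REF_START = re.compile(r"^(\[\d+\]|\d+\.\s)")  # "[1] ..." or "1. ..."


@dataclass
class Section:
    number: str
    title: str
    lines: List[str] = field(default_factory=list)


def split_paper(text: str) -> Dict[str, object]:
    """Heuristically split extracted paper text into abstract, sections, references."""
    abstract: List[str] = []
    sections: List[Section] = []
    references: List[str] = []
    mode = "preamble"  # title/author block before the first recognised heading is skipped

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key = line.rstrip(":").lower()
        if key in SPECIAL:
            mode = SPECIAL[key]
            continue
        # Numbered reference entries look like headings, so skip detection there.
        match = NUMBERED.match(line) if mode != "references" else None
        if match:
            sections.append(Section(match.group(1), match.group(2).strip()))
            mode = "section"
        elif mode == "abstract":
            abstract.append(line)
        elif mode == "section":
            sections[-1].lines.append(line)
        elif mode == "references":
            # A new entry starts at "[n]" or "n."; other lines continue the previous one.
            if not references or REF_START.match(line):
                references.append(line)
            else:
                references[-1] += " " + line

    return {
        "abstract": " ".join(abstract),
        "sections": [{"number": s.number, "title": s.title, "text": " ".join(s.lines)} for s in sections],
        "references": references,
    }
```

Inspecting the returned dictionary (for example, printing the section titles and counting references) is a quick way to check whether the heuristics fit a given paper before indexing it.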

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: pdf-harvester
Download link: https://github.com/mindmorass/reflex/archive/main.zip#pdf-harvester

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.