pdf-page-extract
Community
Extract rich PDF data for AI-ready content.
Data & Analytics
#pdfplumber #pdf extraction #image extraction #pymupdf #text mining #data preparation #document parsing
Author: AbeJitsu
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Getting structured, high-fidelity data out of PDFs is foundational for AI processing, but it is often complex and time-consuming. This Skill deterministically extracts text with font metadata, high-resolution page images, and a page-number mapping from PDF pages, giving downstream tasks a robust, AI-ready foundation.
Core Features & Use Cases
- Rich Text Extraction: Pulls text spans with font metadata (size, style, position) using PyMuPDF and pdfplumber for detailed content analysis.
- High-Resolution Rendering: Converts PDF pages to PNG images at 300 DPI or higher, providing a precise visual reference for AI.
- Page Mapping: Establishes an authoritative mapping of PDF indices to book page numbers for consistent referencing and navigation.
- Use Case: Prepare a PDF textbook chapter by extracting all text, images, and visual layouts, creating a complete set of artifacts for AI-driven HTML conversion.
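As a rough illustration of the text extraction and rendering features above, the sketch below flattens the nested dictionary returned by PyMuPDF's `page.get_text("dict")` into one record per span, and estimates the pixel size of a page rendered at a given DPI. The helper names (`flatten_spans`, `rendered_size`, `extract_page`) and the record fields are assumptions made for illustration, not the Skill's actual code, and the pdfplumber side is left out.

```python
from typing import Any

# PyMuPDF span flag bits: bit 1 marks italic, bit 4 marks bold.
FLAG_ITALIC = 1 << 1
FLAG_BOLD = 1 << 4


def flatten_spans(page_dict: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn the block -> line -> span nesting into a flat list of span records."""
    records = []
    for block in page_dict.get("blocks", []):
        if block.get("type") != 0:  # type 0 is text; image blocks carry no lines
            continue
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                flags = span.get("flags", 0)
                records.append({
                    "text": span["text"],
                    "font": span["font"],
                    "size": span["size"],
                    "bold": bool(flags & FLAG_BOLD),
                    "italic": bool(flags & FLAG_ITALIC),
                    "bbox": tuple(span["bbox"]),
                })
    return records


def rendered_size(width_pt: float, height_pt: float, dpi: int = 300) -> tuple[int, int]:
    """Approximate pixel size of a page rendered at `dpi` (PDF units are 1/72 inch)."""
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


def extract_page(pdf_path: str, index: int, png_path: str, dpi: int = 300):
    """Read spans and write a PNG for one 0-based page index (requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    with fitz.open(pdf_path) as doc:
        page = doc[index]
        spans = flatten_spans(page.get_text("dict"))
        page.get_pixmap(dpi=dpi).save(png_path)
    return spans
```

Keeping the flattening step separate from the file access means it can be checked against a hand-written dictionary without opening a PDF.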
Quick Start
Extract rich data from pages 15 to 28 of the attached 'PREP-AL 4th Ed 9-26-25.pdf'.
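To make the page-mapping idea concrete for a request like this one, here is a minimal sketch that assumes the range refers to printed book pages and that book pages map to 0-based PDF indices by a constant offset. Both are assumptions: the offset used below is invented, the function names are hypothetical, and the Skill's own mapping may handle front matter or unnumbered pages differently.

```python
def build_page_map(first_pdf_index: int, first_book_page: int, count: int) -> dict[int, int]:
    """Map 0-based PDF indices to printed book page numbers for a contiguous run."""
    return {first_pdf_index + i: first_book_page + i for i in range(count)}


def pdf_indices_for(page_map: dict[int, int], start_page: int, end_page: int) -> list[int]:
    """PDF indices whose book page falls in the inclusive range start_page..end_page."""
    return sorted(i for i, p in page_map.items() if start_page <= p <= end_page)


# Hypothetical layout: book page 1 sits at PDF index 6, after the front matter.
page_map = build_page_map(first_pdf_index=6, first_book_page=1, count=40)
indices = pdf_indices_for(page_map, 15, 28)
```

Storing the mapping once and looking ranges up in it keeps every later reference to a page consistent, whichever numbering the request uses.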
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: pdf-page-extract
Download link: https://github.com/AbeJitsu/Game-Settings-Panel/archive/main.zip#pdf-page-extract
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
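For anyone who prefers to perform those steps by hand, the following is a hedged sketch of the extraction part. It assumes the archive contains a folder named after the Skill, which the `#pdf-page-extract` fragment of the link suggests but does not confirm; `extract_skill` is a hypothetical helper, not part of any official tooling.

```python
import io
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(archive_bytes: bytes, skill_name: str, skills_dir: Path) -> Path:
    """Copy every file found under a folder named `skill_name` into skills_dir/skill_name."""
    target = skills_dir / skill_name
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path below the first folder named skill_name.
            rel = parts[parts.index(skill_name) + 1:]
            if not rel or ".." in rel:  # skip empty or unsafe relative paths
                continue
            out = target.joinpath(*rel)
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_bytes(zf.read(info))
    return target


# Possible usage, given the bytes of the downloaded .zip file:
# extract_skill(archive_bytes, "pdf-page-extract", Path(".claude/skills"))
```

If the archive turns out to be laid out differently, the folder lookup will find nothing and the target directory will simply not be created.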