pdf-page-extract

Community

Extract rich PDF data for AI-ready content.

Author: AbeJitsu
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Getting structured, high-fidelity data out of PDFs is a prerequisite for AI processing, but it is often complex and time-consuming. This Skill deterministically extracts the data needed from each PDF page (styled text spans, high-resolution page images, and a page-number mapping), producing a consistent, AI-ready foundation for downstream tasks.

Core Features & Use Cases

  • Rich Text Extraction: Pulls text spans with their font metadata (size, style) and position using PyMuPDF and pdfplumber, for detailed content analysis.
  • High-Resolution Rendering: Renders each PDF page as a PNG image at 300 DPI or higher, giving the AI a precise visual reference.
  • Page Mapping: Establishes an authoritative mapping from PDF page indices to printed book page numbers, so pages can be referenced and navigated consistently.
  • Use Case: Prepare a PDF textbook chapter by extracting all text, images, and visual layouts, creating a complete set of artifacts for AI-driven HTML conversion.
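The Skill's own scripts are not shown on this page, so the following is only a minimal sketch of the first two features, assuming PyMuPDF (imported as `fitz`). The function names `extract_spans`, `render_page_png`, and `rendered_size` are hypothetical, and pdfplumber, which the Skill also lists, is not used here. `Page.get_text("dict")` returns nested blocks, lines, and spans, each span carrying its font, size, style flags, and bounding box; `Page.get_pixmap(dpi=...)` renders a page at a chosen resolution. Because a PDF point is 1/72 inch, the pixel size of a render is the page size in points times dpi / 72.

```python
from __future__ import annotations

POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch (default UserUnit)


def rendered_size(width_pt: float, height_pt: float, dpi: int = 300) -> tuple[int, int]:
    """Approximate pixel size of a page (given in points) rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


def extract_spans(pdf_path: str, page_index: int) -> list[dict]:
    """Collect text spans with font metadata and position from one page (hypothetical helper)."""
    import fitz  # PyMuPDF; imported lazily so rendered_size() works without it

    with fitz.open(pdf_path) as doc:
        page_dict = doc[page_index].get_text("dict")
    spans = []
    for block in page_dict["blocks"]:
        if block.get("type") != 0:  # 0 = text block, 1 = image block
            continue
        for line in block["lines"]:
            for span in line["spans"]:
                spans.append({
                    "text": span["text"],
                    "font": span["font"],    # font name
                    "size": span["size"],    # font size in points
                    "flags": span["flags"],  # bit field encoding style (bold, italic, ...)
                    "bbox": span["bbox"],    # (x0, y0, x1, y1) in points
                })
    return spans


def render_page_png(pdf_path: str, page_index: int, out_path: str, dpi: int = 300) -> None:
    """Render one page to a PNG file at the given resolution (hypothetical helper)."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        pixmap = doc[page_index].get_pixmap(dpi=dpi)
        pixmap.save(out_path)  # output format inferred from the ".png" extension
```

For a US Letter page (612 × 792 pt), `rendered_size(612, 792)` gives `(2550, 3300)` pixels at 300 DPI, which is a quick way to sanity-check rendered output.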

Quick Start

Extract rich data from pages 15 to 28 of the attached 'PREP-AL 4th Ed 9-26-25.pdf'.
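A request like this does not say whether "pages 15 to 28" are PDF pages or printed book pages, which is the ambiguity the page-mapping feature resolves. The sketch below is a hypothetical illustration, not the Skill's method: it assumes the PDF index where printed page 1 begins (`body_start_index`) is known, and that earlier pages are front matter numbered i, ii, iii, and so on. The names `build_page_map` and `pdf_indices_for_book_pages` are invented for this example.

```python
from __future__ import annotations


def to_roman(n: int) -> str:
    """Lowercase Roman numeral for n >= 1 (typical front-matter numbering)."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"), (90, "xc"),
             (50, "l"), (40, "xl"), (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def build_page_map(page_count: int, body_start_index: int) -> dict[int, str]:
    """Map 0-based PDF page indices to printed page labels (assumed layout)."""
    page_map = {}
    for index in range(page_count):
        if index < body_start_index:
            page_map[index] = to_roman(index + 1)  # front matter: i, ii, iii, ...
        else:
            page_map[index] = str(index - body_start_index + 1)  # body: 1, 2, 3, ...
    return page_map


def pdf_indices_for_book_pages(page_map: dict[int, str], first: str, last: str) -> list[int]:
    """Resolve an inclusive range of printed page labels to 0-based PDF indices."""
    label_to_index = {label: index for index, label in page_map.items()}
    return list(range(label_to_index[first], label_to_index[last] + 1))
```

With six front-matter pages, `build_page_map(40, 6)` places book page 15 at PDF index 20 and book page 28 at index 33. If "pages 15 to 28" instead means PDF pages counted from 1, the 0-based indices are simply 14 through 27. When a PDF defines its own page labels, PyMuPDF's `Page.get_label()` can read them directly instead of relying on an assumed offset.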

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Simply copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: pdf-page-extract
Download link: https://github.com/AbeJitsu/Game-Settings-Panel/archive/main.zip#pdf-page-extract

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
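If you would rather install by hand, the request above amounts to roughly the following minimal Python sketch. It rests on assumptions: that the repository archive contains a folder named `pdf-page-extract` somewhere inside it (inferred from the `#pdf-page-extract` fragment, not confirmed on this page), and that `.claude/skills/` is resolved relative to the current project directory. The function names `install_skill_from_zip` and `download_and_install` are hypothetical. Note that the URL fragment is never sent to the server, so the whole repository archive is downloaded and the skill folder is picked out locally.

```python
from __future__ import annotations

import io
import shutil
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path, PurePosixPath

SKILL_NAME = "pdf-page-extract"
DOWNLOAD_URL = "https://github.com/AbeJitsu/Game-Settings-Panel/archive/main.zip#pdf-page-extract"


def install_skill_from_zip(zip_bytes: bytes, skills_dir: Path, skill_name: str = SKILL_NAME) -> Path:
    """Copy the first folder named `skill_name` found in the archive into skills_dir."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
        prefix = None
        for name in names:
            parts = PurePosixPath(name).parts
            folders = parts if name.endswith("/") else parts[:-1]  # directory components only
            if skill_name in folders:
                prefix = "/".join(folders[: folders.index(skill_name) + 1]) + "/"
                break
        if prefix is None:
            raise FileNotFoundError(f"no '{skill_name}' folder found in the archive")

        target = skills_dir / skill_name
        for name in names:
            if not name.startswith(prefix) or name.endswith("/"):
                continue  # skip files outside the skill folder and bare directory entries
            relative = PurePosixPath(name[len(prefix):])
            if ".." in relative.parts:
                raise ValueError(f"unsafe path in archive: {name}")
            dest = target.joinpath(*relative.parts)
            dest.parent.mkdir(parents=True, exist_ok=True)
            with archive.open(name) as src, open(dest, "wb") as out:
                shutil.copyfileobj(src, out)
    return target


def download_and_install(url: str = DOWNLOAD_URL, skills_dir: Path = Path(".claude/skills")) -> Path:
    """Fetch the repository archive and install the skill folder from it."""
    plain_url, _fragment = urllib.parse.urldefrag(url)  # the "#..." part is not sent to the server
    with urllib.request.urlopen(plain_url) as response:
        return install_skill_from_zip(response.read(), skills_dir)
```

Run from the project root, `download_and_install()` would place the files under `.claude/skills/pdf-page-extract/`. Separating the download from the extraction keeps the extraction step testable offline against any zip archive.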