pdf-extractor

Community

Extract data from PDFs, including scanned documents.

Author: Greenmamba29
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill automates the extraction of structured information (text, tables, images, and form data) from PDF documents, including scanned documents that require OCR.

Core Features & Use Cases

  • Content Extraction: Extracts text, tables, images, and form field data.
  • OCR Support: Processes scanned PDFs using OCR engines such as Tesseract or the Google Cloud Vision API.
  • Batch Processing: Handles large sets of documents efficiently.
  • Use Case: Extracting invoice data from a batch of supplier PDFs for accounting.
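The batch and OCR features above can be pictured with a small sketch. It is not the Skill's actual implementation: `extract_batch`, `read_text`, `run_ocr`, and the `min_chars` threshold are hypothetical names chosen for illustration. The idea is to try the embedded text layer first and fall back to OCR only when a PDF yields almost no text, which is typical of scanned documents.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_batch(
    folder: str,
    read_text: Callable[[Path], str],
    run_ocr: Callable[[Path], str],
    min_chars: int = 20,  # hypothetical threshold for "no usable text layer"
) -> Dict[str, dict]:
    """Extract text from every PDF in `folder`, using OCR only when needed.

    `read_text` and `run_ocr` are caller-supplied functions (assumptions of
    this sketch), e.g. thin wrappers around a PDF parser and an OCR engine.
    """
    results: Dict[str, dict] = {}
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        text = read_text(pdf_path)
        used_ocr = False
        # Scanned PDFs usually carry no embedded text, so a near-empty
        # result is treated as the signal to fall back to OCR.
        if len(text.strip()) < min_chars:
            text = run_ocr(pdf_path)
            used_ocr = True
        results[pdf_path.name] = {"text": text, "used_ocr": used_ocr}
    return results
```

Passing the two extractors in as functions keeps the sketch independent of any particular PDF or OCR library, so either Tesseract or a cloud OCR service could sit behind `run_ocr`.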

Quick Start

Use the pdf-extractor skill to extract tables from the file './invoices/lithium_supplier_inv_2026.pdf' and save the output as a CSV file.
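For readers curious what such a request might translate to, here is a minimal sketch of table extraction followed by CSV export. It assumes the third-party `pdfplumber` library is installed; the listing does not say which parser the Skill uses, so this is one possible approach rather than the Skill's own code. The function names and the output file naming scheme are hypothetical.

```python
import csv
from pathlib import Path
from typing import List, Optional

Table = List[List[Optional[str]]]


def extract_tables(pdf_path: str) -> List[Table]:
    """Collect every table found on every page of the PDF."""
    import pdfplumber  # third-party; assumed here, not confirmed by the listing

    tables: List[Table] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns a list of tables, each a list of rows.
            tables.extend(page.extract_tables())
    return tables


def save_tables_as_csv(tables: List[Table], out_dir: str, stem: str) -> List[Path]:
    """Write each table to its own CSV file and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for index, table in enumerate(tables, start=1):
        path = out / f"{stem}_table{index}.csv"  # hypothetical naming scheme
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            for row in table:
                # Empty cells come back as None; write them as blank strings.
                writer.writerow(["" if cell is None else cell for cell in row])
        written.append(path)
    return written
```

A call for the file named in the prompt might look like `save_tables_as_csv(extract_tables("./invoices/lithium_supplier_inv_2026.pdf"), "./out", "lithium_supplier_inv_2026")`, where `./out` is an arbitrary output folder.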

Dependency Matrix

Required Modules

  • tesseract
  • google-cloud-vision
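Before running the Skill on scanned documents, it can help to check which OCR backends are present. The sketch below is an assumption about how one might do that, not part of the Skill: it looks for a `tesseract` executable on the PATH and for the `google.cloud.vision` Python module (the import name of the google-cloud-vision package). The helper names are hypothetical.

```python
import importlib.util
import shutil
from typing import List


def module_available(name: str) -> bool:
    """Return True if the named module can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `google`) is missing or malformed.
        return False


def available_ocr_engines() -> List[str]:
    """List the OCR backends from the dependency matrix that appear usable."""
    engines: List[str] = []
    if shutil.which("tesseract"):  # Tesseract is a command-line program
        engines.append("tesseract")
    if module_available("google.cloud.vision"):
        engines.append("google-cloud-vision")
    return engines
```

The check only confirms that the software is installed. Google Cloud Vision additionally needs valid credentials, which this sketch does not verify.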

Components

  • scripts
  • references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: pdf-extractor
Download link: https://github.com/Greenmamba29/skillsdotmd_web/archive/main.zip#pdf-extractor

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
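For a manual install, the extraction step could be sketched as follows. This assumes the .zip has already been downloaded and that the Skill lives in a folder named `pdf-extractor` somewhere inside the archive; the listing does not document the archive layout, so treat that as a guess. `install_skill` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill(zip_path: str, skill_name: str, skills_dir: str = ".claude/skills") -> int:
    """Copy the files under the `skill_name` folder of the archive into
    `skills_dir/skill_name` and return the number of files written."""
    dest_root = Path(skills_dir) / skill_name
    written = 0
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path below the first folder named after the skill.
            relative = parts[parts.index(skill_name) + 1:]
            if not relative or ".." in relative:
                continue  # skip odd entries and anything that could escape dest_root
            target = dest_root.joinpath(*relative)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            written += 1
    return written
```

For example, `install_skill("main.zip", "pdf-extractor")` would populate `.claude/skills/pdf-extractor/` relative to the current directory, provided the assumed folder exists in the archive.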
