document-to-markdown
Community
Convert documents and URLs to clean Markdown.
Author: kcchien
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Converts documents and URLs into clean Markdown for downstream LLM/RAG workflows, enabling consistent text extraction and content ingestion across tools.
Core Features & Use Cases
- Converts PDFs, Office files, images, HTML and URLs to Markdown to support AI pipelines.
- Supports batch processing, frontmatter metadata, and multiple backends (PyMuPDF4LLM, Marker, MarkItDown, PaddleOCR).
- Real-world use: transform a folder of PDFs and web pages into a searchable Markdown knowledge base for agents.
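As a rough illustration of the frontmatter feature mentioned above, the sketch below prepends a YAML frontmatter block to converted Markdown. The helper name `add_frontmatter` and the field names (`source`, `backend`) are hypothetical; the listing does not document the schema this skill actually emits.

```python
import json


def add_frontmatter(markdown: str, metadata: dict) -> str:
    """Prepend a simple YAML frontmatter block to a Markdown string.

    Hypothetical helper for illustration; not taken from the skill itself.
    """
    lines = ["---"]
    for key, value in metadata.items():
        # JSON-quoted strings are valid YAML double-quoted scalars,
        # so colons or quotes inside a value cannot break the block.
        lines.append(f"{key}: {json.dumps(str(value))}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown


if __name__ == "__main__":
    doc = add_frontmatter(
        "# Quarterly Report\n",
        {"source": "report.pdf", "backend": "pymupdf4llm"},
    )
    print(doc)
```

Quoting every value keeps the block parseable without pulling in a YAML library, at the cost of storing numbers and dates as strings.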
Quick Start
Convert a local PDF or URL to Markdown using gateway.py with your preferred backends.
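The command-line options of gateway.py are not shown in this listing, so the sketch below does not call it. Instead it illustrates, under stated assumptions, one way to try several backends in order of preference. `convert_with_fallback` and the two wrapper functions are hypothetical names; the wrappers rely on the public `pymupdf4llm.to_markdown` and `MarkItDown().convert(...).text_content` calls and import those packages lazily, so a missing package simply counts as a failed backend.

```python
from typing import Callable, List, Sequence, Tuple

# A backend is a (name, function) pair; the function maps a source to Markdown.
Backend = Tuple[str, Callable[[str], str]]


def pymupdf4llm_backend(source: str) -> str:
    """Convert a local PDF path with PyMuPDF4LLM (third-party package)."""
    import pymupdf4llm  # imported lazily so the module loads without it

    return pymupdf4llm.to_markdown(source)


def markitdown_backend(source: str) -> str:
    """Convert a local file or URL with MarkItDown (third-party package)."""
    from markitdown import MarkItDown  # imported lazily

    return MarkItDown().convert(source).text_content


def convert_with_fallback(source: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Return (backend_name, markdown) from the first backend that succeeds."""
    errors: List[str] = []
    for name, backend in backends:
        try:
            return name, backend(source)
        except Exception as exc:  # includes ImportError for missing packages
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"all backends failed for {source}: " + "; ".join(errors))


# Example call (assumes the packages are installed and report.pdf exists):
# name, md = convert_with_fallback(
#     "report.pdf",
#     [("pymupdf4llm", pymupdf4llm_backend), ("markitdown", markitdown_backend)],
# )
```

Passing the backends as an ordered list keeps the preference explicit and makes it easy to substitute a stub backend when testing.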
Dependency Matrix
Required Modules
markitdown, pymupdf4llm, marker, paddleocr, paddlepaddle, pdf2image, surya, pytesseract, easyocr, opencc-python-reimplemented, pillow, lxml, prettytable, numpy
Components
scripts, references
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: document-to-markdown
Download link: https://github.com/kcchien/skills/archive/main.zip#document-to-markdown
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.