math-extractor
Community
Extracts math content from documents.
Education & Research
#nlp #definitions #document-processing #tex #math-extraction #theorems #pdf-to-markdown
Author: Develata
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Extracts only mathematical statements (definitions, theorems, lemmas, propositions, and proofs) from documents, handling PDF conversion and AI-based cleanup along the way. Use when the user wants to extract math content from a file.
Core Features & Use Cases
- Robust PDF Conversion: Uses MinerU for high-quality PDF-to-Markdown conversion.
- Smart Chunking: Splits text at paragraph boundaries so that math formulas are not split across chunks.
- Cost Optimization: Heuristically filters out chunks with no mathematical content before they are sent to the model, saving tokens.
- Math Protection: Removes only a whitelist of known HTML tags, so inequalities such as a < b are not mistaken for tags and deleted.
- Encoding Fallback: Automatically tries UTF-8, then GBK, then Latin-1 when decoding input.
- Retry Logic: Automatically retries failed API calls to cope with unstable networks.
- Use Case: Suppose you have a scanned thesis in PDF or a collection of lecture notes; run this skill to extract every definition, theorem, lemma, proposition, and proof and compile them into a clean Markdown file.
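The Smart Chunking feature can be illustrated with a minimal sketch. This is not the skill's own code: the function names, the 2,000-character default budget, and the rule that a blank line inside an open `$$ ... $$` block does not end a paragraph are all assumptions made for illustration.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines, re-joining pieces that fall inside an open $$...$$ block."""
    raw = re.split(r"\n\s*\n", text)
    paragraphs: list[str] = []
    buffer: list[str] = []
    for piece in raw:
        buffer.append(piece)
        # An odd number of "$$" delimiters means a display-math block is still open.
        if "\n\n".join(buffer).count("$$") % 2 == 0:
            paragraphs.append("\n\n".join(buffer).strip())
            buffer = []
    if buffer:  # unterminated block: keep whatever is left as one paragraph
        paragraphs.append("\n\n".join(buffer).strip())
    return [p for p in paragraphs if p]


def chunk_paragraphs(text: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own chunk instead of being cut.
    """
    chunks: list[str] = []
    current = ""
    for para in split_paragraphs(text):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Packing whole paragraphs means a chunk may exceed the budget when one paragraph is very long; that trade-off keeps a formula intact rather than cutting it mid-expression.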
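The listing does not say which heuristic the Cost Optimization filter uses. A plausible minimal version, sketched here purely as an assumption, keeps a chunk if it mentions a theorem-style keyword or contains any LaTeX markup; the keyword list, the regular expressions, and the function names are hypothetical.

```python
import re

# Hypothetical signals; the skill's real heuristic is not documented.
MATH_KEYWORDS = re.compile(
    r"\b(?:definitions?|theorems?|lemmas?|propositions?|corollar(?:y|ies)|proofs?)\b",
    re.IGNORECASE,
)
# Dollar-delimited math, \begin{...} environments, and \( or \[ delimiters.
LATEX_MARKERS = re.compile(r"\$|\\begin\{|\\\(|\\\[")


def looks_mathematical(chunk: str) -> bool:
    """Return True if the chunk contains a math keyword or any LaTeX markup."""
    return bool(MATH_KEYWORDS.search(chunk) or LATEX_MARKERS.search(chunk))


def filter_math_chunks(chunks: list[str]) -> list[str]:
    """Keep only chunks that look mathematical, so the rest never cost tokens."""
    return [chunk for chunk in chunks if looks_mathematical(chunk)]
```

A filter like this deliberately errs toward keeping chunks: a stray dollar sign in prose produces a false positive, which costs some tokens but never drops real math.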
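The Math Protection idea is that a naive tag stripper such as `<[^>]+>` would delete `< b and b >` from `if a < b and b > c`, whereas matching only whitelisted tag names leaves bare `<` signs alone. The sketch below shows that technique; the `SAFE_TAGS` list and the function name are assumptions, since the skill's actual whitelist is not documented.

```python
import re

# Hypothetical whitelist; the skill's actual list of safe tags is not documented.
# <sup>/<sub> are deliberately left out because they can carry exponents and indices.
SAFE_TAGS = ("p", "br", "div", "span", "b", "i", "em", "strong", "table", "tr", "td", "th")

# Opening, closing, or self-closing tags whose name is on the whitelist, e.g. <p>, </b>, <br/>.
SAFE_TAG_RE = re.compile(
    r"</?(?:" + "|".join(SAFE_TAGS) + r")\b[^<>]*>",
    re.IGNORECASE,
)


def strip_safe_tags(text: str) -> str:
    """Remove whitelisted HTML tags and leave every other '<' untouched."""
    return SAFE_TAG_RE.sub("", text)
```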
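The Encoding Fallback can be sketched as trying each codec in the listed order and returning the first successful decode. The function names and the `(text, encoding)` return shape are assumptions for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Order taken from the feature list. latin-1 maps every byte to a character,
# so it always succeeds and acts as the final catch-all.
FALLBACK_ENCODINGS = ("utf-8", "gbk", "latin-1")


def decode_with_fallback(data: bytes) -> tuple[str, str]:
    """Decode bytes with the first encoding that works; return (text, encoding)."""
    for encoding in FALLBACK_ENCODINGS:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next encoding in the list
    # Unreachable while latin-1 is last, but keeps the function total if the list changes.
    raise ValueError("no fallback encoding could decode the input")


def read_text_with_fallback(path: str | Path) -> tuple[str, str]:
    """Read a file as bytes and decode it with decode_with_fallback."""
    return decode_with_fallback(Path(path).read_bytes())
```

Because GBK accepts many byte sequences, a file in some other legacy encoding may decode "successfully" as GBK and come out garbled rather than raising, so order matters: the strictest codec (UTF-8) goes first.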
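The Retry Logic feature does not specify how retries are scheduled. One common approach is exponential backoff, sketched below using only the standard library so it can wrap any call. `call_with_retries`, the three-attempt default, and the doubling delay that starts at 1 second are assumptions, not the skill's documented behavior; the injectable `sleep` parameter exists only so the schedule can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    *,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts and
    re-raises the last error once max_attempts is exhausted. Other exceptions
    propagate immediately.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

To wrap an HTTP call made with `requests`, pass a zero-argument function that calls `requests.post(...)` and then `raise_for_status()`, together with `retry_on=(requests.RequestException,)`. Note that this also retries 4xx client errors, which you may prefer to exclude.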
Quick Start
Run the Python script with a document path and an output directory; it writes the extracted content to a file named <filename>_extracted.md in that directory.
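How the output name might be derived can be sketched in a few lines. This assumes `<filename>` means the input's base name without its extension and that the output is written as UTF-8; both function names are hypothetical, and the script's real entry point and arguments are not shown in this listing.

```python
from __future__ import annotations

from pathlib import Path


def extracted_output_path(document: str | Path, output_dir: str | Path) -> Path:
    """Map e.g. notes/thesis.pdf and out/ to out/thesis_extracted.md."""
    return Path(output_dir) / f"{Path(document).stem}_extracted.md"


def write_extracted(markdown: str, document: str | Path, output_dir: str | Path) -> Path:
    """Create output_dir if needed, write the Markdown as UTF-8, and return its path."""
    target = extracted_output_path(document, output_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target
```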
Dependency Matrix
Required Modules
requests
Components
scripts
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: math-extractor Download link: https://github.com/Develata/Deve-Skills/archive/main.zip#math-extractor Please download this .zip file, extract it, and install it in the .claude/skills/ directory.