libingest

Name: libingest
Availability: InStock
Author: copilot-ld

Official

Ingest documents into Schema.org-annotated HTML for indexing.

Category: Software Engineering
Tags: #automation #pdf #html #knowledge-management #ingest #schema.org #document-pipeline


Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

libingest provides a structured document ingestion workflow that converts PDFs, PowerPoint files, and images into Schema.org-annotated HTML for efficient indexing and knowledge extraction.
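To make "Schema.org-annotated HTML" concrete, here is an illustrative sketch of HTML microdata (itemscope, itemtype, itemprop) describing a document as a Schema.org Article. The ExtractedDocument shape, the toSchemaOrgHtml function, and the choice of the Article type are hypothetical; this is not libingest's API or its exact output format.

```typescript
// Illustrative sketch: Schema.org microdata for an extracted document.
// ExtractedDocument, toSchemaOrgHtml and the Article type choice are hypothetical,
// not libingest's API or output format.

interface ExtractedDocument {
  title: string;
  author?: string;
  paragraphs: string[];
}

// Escape characters that are significant in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap the document in an item whose itemprops map onto Schema.org Article properties.
function toSchemaOrgHtml(doc: ExtractedDocument): string {
  const lines: string[] = ['<article itemscope itemtype="https://schema.org/Article">'];
  lines.push(`  <h1 itemprop="headline">${escapeHtml(doc.title)}</h1>`);
  if (doc.author !== undefined) {
    // The author is a nested Person item with its own name property.
    lines.push(
      `  <div itemprop="author" itemscope itemtype="https://schema.org/Person">` +
        `<span itemprop="name">${escapeHtml(doc.author)}</span></div>`
    );
  }
  lines.push('  <div itemprop="articleBody">');
  for (const paragraph of doc.paragraphs) {
    lines.push(`    <p>${escapeHtml(paragraph)}</p>`);
  }
  lines.push("  </div>");
  lines.push("</article>");
  return lines.join("\n");
}

const html = toSchemaOrgHtml({
  title: "Quarterly Report",
  author: "Jane Doe",
  paragraphs: ["Revenue grew.", "Costs stayed < budget."],
});
console.log(html);
```

Search engines and indexers that understand microdata can read the headline, author, and body as typed properties instead of guessing them from layout, which is the point of annotating the HTML before indexing.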

Core Features & Use Cases

Orchestrates a configurable pipeline of steps (pdf-to-images, images-to-html, extract-context, annotate-html, normalize-html) that transforms source documents into annotated HTML.
Produces intermediate artifacts (context, image fragments, annotated HTML) for downstream retrieval and analysis.
Suited to knowledge-base indexing, document search, and content catalogs.
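The pipeline described above can be sketched as an ordered list of named steps, each reading the artifacts produced so far and adding its own. Only the five step names come from the description; the Artifacts shape, StepFn type, runPipeline function, and placeholder step bodies are hypothetical and stand in for real conversion work.

```typescript
// Minimal sketch of an ordered, configurable pipeline. The step names come from the
// libingest description; Artifacts, StepFn, runPipeline and the placeholder bodies
// are hypothetical, not the library's API.

type Artifacts = { [key: string]: string };
type StepFn = (artifacts: Artifacts) => Artifacts;

// Placeholder steps: each returns a label describing the artifact it would produce.
const STEPS: { [name: string]: StepFn } = {
  "pdf-to-images": (a) => ({ images: `images(${a.source})` }),
  "images-to-html": (a) => ({ html: `html(${a.images})` }),
  "extract-context": (a) => ({ context: `context(${a.html})` }),
  "annotate-html": (a) => ({ annotated: `annotate(${a.html}, ${a.context})` }),
  "normalize-html": (a) => ({ normalized: `normalize(${a.annotated})` }),
};

// Run the configured steps in order, merging each step's output into the artifact set
// so intermediate results (context, HTML fragments, annotated HTML) stay available.
function runPipeline(order: string[], source: string): Artifacts {
  let artifacts: Artifacts = { source };
  for (const name of order) {
    const step = STEPS[name];
    if (step === undefined) {
      throw new Error(`Unknown pipeline step: ${name}`);
    }
    artifacts = { ...artifacts, ...step(artifacts) };
  }
  return artifacts;
}

const result = runPipeline(
  ["pdf-to-images", "images-to-html", "extract-context", "annotate-html", "normalize-html"],
  "report.pdf"
);
console.log(Object.keys(result));
```

Merging instead of replacing is what keeps the intermediate artifacts around for downstream retrieval. Because the order is just a list of names, a configuration could drop or reorder steps; whether libingest permits arbitrary orderings is not stated here.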

Quick Start

Drop a document into the ingest folder and start the ingestion pipeline to generate structured HTML ready for indexing.
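A hypothetical sketch of the Quick Start flow: pick supported files out of an ingest-folder listing and choose which pipeline steps each one needs. The extension lists, the selectIngestable and stepsFor functions, and the rule that images skip pdf-to-images are assumptions made for illustration; this page does not document how libingest discovers files or handles PowerPoint input.

```typescript
// Hypothetical Quick Start sketch. The extension lists and the image-skipping rule are
// assumptions, not documented libingest behavior.

const FULL_PIPELINE = [
  "pdf-to-images",
  "images-to-html",
  "extract-context",
  "annotate-html",
  "normalize-html",
];

const DOCUMENT_EXTENSIONS = [".pdf", ".ppt", ".pptx"];
const IMAGE_EXTENSIONS = [".png", ".jpg", ".jpeg"];

// Lower-cased extension including the dot, or "" if the name has none.
function extensionOf(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot).toLowerCase();
}

function isImage(filename: string): boolean {
  return IMAGE_EXTENSIONS.indexOf(extensionOf(filename)) !== -1;
}

// Keep only files this sketch treats as ingestable.
function selectIngestable(filenames: string[]): string[] {
  return filenames.filter(
    (f) => isImage(f) || DOCUMENT_EXTENSIONS.indexOf(extensionOf(f)) !== -1
  );
}

// Images are already page images, so this sketch starts them at images-to-html.
// PowerPoint files get the full pipeline only as a placeholder.
function stepsFor(filename: string): string[] {
  return isImage(filename) ? FULL_PIPELINE.slice(1) : FULL_PIPELINE.slice();
}

const folder = ["report.PDF", "notes.txt", "deck.pptx", "scan.jpg", "README"];
for (const file of selectIngestable(folder)) {
  console.log(`${file}: ${stepsFor(file).join(" -> ")}`);
}
```

Here notes.txt and README are ignored, and scan.jpg starts one step later than the PDF and PowerPoint inputs.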

Dependency Matrix

Required Modules

@copilot-ld/libllm
@copilot-ld/libpolicy
@copilot-ld/libprompt
@copilot-ld/libstorage
@copilot-ld/libtype
@copilot-ld/libutil
js-yaml

Components

Standard package