databricks-parsing

Community

Parse documents into structured text.

Author: Aradhya0510
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill extracts text and structured data from PDF, DOCX, PPTX, and image files, so that documents can be processed in bulk and fed into custom RAG pipelines.

Core Features & Use Cases

  • Document Parsing: Utilizes the ai_parse_document SQL function to convert binary documents into structured text.
  • RAG Pipeline Foundation: Serves as the first step of a custom Retrieval-Augmented Generation (RAG) pipeline, in which documents are parsed and then chunked.
  • Use Case: Ingesting a collection of research papers from a Databricks Volume, parsing them into text, and preparing them for a custom RAG system to enable semantic search.
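
The chunking step mentioned above is not specified in this listing. As a minimal sketch, assuming the parsed output has already been flattened into one plain string, fixed-size character windows with overlap are one common way to prepare text for embedding. The function name chunk_text and its default sizes are hypothetical and are not part of the Skill.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; the Skill's own chunking
    strategy is not documented in this listing.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated text in the index.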

Quick Start

Parse all documents in the '/Volumes/catalog/schema/volume/docs/' directory using the ai_parse_document function.
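
For reference, the request above roughly corresponds to a query like the one built below. This is a sketch under assumptions: it reads the directory with read_files in binaryFile format and passes the content column to ai_parse_document, and it presumes a Databricks workspace where that function is enabled. The helper name build_parse_query is hypothetical, and the SQL the Skill actually generates may differ.

```python
def build_parse_query(volume_dir: str) -> str:
    """Return a SQL statement that parses every file under volume_dir.

    Hypothetical helper; it only builds the query text and does not run it.
    """
    # Refuse quotes rather than guess at SQL escaping rules.
    if "'" in volume_dir:
        raise ValueError("volume_dir must not contain single quotes")
    return (
        "SELECT path, ai_parse_document(content) AS parsed\n"
        f"FROM read_files('{volume_dir}', format => 'binaryFile')"
    )


query = build_parse_query("/Volumes/catalog/schema/volume/docs/")
print(query)

# In a Databricks notebook, where a SparkSession named `spark` already
# exists, the query could then be executed with:
# parsed_df = spark.sql(query)
```

The parsed column holds the structured result for each file, and path identifies the source document it came from.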

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: databricks-parsing
Download link: https://github.com/Aradhya0510/databricks-cv-accelerator/archive/main.zip#databricks-parsing

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
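
For a manual install, the extraction step can be sketched as follows. The layout of the archive is an assumption: the code looks for a folder named after the Skill anywhere inside the downloaded zip and copies its contents into .claude/skills/. The function name install_skill is hypothetical, and the real repository structure should be checked before relying on this.

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill(zip_path, skill_name, skills_dir=".claude/skills"):
    """Copy the folder named skill_name out of the archive into skills_dir.

    Hypothetical helper; assumes the archive contains a directory whose
    name equals skill_name. Returns the list of files written.
    """
    target_root = Path(skills_dir) / skill_name
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path below the skill folder.
            relative = parts[parts.index(skill_name) + 1:]
            # Skip empty paths and anything trying to escape the target.
            if not relative or ".." in relative:
                continue
            destination = target_root.joinpath(*relative)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
            written.append(destination)
    return written
```

An empty return value means no folder with that name was found in the archive, in which case nothing is written.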
