Name: multimodal-rag
Author: zilliztech

System Documentation

What problem does it solve?

This Skill helps users answer questions about documents that mix text with images, charts, or diagrams by combining textual and visual context.

Core Features & Use Cases

Multimodal RAG: Retrieve and reason over both text and image content from PDFs, manuals, reports, and presentations.
Visual Q&A: Answer questions about charts, diagrams, and figures embedded in documents.
Image-aware retrieval: Return relevant image captions or references alongside text results for richer context.
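
As a rough illustration of image-aware retrieval, the sketch below ranks text passages and image captions together against a question. It is not this Skill's implementation: the `Chunk` type, the `retrieve` function, and the sample data are hypothetical, and a toy bag-of-words cosine similarity stands in for the embedding model and vector database a real pipeline would use.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    kind: str     # "text" or "image" (hypothetical labels)
    page: int
    content: str  # passage text, or the caption of an image


def _vectorize(text: str) -> Counter:
    # Toy bag-of-words vector; a real system would call an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[Chunk], top_k: int = 2) -> list[Chunk]:
    # Score text passages and image captions in one shared ranking.
    q = _vectorize(query)
    ranked = sorted(chunks, key=lambda c: _cosine(q, _vectorize(c.content)), reverse=True)
    return ranked[:top_k]


# Made-up sample data for illustration only.
chunks = [
    Chunk("text", 1, "Installation steps for the product and safety warnings."),
    Chunk("image", 2, "Bar chart of quarterly revenue for the product."),
    Chunk("text", 2, "Revenue grew in every quarter, as the chart on this page shows."),
]

results = retrieve("What does the revenue chart show?", chunks)
for c in results:
    print(f"[{c.kind}, page {c.page}] {c.content}")
```

With this sample data, the image caption and the passage from the same page are returned together, which is the kind of mixed result the feature list describes.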

Quick Start

Example: Ask a question such as "What does the revenue chart on page 2 of the product manual show?" The Skill then reasons across both the text and the visuals to answer it.

Claude Code Installation

Please help me install this Skill:
Name: multimodal-rag
Download link: https://github.com/zilliztech/milvus-marketplace/archive/main.zip#multimodal-rag
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
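
For readers who prefer to script the extract-and-install step, here is a minimal sketch. It assumes the archive has already been downloaded and that it contains a folder named after the Skill. The `install_skill` helper and the archive layout in the demo are hypothetical; the real layout of the repository archive is not confirmed here.

```python
import io
import tempfile
import zipfile
from pathlib import Path, PurePosixPath


def install_skill(archive: zipfile.ZipFile, skill_name: str, skills_dir: Path) -> Path:
    """Copy every file under a folder named skill_name into skills_dir/skill_name."""
    target = skills_dir / skill_name
    copied = 0
    for info in archive.infolist():
        parts = PurePosixPath(info.filename).parts
        # Skip directories, unsafe paths, and files outside the skill folder.
        if info.is_dir() or ".." in parts or skill_name not in parts[:-1]:
            continue
        # Keep only the path below the skill folder.
        relative = Path(*parts[parts.index(skill_name) + 1:])
        out = target / relative
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_bytes(archive.read(info))
        copied += 1
    if copied == 0:
        raise FileNotFoundError(f"no folder named {skill_name!r} in archive")
    return target


# Demo with an in-memory stand-in archive (hypothetical layout, no network access).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("repo-main/skills/multimodal-rag/SKILL.md", "# multimodal-rag\n")
    zf.writestr("repo-main/skills/other-skill/SKILL.md", "# other\n")

with tempfile.TemporaryDirectory() as tmp, zipfile.ZipFile(buffer) as zf:
    installed = install_skill(zf, "multimodal-rag", Path(tmp) / ".claude" / "skills")
    installed_files = sorted(p.name for p in installed.rglob("*") if p.is_file())
    print(installed.name, installed_files)
```

In real use, open the downloaded file with `zipfile.ZipFile("main.zip")` and pass `Path(".claude/skills")` as the destination.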
