Name: gemini-multimodal
Availability: InStock
Author: FutureAtoms

System Documentation

What problem does it solve?

This Skill streamlines the analysis and processing of various media types (images, video, audio, PDFs) by leveraging the Gemini API, reducing the need for manual inspection and data extraction.

Core Features & Use Cases

Multimodal Input: Accepts images, video, audio, and PDF files for analysis.
Specific Tasks: Supports object detection, image segmentation, video summarization, audio transcription, and structured data extraction from PDFs.
Use Case: Upload a product image and ask the AI to identify all visible products and their bounding boxes, or provide a meeting recording and get a summarized transcript with key discussion points.
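Detected objects are easier to use once their coordinates are in pixels. The sketch below assumes the model is prompted to return a JSON list of objects with a "label" and a "box_2d" field, where each box is [ymin, xmin, ymax, xmax] normalized to a 0-1000 grid. Google's Gemini object-detection examples use this convention, but the field names and prompt format here are assumptions and are not taken from this skill. It rescales each box to the real image size.

```python
import json
from dataclasses import dataclass

# Assumption: boxes arrive as [ymin, xmin, ymax, xmax] on a 0-1000 grid,
# independent of the image's real size.
NORMALIZED_SCALE = 1000


@dataclass(frozen=True)
class PixelBox:
    label: str
    x0: int
    y0: int
    x1: int
    y1: int


def _clamp(value: int, upper: int) -> int:
    """Keep a coordinate inside [0, upper]."""
    return max(0, min(upper, value))


def to_pixel_box(label: str, box_2d: list, width: int, height: int) -> PixelBox:
    """Convert one normalized [ymin, xmin, ymax, xmax] box to pixel corners."""
    if len(box_2d) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(box_2d)}")
    ymin, xmin, ymax, xmax = box_2d
    # Scale x by the image width and y by the image height, then clamp
    # in case the model overshoots the grid slightly.
    x0 = _clamp(round(xmin * width / NORMALIZED_SCALE), width)
    y0 = _clamp(round(ymin * height / NORMALIZED_SCALE), height)
    x1 = _clamp(round(xmax * width / NORMALIZED_SCALE), width)
    y1 = _clamp(round(ymax * height / NORMALIZED_SCALE), height)
    # Order the corners so (x0, y0) is always the top-left point.
    x0, x1 = min(x0, x1), max(x0, x1)
    y0, y1 = min(y0, y1), max(y0, y1)
    return PixelBox(label, x0, y0, x1, y1)


def parse_detections(raw_json: str, width: int, height: int) -> list:
    """Parse a JSON list such as [{"label": "mug", "box_2d": [...]}, ...]."""
    return [
        to_pixel_box(item["label"], item["box_2d"], width, height)
        for item in json.loads(raw_json)
    ]
```

For a 1000x500 product photo, a box of [100, 200, 500, 600] maps to the pixel rectangle from (200, 50) to (600, 250).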

Quick Start

Use the gemini-multimodal skill to summarize the key points in the attached document 'report.pdf'.
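The Quick Start prompt leaves the actual API call to the skill. For a rough idea of what a single-file request can look like, here is a minimal sketch that builds a JSON body for the Gemini REST generateContent method, sending the file inline as base64 next to the text prompt. The extension-to-MIME table is a hypothetical, non-exhaustive subset. The skill itself may send files differently, for example through the Files API, which is better suited to large inputs such as long videos.

```python
import base64
from pathlib import Path

# Hypothetical subset of extensions mapped to MIME types; not an
# exhaustive list of what the Gemini API accepts.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".mp4": "video/mp4",
}


def build_generate_content_body(filename: str, data: bytes, prompt: str) -> dict:
    """Build a generateContent request body with one inline file and one prompt."""
    suffix = Path(filename).suffix.lower()
    mime_type = MIME_TYPES.get(suffix)
    if mime_type is None:
        raise ValueError(f"unsupported file type: {suffix or filename}")
    # Inline file bytes travel as base64 text inside the JSON body.
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                    {"text": prompt},
                ]
            }
        ]
    }
```

For the Quick Start case, the body would be built from Path("report.pdf").read_bytes() and a prompt such as "Summarize the key points", then serialized with json.dumps before being sent.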

Please help me install this Skill:
Name: gemini-multimodal
Download link: https://github.com/FutureAtoms/claude-skills-backup/archive/main.zip#gemini-multimodal
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
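The install request boils down to extracting one folder from the repository archive. This minimal sketch assumes the skill's files sit under a directory named gemini-multimodal somewhere inside main.zip. The exact path inside the archive is not stated here. install_skill is a hypothetical helper for illustration and is not part of the skill. It also refuses archive entries that would escape the destination folder.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def install_skill(archive_path: str, skill_name: str,
                  skills_dir: str = ".claude/skills") -> Path:
    """Copy the files under the first `skill_name` directory in the archive
    into skills_dir/skill_name. Hypothetical helper, for illustration only."""
    dest_root = (Path(skills_dir) / skill_name).resolve()
    with zipfile.ZipFile(archive_path) as zf:
        files = [info for info in zf.infolist() if not info.is_dir()]

        # Assumption: the skill lives in a directory named `skill_name`;
        # find the archive prefix that ends at that directory.
        prefix = None
        for info in files:
            dirs = PurePosixPath(info.filename).parts[:-1]
            if skill_name in dirs:
                prefix = PurePosixPath(*dirs[: dirs.index(skill_name) + 1])
                break
        if prefix is None:
            raise FileNotFoundError(f"no '{skill_name}' directory in {archive_path}")

        for info in files:
            try:
                relative = PurePosixPath(info.filename).relative_to(prefix)
            except ValueError:
                continue  # file belongs to another skill; skip it
            target = dest_root.joinpath(*relative.parts).resolve()
            # Refuse entries that would land outside the destination ("zip slip").
            if dest_root not in target.parents:
                raise ValueError(f"unsafe path in archive: {info.filename}")
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
    return dest_root
```

Under these assumptions, install_skill("main.zip", "gemini-multimodal") would place the skill's files in .claude/skills/gemini-multimodal/ and leave the repository's other skills out.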
