gemini-multimodal

Community

Process images, video, audio, and PDFs.

Author: FutureAtoms
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill uses the Gemini API to analyze and process images, video, audio, and PDFs, reducing the need for manual inspection and data extraction.

Core Features & Use Cases

  • Multimodal Input: Accepts images, video, audio, and PDF files for analysis.
  • Specific Tasks: Supports object detection, image segmentation, video summarization, audio transcription, and structured data extraction from PDFs.
  • Use Case: Upload a product image and ask the AI to identify all visible products and their bounding boxes, or provide a meeting recording and get a summarized transcript with key discussion points.
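As a rough illustration of the bounding-box use case, the sketch below converts a model-reported box into pixel coordinates. It assumes the box arrives as [ymin, xmin, ymax, xmax] on a 0-1000 normalized scale, which is the convention Gemini's documentation describes for object detection. The function name `to_pixel_box` is hypothetical and is not part of this Skill.

```python
from typing import Sequence, Tuple


def to_pixel_box(
    box: Sequence[float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Convert a normalized box to (left, top, right, bottom) in pixels.

    Assumption: `box` is [ymin, xmin, ymax, xmax] on a 0-1000 scale.
    """
    if len(box) != 4:
        raise ValueError("expected four values: ymin, xmin, ymax, xmax")
    ymin, xmin, ymax, xmax = box
    # Scale each coordinate by the matching image dimension.
    left = round(xmin * width / 1000)
    top = round(ymin * height / 1000)
    right = round(xmax * width / 1000)
    bottom = round(ymax * height / 1000)
    return left, top, right, bottom


if __name__ == "__main__":
    # A box covering the middle of a 640x480 image.
    print(to_pixel_box([100, 200, 500, 800], width=640, height=480))
```

The resulting tuple uses the (left, top, right, bottom) order that common imaging libraries expect for crop and draw operations.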

Quick Start

Use the gemini-multimodal skill to summarize the key points in the attached document 'report.pdf'.
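For readers curious what a request like this might look like underneath, here is a minimal sketch using the google-generativeai package listed under Required Modules. It is not the Skill's actual script. The helper names (`media_kind`, `summarize_pdf`), the model name `gemini-1.5-flash`, and the `GEMINI_API_KEY` environment variable are all assumptions made for illustration.

```python
import mimetypes
import os


def media_kind(path: str) -> str:
    """Classify a file as 'pdf', 'image', 'video', or 'audio' by extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime == "application/pdf":
        return "pdf"
    if mime and mime.startswith(("image/", "video/", "audio/")):
        return mime.split("/")[0]
    raise ValueError(f"unsupported media type for {path!r}: {mime}")


def summarize_pdf(path: str, model_name: str = "gemini-1.5-flash") -> str:
    """Upload a PDF and ask the model for its key points (needs network access)."""
    if media_kind(path) != "pdf":
        raise ValueError("summarize_pdf expects a .pdf file")
    # Imported lazily so the helper above works without the package installed.
    import google.generativeai as genai

    # Assumption: the API key is stored in GEMINI_API_KEY.
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    uploaded = genai.upload_file(path)
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(
        [uploaded, "Summarize the key points in this document."]
    )
    return response.text
```

Calling `summarize_pdf("report.pdf")` would return the summary as plain text, provided the key is set and the file exists.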

Dependency Matrix

Required Modules

google-generativeai

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: gemini-multimodal
Download link: https://github.com/FutureAtoms/claude-skills-backup/archive/main.zip#gemini-multimodal

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
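If you prefer to do the extraction step by hand, the sketch below shows one possible way to copy a single skill folder out of an already downloaded archive into .claude/skills/. The internal layout of the archive is not documented here, so the code assumes only that a folder named after the skill exists somewhere inside it. The function name `install_skill` is hypothetical.

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill(zip_path, skill_name: str, project_root=".") -> Path:
    """Copy the files under `skill_name` from the zip into .claude/skills/."""
    target = Path(project_root) / ".claude" / "skills" / skill_name
    copied = 0
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path below the skill folder.
            relative = parts[parts.index(skill_name) + 1:]
            if not relative or ".." in relative:
                continue  # skip odd entries and path-traversal attempts
            destination = target.joinpath(*relative)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
            copied += 1
    if copied == 0:
        raise FileNotFoundError(f"no folder named {skill_name!r} in {zip_path}")
    return target
```

For example, `install_skill("main.zip", "gemini-multimodal")` would place the skill's files under `./.claude/skills/gemini-multimodal/` and leave other folders in the archive untouched.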
