gemini-multimodal
Community
Process images, video, audio, and PDFs.
Software Engineering
#ai, #multimodal, #pdf extraction, #gemini, #video processing, #audio transcription, #image analysis
Author: FutureAtoms
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill uses the Gemini API to analyze and process images, video, audio, and PDFs, reducing the need for manual inspection and data extraction.
Core Features & Use Cases
- Multimodal Input: Accepts images, video, audio, and PDF files for analysis.
- Specific Tasks: Supports object detection, image segmentation, video summarization, audio transcription, and structured data extraction from PDFs.
- Use Case: Upload a product image and ask the AI to identify all visible products and their bounding boxes, or provide a meeting recording and get a summarized transcript with key discussion points.
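The media types and tasks listed above could be wired together roughly as follows. This is a minimal sketch, not the Skill's own script: the helper names (media_category, analyze), the default prompts, the GEMINI_API_KEY environment variable, and the model name are all assumptions made for illustration. Only the google-generativeai calls (configure, upload_file, GenerativeModel, generate_content) come from the library itself.

```python
import mimetypes
import os

# Hypothetical default prompts per media category (illustrative, not from the Skill).
DEFAULT_PROMPTS = {
    "image": "List every visible product and give a bounding box for each.",
    "video": "Summarize this video.",
    "audio": "Transcribe this recording and list the key discussion points.",
    "pdf": "Extract the key data from this document in a structured form.",
}


def media_category(path):
    """Classify a file as 'image', 'video', 'audio', or 'pdf' from its name."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"Cannot determine media type of {path!r}")
    if mime == "application/pdf":
        return "pdf"
    top_level = mime.split("/")[0]
    if top_level in ("image", "video", "audio"):
        return top_level
    raise ValueError(f"Unsupported media type {mime!r} for {path!r}")


def analyze(path, prompt=None, model_name="gemini-1.5-flash"):
    """Upload a file and ask Gemini about it (model name is an assumption)."""
    # Imported lazily so the helpers above work without the package installed.
    import google.generativeai as genai

    # Assumes the API key is stored in the GEMINI_API_KEY environment variable.
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    uploaded = genai.upload_file(path)
    # Note: large video uploads may need time to finish processing before use.
    model = genai.GenerativeModel(model_name)
    question = prompt or DEFAULT_PROMPTS[media_category(path)]
    response = model.generate_content([uploaded, question])
    return response.text
```

With a valid key set, a call such as analyze("report.pdf", "Summarize the key points.") would return the model's text answer. Classification is kept separate from the API call so the routing logic can be checked offline.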
Quick Start
Use the gemini-multimodal skill to summarize the key points in the attached document 'report.pdf'.
Dependency Matrix
Required Modules
google-generativeai
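Before using the Skill, it may help to confirm that the required module is importable. The sketch below is an illustrative check, not part of the Skill. The helper name module_available is hypothetical. The package is published on PyPI as google-generativeai and is imported as google.generativeai.

```python
import importlib.util


def module_available(name):
    """Return True if the named module can be located, without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (here, 'google') is itself missing.
        return False


if not module_available("google.generativeai"):
    print("Missing dependency. Install it with: pip install google-generativeai")
```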
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: gemini-multimodal
Download link: https://github.com/FutureAtoms/claude-skills-backup/archive/main.zip#gemini-multimodal
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
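For a manual install, the extract-and-copy step could look roughly like the sketch below. It assumes the archive has already been downloaded to a local file and that the Skill lives in a folder named gemini-multimodal somewhere inside it. The archive's internal layout is not documented here, so that is a guess. The function name extract_skill is hypothetical.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(zip_path, skill_name, dest_root):
    """Copy every file under a folder named `skill_name` into dest_root/skill_name."""
    dest = Path(dest_root) / skill_name
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            # Skip directory entries and files outside the skill folder.
            if info.is_dir() or skill_name not in parts[:-1]:
                continue
            # Keep only the path below the skill folder.
            relative = parts[parts.index(skill_name) + 1:]
            if ".." in relative:
                continue  # refuse entries that try to escape the destination
            target = dest.joinpath(*relative)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            written.append(target)
    if not written:
        raise FileNotFoundError(f"No folder named {skill_name!r} in {zip_path}")
    return dest
```

A call such as extract_skill("main.zip", "gemini-multimodal", ".claude/skills") would then place the Skill's files under .claude/skills/gemini-multimodal. If the archive uses a different folder name, the function raises FileNotFoundError and writes nothing.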