gemini-vision
Official
Automate advanced image understanding with Gemini.
Data & Analytics
#segmentation #visual-qa #image-analysis #captioning #object-detection #gemini-vision #pdf-processing
Author: AIA-11-HN-MIB
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill enables Claude to use Google's Gemini Vision API to analyze images at scale, providing captions, content classification, visual question answering, object detection, segmentation, and multi-image analysis, as well as document understanding for PDFs.
Core Features & Use Cases
- Captioning & Classification: Generate descriptive captions and categorize image content to automate tagging and organization.
- Visual Question Answering: Answer natural-language questions about image content for quick insights.
- Object Detection & Segmentation: Identify and locate objects with bounding boxes and pixel-level masks for precise scene understanding.
- Document Understanding: Process PDFs (up to 1,000 pages) to extract text and structure for automation and analysis.
- Multi-Image Analysis: Compare and analyze up to 3,600 images to surface trends and changes.
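As an illustration of the object detection feature, the sketch below converts detection results into pixel coordinates. It assumes the model was prompted to return a JSON list with `label` and `box_2d` keys, where `box_2d` is `[ymin, xmin, ymax, xmax]` on a 0-1000 scale (the convention described in Gemini's image understanding documentation). The helper names and the sample response are hypothetical and are not part of this Skill.

```python
import json


def to_pixel_box(box_2d, width, height):
    """Convert [ymin, xmin, ymax, xmax] on a 0-1000 scale to
    (left, top, right, bottom) in pixels, using integer arithmetic."""
    ymin, xmin, ymax, xmax = box_2d
    return (
        xmin * width // 1000,
        ymin * height // 1000,
        xmax * width // 1000,
        ymax * height // 1000,
    )


def parse_detections(raw_json, width, height):
    """Parse a JSON detection list (assumed response format) into labels and pixel boxes."""
    items = json.loads(raw_json)
    return [
        {"label": item["label"], "box": to_pixel_box(item["box_2d"], width, height)}
        for item in items
    ]


# Hypothetical model output for a 640x480 image.
sample = '[{"label": "cat", "box_2d": [100, 250, 500, 750]}]'
detections = parse_detections(sample, width=640, height=480)
print(detections)
```

Integer division keeps the conversion exact for whole-number inputs; real responses may need extra validation, for example stripping Markdown fences around the JSON.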
Quick Start
Use the Gemini Vision skill to analyze an image. For example:
python scripts/analyze-image.py image.jpg "Describe this image"
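The source of the script is not shown on this page. As a rough sketch of how a single-file request could be made with the `google-genai` package, the example below sends a local file and a text prompt to Gemini. The model name, the `GEMINI_API_KEY` environment variable, and the function names `guess_mime_type` and `analyze_image` are assumptions for illustration, not the Skill's actual implementation.

```python
import mimetypes
import os


def guess_mime_type(path):
    """Return the MIME type inferred from the file extension, e.g. image/jpeg."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"Cannot infer MIME type for {path!r}")
    return mime_type


def analyze_image(path, prompt, model="gemini-2.5-flash"):
    """Send one local file plus a text prompt to Gemini and return the reply text.

    The default model name is an assumption; substitute any vision-capable model.
    """
    # Imported lazily so guess_mime_type works without google-genai installed.
    from google import genai
    from google.genai import types

    # Assumes the API key is stored in the GEMINI_API_KEY environment variable.
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    with open(path, "rb") as f:
        data = f.read()
    part = types.Part.from_bytes(data=data, mime_type=guess_mime_type(path))
    response = client.models.generate_content(model=model, contents=[part, prompt])
    return response.text
```

Under these assumptions, `analyze_image("image.jpg", "Describe this image")` would correspond to the command-line call above. Because the MIME type is inferred from the extension, the same function could also be pointed at a PDF.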
Dependency Matrix
Required Modules
google-genai
Components
scripts, references
Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: gemini-vision
Download link: https://github.com/AIA-11-HN-MIB/MIB-MockInterviewAIBot/archive/main.zip#gemini-vision
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.