gemini-vision

Name: gemini-vision
Availability: InStock
Author: AIA-11-HN-MIB

Official

Automate advanced image understanding with Gemini.

Category: Data & Analytics
Tags: #segmentation #visual-qa #image-analysis #captioning #object-detection #gemini-vision #pdf-processing

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

This Skill lets Claude use Google's Gemini Vision API to analyze images at scale. It covers captioning, content classification, visual question answering, object detection, segmentation, and multi-image analysis, as well as document understanding for PDFs.

Core Features & Use Cases

Captioning & Classification: Generate descriptive captions and categorize image content to automate tagging and organization.
Visual Question Answering: Answer natural-language questions about image content for quick insights.
Object Detection & Segmentation: Identify and locate objects with bounding boxes and pixel-level masks for precise scene understanding.
Document Understanding: Process PDFs (up to 1,000 pages) to extract text and structure for automation and analysis.
Multi-Image Analysis: Compare and analyze up to 3,600 images to surface trends and changes.
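
As an illustration of the object detection feature, the sketch below converts a detected box into pixel coordinates. It assumes the model returns each box as [ymin, xmin, ymax, xmax] on a 0 to 1000 grid, which is how Gemini's image understanding documentation describes its output; the function name to_pixel_box and the sample values are hypothetical and are not part of this Skill.

```python
from typing import Sequence, Tuple


def to_pixel_box(
    box_2d: Sequence[float],
    width: int,
    height: int,
    scale: int = 1000,
) -> Tuple[int, int, int, int]:
    """Map a normalized [ymin, xmin, ymax, xmax] box to pixel coordinates.

    Assumption: coordinates are normalized to a 0..scale grid (scale=1000).
    Returns (left, top, right, bottom) in pixels for an image of the given size.
    """
    ymin, xmin, ymax, xmax = box_2d
    left = round(xmin * width / scale)
    top = round(ymin * height / scale)
    right = round(xmax * width / scale)
    bottom = round(ymax * height / scale)
    return left, top, right, bottom


# Hypothetical detection on a 640x480 image.
pixel_box = to_pixel_box([100, 200, 500, 800], width=640, height=480)
```

The resulting tuple follows the (left, top, right, bottom) order that most imaging libraries expect for cropping or drawing rectangles.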

Quick Start

Use the Gemini Vision skill to analyze an image. For example:

    python scripts/analyze-image.py image.jpg "Describe this image?"
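
The contents of scripts/analyze-image.py are not shown here, so the following is only a minimal sketch of what a script of this kind might do, not the Skill's actual implementation. It sends one image and one prompt to the Gemini API's generateContent REST endpoint using the Python standard library. The helper names build_payload and analyze, the model name gemini-2.5-flash, and the GEMINI_API_KEY environment variable are assumptions made for illustration.

```python
import base64
import json
import mimetypes
import os
import urllib.request

# Public REST endpoint of the Gemini API; the model name is filled in per call.
API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "{model}:generateContent"
)


def build_payload(image_bytes: bytes, mime_type: str, prompt: str) -> dict:
    """Build a request body with one inline image part and one text part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                    {"text": prompt},
                ]
            }
        ]
    }


def analyze(path: str, prompt: str, model: str = "gemini-2.5-flash") -> str:
    """Send the image at `path` with `prompt` and return the first text reply.

    Assumptions: the API key is in the GEMINI_API_KEY environment variable,
    and the default model name is only an example.
    """
    mime_type = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as image_file:
        payload = build_payload(image_file.read(), mime_type, prompt)

    request = urllib.request.Request(
        API_URL.format(model=model),
        data=json.dumps(payload).encode("utf-8"),  # a body makes this a POST
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": os.environ["GEMINI_API_KEY"],
        },
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["candidates"][0]["content"]["parts"][0]["text"]
```

Under these assumptions, calling analyze("image.jpg", "Describe this image?") would roughly correspond to the command above. Keeping build_payload separate from the network call makes the request shape easy to inspect without an API key.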
