gemini-vision

Official

Automate advanced image understanding with Gemini.

Author: AIA-11-HN-MIB
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill lets Claude use Google's Gemini Vision API to analyze images at scale. It covers captioning, content classification, visual question answering, object detection, segmentation, and multi-image analysis, as well as document understanding for PDFs.

Core Features & Use Cases

  • Captioning & Classification: Generate descriptive captions and categorize image content to automate tagging and organization.
  • Visual Question Answering: Answer natural-language questions about image content for quick insights.
  • Object Detection & Segmentation: Identify and locate objects with bounding boxes and pixel-level masks for precise scene understanding.
  • Document Understanding: Process PDFs (up to 1,000 pages) to extract text and structure for automation and analysis.
  • Multi-Image Analysis: Compare and analyze up to 3,600 images to surface trends and changes.
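
The object detection feature above returns bounding boxes, and those usually need converting before they can be drawn on the original image. The sketch below assumes the convention described in Google's Gemini documentation, where each box arrives as [ymin, xmin, ymax, xmax] normalized to a 0-1000 scale; the helper name box_to_pixels is hypothetical and is not part of this Skill's scripts.

```python
from typing import Sequence, Tuple


def box_to_pixels(
    box_2d: Sequence[float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Convert a normalized box to pixel coordinates.

    Assumption: box_2d is [ymin, xmin, ymax, xmax] on a 0-1000 scale.
    Returns (left, top, right, bottom) in pixels for an image of the
    given width and height.
    """
    if len(box_2d) != 4:
        raise ValueError("expected four coordinates: ymin, xmin, ymax, xmax")
    ymin, xmin, ymax, xmax = box_2d
    # Multiply before dividing so whole-number inputs stay exact.
    left = round(xmin * width / 1000)
    top = round(ymin * height / 1000)
    right = round(xmax * width / 1000)
    bottom = round(ymax * height / 1000)
    return left, top, right, bottom


# A box covering the middle of a 640x480 image.
print(box_to_pixels([100, 200, 500, 800], 640, 480))  # → (128, 48, 512, 240)
```

The returned tuple uses the (left, top, right, bottom) order that most drawing libraries expect, so it can be passed along without reordering.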

Quick Start

Use the Gemini Vision skill to analyze an image. For example:

    python scripts/analyze-image.py image.jpg "Describe this image?"
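
The script's source is not shown here, so the following is only a minimal sketch of how a command like the one above could be wired up with the google-genai package. The function names, the default model name gemini-2.5-flash, and the GEMINI_API_KEY environment variable are all assumptions, not details taken from this Skill.

```python
import argparse
import mimetypes
import os
from pathlib import Path


def guess_image_mime(path: str) -> str:
    """Guess the MIME type from the file extension and reject non-images."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    return mime


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the two positional arguments used in the Quick Start command."""
    parser = argparse.ArgumentParser(
        description="Ask Gemini a question about an image."
    )
    parser.add_argument("image", help="path to a local image file")
    parser.add_argument("prompt", help="question or instruction for the model")
    # Assumption: the model name is a guess, not taken from the Skill.
    parser.add_argument("--model", default="gemini-2.5-flash")
    return parser.parse_args(argv)


def analyze(image_path: str, prompt: str, model: str) -> str:
    """Send one image plus a text prompt to Gemini and return the reply text."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google import genai
    from google.genai import types

    # Assumption: the API key is supplied through GEMINI_API_KEY.
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    image_part = types.Part.from_bytes(
        data=Path(image_path).read_bytes(),
        mime_type=guess_image_mime(image_path),
    )
    response = client.models.generate_content(
        model=model, contents=[image_part, prompt]
    )
    return response.text


def main(argv=None) -> None:
    """Entry point: parse arguments, call the API, print the answer."""
    args = parse_args(argv)
    print(analyze(args.image, args.prompt, args.model))
```

To use this as a script, call main() from the usual __main__ guard. Keeping the SDK import inside analyze() means argument parsing and file-type checks can be exercised without network access or an API key.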

Dependency Matrix

Required Modules

google-genai

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: gemini-vision
Download link: https://github.com/AIA-11-HN-MIB/MIB-MockInterviewAIBot/archive/main.zip#gemini-vision

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.