gemini-audio

Community

Transcribe, summarize, analyze, and synthesize audio with Gemini.

Author: alex-tgk
Version: 1.0.0

System Documentation

What problem does it solve?

Gemini Audio provides transcription, analysis, and summarization of audio, plus text-to-speech generation. It streamlines workflows for podcasts, interviews, meetings, and other multimedia content by turning audio into searchable transcripts, summaries, and key points.

Core Features & Use Cases

  • Transcription with timestamps and multi-speaker support
  • Audio summarization and key-point extraction
  • Non-speech audio analysis (music, ambient sounds)
  • Text-to-speech (TTS) generation with controllable voice styles
  • File management via a Files API workflow for reuse across tasks

Quick Start

Set the GEMINI_API_KEY environment variable, then run transcribe.py to process audio files or generate-speech.py to synthesize speech.
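For the speech-synthesis path, a minimal sketch with the google-genai SDK might look like this. It is not the skill's generate-speech.py. The TTS model name, the "Kore" voice, and the helper names (`require_api_key`, `write_wav`, `synthesize`) are assumptions; it also assumes the TTS response carries raw 16-bit mono PCM at 24 kHz, which is wrapped in a WAV header using the standard-library `wave` module.

```python
import os
import wave

# Assumption: Gemini TTS returns raw 16-bit mono PCM sampled at 24 kHz.
TTS_RATE = 24000
TTS_MODEL = "gemini-2.5-flash-preview-tts"  # hypothetical choice; check current model names


def require_api_key() -> str:
    """Fail early with a clear message if GEMINI_API_KEY is not set."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("Set the GEMINI_API_KEY environment variable first.")
    return key


def write_wav(path: str, pcm: bytes, rate: int = TTS_RATE) -> None:
    """Wrap raw 16-bit mono PCM bytes in a WAV container."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(pcm)


def synthesize(text: str, out_path: str, voice: str = "Kore") -> None:
    """Generate speech for `text` with a prebuilt voice and save it as WAV."""
    from google import genai  # requires the google-genai package
    from google.genai import types

    client = genai.Client(api_key=require_api_key())
    response = client.models.generate_content(
        model=TTS_MODEL,
        contents=text,
        config=types.GenerateContentConfig(
            response_modalities=["AUDIO"],
            speech_config=types.SpeechConfig(
                voice_config=types.VoiceConfig(
                    prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name=voice)
                )
            ),
        ),
    )
    pcm = response.candidates[0].content.parts[0].inline_data.data
    write_wav(out_path, pcm)
```

A call such as `synthesize("Say cheerfully: your meeting notes are ready.", "notes.wav")` steers the delivery style through a natural-language instruction in the text itself, while the `voice` argument picks the prebuilt voice.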

Dependency Matrix

Required Modules

google-genai

Components

  • scripts
  • references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: gemini-audio
Download link: https://github.com/alex-tgk/saasaas/archive/main.zip#gemini-audio

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
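If you prefer to do the same steps by hand, the download-and-extract procedure can be sketched with the Python standard library. This is a hypothetical helper, not part of the skill. It assumes the skill sits somewhere inside the repository archive in a directory named `gemini-audio` that contains a `SKILL.md` file (the archive layout is not documented here), and it installs into a project-relative `.claude/skills/` directory. Entries containing `..` are skipped so a malformed archive cannot write outside the target folder.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path, PurePosixPath

ARCHIVE_URL = "https://github.com/alex-tgk/saasaas/archive/main.zip"


def extract_skill(zip_path: Path, skill: str, dest_root: Path) -> Path:
    """Copy the '<...>/<skill>/' folder from the archive into dest_root/<skill>."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        # Assumption: the skill folder is identified by '<skill>/SKILL.md'.
        marker = next(
            (n for n in names
             if PurePosixPath(n).name == "SKILL.md" and PurePosixPath(n).parent.name == skill),
            None,
        )
        if marker is None:
            raise FileNotFoundError(f"no {skill}/SKILL.md found in archive")
        prefix = str(PurePosixPath(marker).parent) + "/"
        target = dest_root / skill
        for name in names:
            if not name.startswith(prefix) or name.endswith("/"):
                continue  # outside the skill folder, or a directory entry
            rel = PurePosixPath(name[len(prefix):])
            if ".." in rel.parts:
                continue  # refuse paths that would escape the target
            out = target.joinpath(*rel.parts)
            out.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(name) as src, open(out, "wb") as dst:
                shutil.copyfileobj(src, dst)
    return target


def install(dest_root: Path = Path(".claude/skills")) -> Path:
    """Download the repository archive and install the gemini-audio skill."""
    zip_path, _ = urllib.request.urlretrieve(ARCHIVE_URL)
    return extract_skill(Path(zip_path), "gemini-audio", dest_root)
```

Splitting the download (`install`) from the extraction (`extract_skill`) keeps the extraction logic usable on an archive you have already downloaded.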