hive

Official

Master Hive SQL, process big data.

Author: treasure-data
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides expert assistance for writing, analyzing, and optimizing Hive SQL queries for Treasure Data, helping users manage large-scale batch processing, maintain legacy workflows, and overcome memory constraints often encountered with Trino. It ensures robust and scalable data processing.

Core Features & Use Cases

  • Hive-Specific Optimization: Gives guidance on using MAPJOIN hints for small-table joins, leveraging partition pruning, and understanding MapReduce execution strategies for efficient queries.
  • TD Hive Functions: Explains the effective use of Treasure Data's proprietary Hive functions like TD_INTERVAL, TD_TIME_RANGE, and TD_TIME_FORMAT for time-based data processing.
  • Advanced Hive Features: Covers concepts like SerDe, LATERAL VIEW EXPLODE for flattening arrays, and dynamic partitioning for complex data transformations.
  • Use Case: A data engineer needs to process several months of historical event data for a monthly report, a job that often causes memory errors in Trino. This skill helps them write a robust Hive query that uses MAPJOIN for lookup tables and TD_TIME_RANGE to restrict the scan to the relevant time partitions, so the large batch job completes reliably.
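
The use case above can be sketched as a single query. This is a minimal illustration, not the skill's own output: the table database_name.country_lookup, the columns country_code and country_name, and the date range are hypothetical. Whether the MAPJOIN hint is honored depends on the engine's configuration, since some Hive setups ignore the hint and rely on automatic map-join conversion instead.

```sql
-- Assumed tables: database_name.events (large) and
-- database_name.country_lookup (small enough to fit in memory).
SELECT /*+ MAPJOIN(c) */
  c.country_name,
  COUNT(DISTINCT e.user_id) AS unique_users
FROM database_name.events e
JOIN database_name.country_lookup c
  ON e.country_code = c.country_code
-- Start is inclusive, end is exclusive: January through March 2024 in JST.
WHERE TD_TIME_RANGE(e.time, '2024-01-01', '2024-04-01', 'JST')
GROUP BY c.country_name
```

Flattening an array column with LATERAL VIEW EXPLODE follows a similar shape. The column tags is assumed here to be an array of strings; it is not part of the original example.

```sql
-- Emits one output row per (user_id, tag) pair.
SELECT
  e.user_id,
  t.tag
FROM database_name.events e
LATERAL VIEW EXPLODE(e.tags) t AS tag
WHERE TD_TIME_RANGE(e.time, '2024-01-01', '2024-02-01', 'JST')
```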

Quick Start

Query daily unique users for the last month using Hive

SELECT
  TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'JST') AS day,
  COUNT(DISTINCT user_id) AS unique_users
FROM database_name.events
WHERE TD_INTERVAL(time, '-1M', 'JST')
GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'JST')
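
The query above uses a relative window. For a fixed reporting period, a variant with TD_TIME_RANGE might look like the following sketch. The dates are placeholders chosen for illustration, and the grouping expression is repeated in GROUP BY rather than referenced by position, on the assumption that positional grouping may not be enabled in every Hive configuration.

```sql
-- Daily unique users for January 2024 (JST); the end date is exclusive.
SELECT
  TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'JST') AS day,
  COUNT(DISTINCT user_id) AS unique_users
FROM database_name.events
WHERE TD_TIME_RANGE(time, '2024-01-01', '2024-02-01', 'JST')
GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'JST')
```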

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: hive
Download link: https://github.com/treasure-data/td-skills/archive/main.zip#hive

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.