hive
Official
Master Hive SQL, process big data.
Author: treasure-data
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill helps you write, analyze, and optimize Hive SQL queries for Treasure Data. It is aimed at large-scale batch processing, maintaining legacy workflows, and working around the memory limits that large queries often hit in Trino.
Core Features & Use Cases
- Hive-Specific Optimization: Guides on using MAPJOIN hints for small table joins, leveraging partition pruning, and understanding MapReduce strategies for efficient query execution.
- TD Hive Functions: Explains the effective use of Treasure Data's proprietary Hive functions like TD_INTERVAL, TD_TIME_RANGE, and TD_TIME_FORMAT for time-based data processing.
- Advanced Hive Features: Covers concepts like SerDe, LATERAL VIEW EXPLODE for flattening arrays, and dynamic partitioning for complex data transformations.
- Use Case: A data engineer needs to process several months of historical event data for a monthly report, which often causes memory errors in Trino. This skill helps them write a robust Hive query, using MAPJOIN for lookup tables and TD_TIME_RANGE for efficient partition pruning, so the large batch job completes reliably.
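As a rough illustration of the use case above, the sketch below combines a MAPJOIN hint with TD_TIME_RANGE, followed by a LATERAL VIEW EXPLODE query. The table database_name.country_lookup and the columns country_code, country_name, and tags are hypothetical names chosen for the example, and the date range is arbitrary. Depending on the Hive configuration, the MAPJOIN hint may be ignored in favor of automatic map-join conversion, so treat it as a suggestion to the planner rather than a guarantee.

```sql
-- Sketch only: country_lookup, country_code, country_name and tags are
-- assumed names, not part of any documented schema.

-- Monthly unique users per country over a fixed three-month window.
-- TD_TIME_RANGE restricts the scan to the matching time partitions;
-- the MAPJOIN hint asks Hive to load the small lookup table into memory.
SELECT /*+ MAPJOIN(c) */
  TD_TIME_FORMAT(e.time, 'yyyy-MM', 'JST') AS report_month,
  c.country_name,
  COUNT(DISTINCT e.user_id) AS unique_users
FROM database_name.events e
JOIN database_name.country_lookup c
  ON e.country_code = c.country_code
WHERE TD_TIME_RANGE(e.time, '2024-01-01', '2024-04-01', 'JST')
GROUP BY
  TD_TIME_FORMAT(e.time, 'yyyy-MM', 'JST'),
  c.country_name;

-- Flatten an array column (assumed: tags is array<string>) into one row
-- per element, then count events per tag.
SELECT
  tag,
  COUNT(1) AS event_count
FROM database_name.events
LATERAL VIEW explode(tags) t AS tag
WHERE TD_TIME_RANGE(time, '2024-01-01', '2024-02-01', 'JST')
GROUP BY tag;
```

The GROUP BY clauses repeat the full expressions instead of using column positions or aliases, which avoids depending on optional Hive settings.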
Quick Start
Query daily unique users for the last month using Hive
SELECT
  TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'JST') AS day,
  COUNT(DISTINCT user_id) AS unique_users
FROM database_name.events
WHERE TD_INTERVAL(time, '-1M', 'JST')
GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'JST')
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: hive
Download link: https://github.com/treasure-data/td-skills/archive/main.zip#hive
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.