spark-optimization
Community
Debug Spark, optimize performance, cut costs.
Author: ianpojman
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Debugging failed Apache Spark jobs on EMR and tuning their performance is time-consuming and usually requires deep expertise. This Skill provides guidance for diagnosing failures quickly, improving job efficiency, and reducing cloud costs.
Core Features & Use Cases
- Failure Debugging: Guides you through container log analysis to pinpoint root causes of Spark/EMR job failures, saving hours of investigation.
- Performance Bottleneck Analysis: Helps identify and resolve CPU, I/O, and memory bottlenecks using proven profiling techniques.
- Cost Optimization: Provides strategies to reduce EMR cluster costs and S3 request expenses.
- Iceberg Migration: Offers a comprehensive guide to migrating to Iceberg for improved performance, ACID guarantees, and schema evolution.
- Use Case: When your Spark job fails with an OutOfMemoryError, use this Skill to follow a diagnostic decision tree, understand "writer explosion," and implement solutions like micro-batching or Iceberg migration to get your job running efficiently and cost-effectively.
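To give a feel for what the first step of such a diagnostic decision tree might look like, here is a minimal sketch in plain Python. It is not the Skill's own script. The `SIGNATURES` table, the `classify_log` function, and the advice strings are hypothetical names and suggestions introduced only for illustration. The log messages it matches are ones commonly seen in Spark-on-YARN container logs.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical signature table: (category, pattern, suggested next step).
# The advice strings are illustrative assumptions, not the Skill's guidance.
SIGNATURES = [
    (
        "yarn_memory_limit",
        re.compile(r"Container killed by YARN for exceeding (physical )?memory limits"),
        "Off-heap usage exceeded the container size; consider raising "
        "spark.executor.memoryOverhead.",
    ),
    (
        "gc_overhead",
        re.compile(r"java\.lang\.OutOfMemoryError: GC overhead limit exceeded"),
        "The JVM spends most of its time in GC; consider more partitions "
        "or a larger spark.executor.memory.",
    ),
    (
        "heap_space",
        re.compile(r"java\.lang\.OutOfMemoryError: Java heap space"),
        "The heap is exhausted; consider a larger spark.executor.memory "
        "or smaller per-task input.",
    ),
]


def classify_log(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (category, advice) for the first known failure signature found.

    Returns None when no line matches any signature.
    """
    for line in lines:
        for category, pattern, advice in SIGNATURES:
            if pattern.search(line):
                return category, advice
    return None


if __name__ == "__main__":
    # Made-up sample lines standing in for a real container log.
    sample = [
        "INFO Executor: Running task 3.0 in stage 2.0",
        "ERROR Executor: Exception in task 3.0: java.lang.OutOfMemoryError: Java heap space",
    ]
    print(classify_log(sample))
```

In practice the input would be the lines of a container's stderr file after downloading it from the cluster's log location. A real decision tree would branch further from each category, for example by checking how many output files each task is writing.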
Quick Start
Use the spark-optimization skill to help me debug an OutOfMemoryError in my EMR job.
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: spark-optimization
Download link: https://github.com/ianpojman/claude-skills/archive/main.zip#spark-optimization
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.