spark-optimization

Community

Debug Spark, optimize performance, cut costs.

Author: ianpojman
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Debugging failed Apache Spark/EMR jobs and tuning their performance is time-consuming and requires deep expertise. This Skill provides guidance for diagnosing failures, improving job efficiency, and reducing cloud costs.

Core Features & Use Cases

  • Failure Debugging: Guides you through container log analysis to pinpoint the root cause of Spark/EMR job failures.
  • Performance Bottleneck Analysis: Helps identify and resolve CPU, I/O, and memory bottlenecks using profiling techniques.
  • Cost Optimization: Provides strategies for reducing EMR cluster costs and S3 request charges.
  • Iceberg Migration: Offers a guide to migrating to Iceberg for improved performance, ACID guarantees, and schema evolution.
  • Use Case: When your Spark job fails with an OutOfMemoryError, use this Skill to follow a diagnostic decision tree, understand "writer explosion," and apply fixes such as micro-batching or Iceberg migration so the job runs efficiently and at lower cost.
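The container log analysis mentioned under Failure Debugging can be sketched roughly as follows. This is a minimal illustration and not the Skill's own script: the `SIGNATURES` table and the `scan_logs` function are hypothetical names, and the patterns cover only a few common JVM and YARN failure messages. Real logs may word these messages differently depending on the Spark and Hadoop versions in use.

```python
import re
from collections import Counter

# Hypothetical signature table (an assumption, not part of the Skill):
# a few failure strings commonly seen in Spark-on-YARN container logs.
SIGNATURES = {
    "java_heap_oom": re.compile(r"java\.lang\.OutOfMemoryError: Java heap space"),
    "gc_overhead": re.compile(r"java\.lang\.OutOfMemoryError: GC overhead limit exceeded"),
    "yarn_memory_kill": re.compile(r"Container killed by YARN for exceeding .*memory limits"),
    "executor_lost": re.compile(r"ExecutorLostFailure"),
}


def scan_logs(lines):
    """Count failure signatures and remember where each one first appears.

    `lines` is any iterable of log lines (an open file works).
    Returns (counts, first_hit), where first_hit maps a label to
    (line_number, stripped_line) for its first occurrence.
    """
    counts = Counter()
    first_hit = {}
    for number, line in enumerate(lines, start=1):
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
                first_hit.setdefault(label, (number, line.strip()))
    return counts, first_hit


if __name__ == "__main__":
    # Illustrative log lines, written for this example only.
    sample = [
        "INFO Executor: Running task 3.0 in stage 7.0",
        "ERROR Executor: Exception in task 3.0: java.lang.OutOfMemoryError: Java heap space",
        "WARN YarnAllocator: Container killed by YARN for exceeding memory limits.",
    ]
    counts, first_hit = scan_logs(sample)
    for label, count in counts.most_common():
        number, text = first_hit[label]
        print(f"{label}: {count} hit(s), first at line {number}: {text}")
```

The first occurrence is usually the most useful starting point, since later errors in the same log are often consequences of the first one.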

Quick Start

Use the spark-optimization skill to help me debug an OutOfMemoryError in my EMR job.

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: spark-optimization
Download link: https://github.com/ianpojman/claude-skills/archive/main.zip#spark-optimization

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
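If you would rather perform the extraction step by hand, the sketch below shows one way to copy a single skill folder out of an already downloaded archive. It rests on assumptions: `install_skill` is a hypothetical helper, and the skill is assumed to live in a folder named `spark-optimization` somewhere inside the zip (GitHub branch archives wrap everything in a top-level `<repo>-<branch>` folder, which the code skips over). Check the archive layout before relying on it.

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill(zip_path, skill_name, skills_dir):
    """Copy the files under `skill_name` from a zip into skills_dir/skill_name.

    Hypothetical helper: assumes the archive contains a folder named
    `skill_name` at some depth. Returns the number of files written.
    """
    dest_root = Path(skills_dir) / skill_name
    written = 0
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            if skill_name not in parts:
                continue
            # Keep only the path below the skill folder.
            relative = parts[parts.index(skill_name) + 1:]
            if not relative or ".." in relative:
                continue  # skip odd or unsafe entries
            target = dest_root.joinpath(*relative)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            written += 1
    return written


if __name__ == "__main__":
    # Assumes main.zip was downloaded beforehand from the link above.
    count = install_skill("main.zip", "spark-optimization", ".claude/skills")
    print(f"Installed {count} file(s)")
```

A return value of zero means no folder with that name was found in the archive, so nothing was installed.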