pyspark-patterns

Community

Streamline PySpark ETL, debug faster.

Author: linus-mcmanamey
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides a comprehensive guide to PySpark best practices, standardized ETL patterns, and efficient DataFrame operations, removing guesswork and keeping PySpark code consistent and high-quality. It helps developers write and debug pipelines more effectively, reducing errors and development time.

Core Features & Use Cases

  • Standardized ETL Patterns: Follow a consistent Extract-Transform-Load structure for all data transformations (see the sketch after this list).
  • Optimized DataFrame Operations: Leverage TableUtilities for common tasks like deduplication, hashing, and timestamp cleaning.
  • Robust Logging & Error Handling: Implement NotebookLogger and @synapse_error_print_handler for clear, consistent operational insights and error management.
  • Use Case: When building a new data pipeline, use this skill to quickly recall the project's preferred PySpark patterns for data ingestion, transformation, and loading, ensuring your code adheres to established standards.
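The helpers named above (TableUtilities, NotebookLogger, @synapse_error_print_handler) are project-specific and their real signatures are not documented on this page, so the following is only a minimal sketch of the Extract-Transform-Load structure and error-handling decorator they describe, written with plain PySpark stand-ins; the Skill's actual helper names and behavior may differ.

```python
# Minimal sketch of the Extract-Transform-Load structure this Skill describes.
# NOTE: the decorator and the transform helpers below are illustrative
# stand-ins for the project's synapse_error_print_handler and TableUtilities,
# not the Skill's actual API.
import functools
import traceback

from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()


def synapse_error_print_handler(func):
    """Illustrative stand-in: print a readable traceback, then re-raise."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            print(f"[ERROR] {func.__name__} failed:\n{traceback.format_exc()}")
            raise
    return wrapper


@synapse_error_print_handler
def extract(source_table: str) -> DataFrame:
    # Extract: read the raw source table as-is.
    return spark.table(source_table)


@synapse_error_print_handler
def transform(df: DataFrame) -> DataFrame:
    # Transform: the kinds of operations a TableUtilities class might wrap --
    # deduplication, row hashing, and timestamp columns (names assumed here).
    return (
        df.dropDuplicates()
        .withColumn("row_hash", F.sha2(F.concat_ws("||", *df.columns), 256))
        .withColumn("load_timestamp", F.current_timestamp())
    )


@synapse_error_print_handler
def load(df: DataFrame, target_table: str) -> None:
    # Load: overwrite the curated target table.
    df.write.mode("overwrite").saveAsTable(target_table)
```

Centralizing the transform steps in one place, as the Skill does with TableUtilities, is what keeps deduplication and hashing identical across every pipeline.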

Quick Start

Ask the Skill, for example: "Explain the project's PySpark ETL pattern for a new silver layer table named 's_customer_data' sourced from 'bronze_db.b_customer_raw'."
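A response to that prompt would plausibly take the shape below, reusing the extract/transform/load sketch from the previous section; the 'silver_db' database name is an assumption here, since the Skill itself defines the project's real naming conventions.

```python
# Hypothetical silver-layer pipeline for the Quick Start prompt, built on the
# extract/transform/load sketch above. 'silver_db' is an assumed database name.
bronze_df = extract("bronze_db.b_customer_raw")
silver_df = transform(bronze_df)
load(silver_df, "silver_db.s_customer_data")
```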

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: pyspark-patterns
Download link: https://github.com/linus-mcmanamey/multi-agent-user-story-development/archive/main.zip#pyspark-patterns

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.