pyspark-patterns

Community

Streamline PySpark ETL, debug faster.

Author: linus-mcmanamey
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides a comprehensive guide to PySpark best practices, standardized ETL patterns, and efficient DataFrame operations, removing guesswork and keeping PySpark code consistent and high-quality. It helps developers write and debug pipelines more effectively, reducing errors and development time.

Core Features & Use Cases

  • Standardized ETL Patterns: Follow a consistent Extract-Transform-Load structure for all data transformations (see the sketch after this list).
  • Optimized DataFrame Operations: Leverage TableUtilities for common tasks like deduplication, hashing, and timestamp cleaning.
  • Robust Logging & Error Handling: Implement NotebookLogger and @synapse_error_print_handler for clear, consistent operational insights and error management.
  • Use Case: When building a new data pipeline, use this skill to quickly recall the project's preferred PySpark patterns for data ingestion, transformation, and loading, ensuring your code adheres to established standards.
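The helpers named above (TableUtilities, NotebookLogger, @synapse_error_print_handler) are project-specific and their real signatures are not documented on this page, so the following is only a minimal sketch of the Extract-Transform-Load structure and error-handling decorator they describe, written with plain PySpark stand-ins; the Skill's actual helper names and behavior may differ.

```python
# Minimal sketch of the Extract-Transform-Load structure this Skill describes.
# NOTE: the decorator and the transform helpers below are illustrative
# stand-ins for the project's synapse_error_print_handler and TableUtilities,
# not the Skill's actual API.
import functools
import traceback

from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()


def synapse_error_print_handler(func):
    """Illustrative stand-in: print a readable traceback, then re-raise."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            print(f"[ERROR] {func.__name__} failed:\n{traceback.format_exc()}")
            raise
    return wrapper


@synapse_error_print_handler
def extract(source_table: str) -> DataFrame:
    # Extract: read the raw source table as-is.
    return spark.table(source_table)


@synapse_error_print_handler
def transform(df: DataFrame) -> DataFrame:
    # Transform: the kinds of operations a TableUtilities class might wrap --
    # deduplication, row hashing, and timestamp columns (names assumed here).
    return (
        df.dropDuplicates()
        .withColumn("row_hash", F.sha2(F.concat_ws("||", *df.columns), 256))
        .withColumn("load_timestamp", F.current_timestamp())
    )


@synapse_error_print_handler
def load(df: DataFrame, target_table: str) -> None:
    # Load: overwrite the curated target table.
    df.write.mode("overwrite").saveAsTable(target_table)
```

Centralizing the transform steps in one place, as the Skill does with TableUtilities, is what keeps deduplication and hashing identical across every pipeline.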

Quick Start

Ask the Skill, for example: "Explain the project's PySpark ETL pattern for a new silver layer table named 's_customer_data' sourced from 'bronze_db.b_customer_raw'."
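A response to that prompt would plausibly take the shape below, reusing the extract/transform/load sketch from the previous section; the 'silver_db' database name is an assumption here, since the Skill itself defines the project's real naming conventions.

```python
# Hypothetical silver-layer pipeline for the Quick Start prompt, built on the
# extract/transform/load sketch above. 'silver_db' is an assumed database name.
bronze_df = extract("bronze_db.b_customer_raw")
silver_df = transform(bronze_df)
load(silver_df, "silver_db.s_customer_data")
```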

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: pyspark-patterns
Download link: https://github.com/linus-mcmanamey/multi-agent-user-story-development/archive/main.zip#pyspark-patterns

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.