pyspark-patterns
Community
Streamline PySpark ETL, debug faster.
Author: linus-mcmanamey
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides a comprehensive guide to PySpark best practices, standardized ETL patterns, and efficient DataFrame operations, replacing guesswork with consistent, high-quality conventions. It helps developers write and debug PySpark code more effectively, reducing errors and development time.
Core Features & Use Cases
- Standardized ETL Patterns: Follow a consistent Extract-Transform-Load structure for all data transformations.
- Optimized DataFrame Operations: Leverage `TableUtilities` for common tasks like deduplication, hashing, and timestamp cleaning (see the sketch after this list).
- Robust Logging & Error Handling: Implement `NotebookLogger` and `@synapse_error_print_handler` for clear, consistent operational insights and error management.
- Use Case: When building a new data pipeline, use this Skill to quickly recall the project's preferred PySpark patterns for data ingestion, transformation, and loading, ensuring your code adheres to established standards.
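A minimal sketch of how these helpers might fit together, assuming plausible shapes for `TableUtilities`, `NotebookLogger`, and `@synapse_error_print_handler` — the project's real interfaces may differ, and only the PySpark calls are confirmed APIs:

```python
# Hedged sketch: NotebookLogger, synapse_error_print_handler, and
# TableUtilities are assumed shapes for the project's helpers, not
# their confirmed implementations.
import functools

from pyspark.sql import DataFrame, functions as F


class NotebookLogger:
    """Assumed minimal logger giving consistent, prefixed notebook output."""

    def info(self, msg: str) -> None:
        print(f"[INFO] {msg}")

    def error(self, msg: str) -> None:
        print(f"[ERROR] {msg}")


logger = NotebookLogger()


def synapse_error_print_handler(func):
    """Assumed decorator: log the failing ETL step, then re-raise."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            logger.error(f"{func.__name__} failed: {exc}")
            raise

    return wrapper


class TableUtilities:
    """Assumed helpers for deduplication, hashing, and timestamp cleaning."""

    @staticmethod
    def deduplicate(df: DataFrame, keys: list) -> DataFrame:
        return df.dropDuplicates(keys)

    @staticmethod
    def add_row_hash(df: DataFrame) -> DataFrame:
        # SHA-256 over all columns, useful for change detection.
        return df.withColumn("row_hash", F.sha2(F.concat_ws("||", *df.columns), 256))

    @staticmethod
    def clean_timestamps(df: DataFrame, cols: list) -> DataFrame:
        # Cast string timestamps to proper TimestampType columns.
        for c in cols:
            df = df.withColumn(c, F.to_timestamp(F.col(c)))
        return df
```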
Quick Start
Explain the project's PySpark ETL pattern for a new silver layer table named 's_customer_data' from 'bronze_db.b_customer_raw'.
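For orientation, a response to that prompt might follow a shape like the sketch below. The table names come from the prompt itself; the `customer_id` key, `updated_at` column, `silver_db` target, and overwrite mode are illustrative assumptions, and in the project each step would presumably also carry `@synapse_error_print_handler`:

```python
# Hypothetical bronze -> silver pipeline for s_customer_data using plain
# PySpark. Key/column names and the silver_db target are assumptions.
from pyspark.sql import DataFrame, SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()


def extract() -> DataFrame:
    # Extract: read the raw bronze table as-is.
    return spark.table("bronze_db.b_customer_raw")


def transform(df: DataFrame) -> DataFrame:
    # Transform: dedupe on an assumed business key, normalize an assumed
    # timestamp column, then add a change-detection hash over all columns.
    df = df.dropDuplicates(["customer_id"])
    df = df.withColumn("updated_at", F.to_timestamp("updated_at"))
    return df.withColumn("row_hash", F.sha2(F.concat_ws("||", *df.columns), 256))


def load(df: DataFrame) -> None:
    # Load: overwrite the silver table (format and mode are assumptions).
    df.write.mode("overwrite").saveAsTable("silver_db.s_customer_data")


load(transform(extract()))
```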
Dependency Matrix
Required Modules: None required
Components: Standard package

💻 Claude Code Installation
Recommended: Let Claude install the Skill automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: pyspark-patterns
Download link: https://github.com/linus-mcmanamey/multi-agent-user-story-development/archive/main.zip#pyspark-patterns
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.