03-deduplication
Official
Prevent duplicate key errors in Gold MERGE.
Author: databricks-solutions
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill addresses duplicate business keys in source data that is merged into the Gold layer. Removing them before the merge prevents a common Delta Lake MERGE failure and protects data integrity.
Core Features & Use Cases
- Standardized Deduplication: Implements a robust, ordered deduplication pattern before MERGE operations.
- Error Prevention: Avoids DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE errors by ensuring source uniqueness.
- Use Case: When merging customer data from Silver to Gold, the Silver table may contain multiple records for the same customer because of streaming or CDC. This skill ensures only the latest record per customer is used for the merge, preventing failures.
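The "keep only the latest record per business key" idea can be illustrated with a window function. The following is a minimal sketch rather than the skill's own script: it runs the query against an in-memory SQLite table so it works without a cluster, and the columns `name` and `updated_at` as well as the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical Silver rows: (customer_id, name, updated_at).
# Column names and values are illustrative assumptions.
SILVER_ROWS = [
    (1, "Ada", "2024-01-01"),
    (1, "Ada L.", "2024-03-01"),
    (2, "Grace", "2024-02-15"),
    (2, "Grace H.", "2024-02-10"),
    (3, "Edsger", "2024-01-20"),
]

# Rank rows within each business key, newest first, and keep rank 1.
DEDUP_SQL = """
SELECT customer_id, name, updated_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id
               ORDER BY updated_at DESC
           ) AS rn
    FROM silver_customer_dim
)
WHERE rn = 1
ORDER BY customer_id
"""


def latest_per_key(rows):
    """Load rows into an in-memory table and return one row per customer_id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE silver_customer_dim "
            "(customer_id INTEGER, name TEXT, updated_at TEXT)"
        )
        conn.executemany(
            "INSERT INTO silver_customer_dim VALUES (?, ?, ?)", rows
        )
        return conn.execute(DEDUP_SQL).fetchall()
    finally:
        conn.close()


deduped = latest_per_key(SILVER_ROWS)
```

If two rows for the same key can share an ordering value, adding a second ordering column as a tie-breaker keeps the result deterministic.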
Quick Start
Use this skill to deduplicate the silver_customer_dim table on the 'customer_id' business key before merging into the gold_customer_dim table.
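For orientation, the request above might result in a statement along the following lines. This is a hedged sketch, not output captured from the skill: `build_dedup_merge_sql` is a hypothetical helper, the ordering column `updated_at` is an assumption, and the SQL relies on the `QUALIFY` clause and `UPDATE SET *` / `INSERT *` shorthand available in Databricks SQL.

```python
def build_dedup_merge_sql(source, target, business_key, order_col):
    """Return SQL text that deduplicates `source` on `business_key`
    (keeping the row with the greatest `order_col`) and merges it into `target`.

    Hypothetical helper: identifiers are interpolated directly, so only
    pass trusted table and column names.
    """
    return f"""
MERGE INTO {target} AS t
USING (
  SELECT *
  FROM {source}
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY {business_key}
    ORDER BY {order_col} DESC
  ) = 1
) AS s
ON t.{business_key} = s.{business_key}
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
""".strip()


# `updated_at` is an assumed ordering column; substitute the real one.
merge_sql = build_dedup_merge_sql(
    source="silver_customer_dim",
    target="gold_customer_dim",
    business_key="customer_id",
    order_col="updated_at",
)
print(merge_sql)
```

Because the `USING` subquery yields at most one row per `customer_id`, each target row can match at most one source row, which is the condition MERGE requires.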
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: 03-deduplication
Download link: https://github.com/databricks-solutions/vibe-coding-workshop-template/archive/main.zip#03-deduplication
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.