03-deduplication

Official

Prevent duplicate key errors in Gold MERGE.

Author: databricks-solutions
Version: 1.0.0

System Documentation

What problem does it solve?

This Skill handles duplicate business keys in source data that is merged into the Gold layer. Those duplicates cause Delta Lake MERGE operations to fail. Removing them before the merge prevents the failures and keeps one row per business key in Gold.

Core Features & Use Cases

  • Standardized Deduplication: Applies an ordered deduplication pattern before MERGE operations. Rows that share a business key are ordered, and only the top-ranked row is kept.
  • Error Prevention: Avoids DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE errors by ensuring each business key appears only once in the source.
  • Use Case: When you merge customer data from Silver to Gold, the Silver table may hold several records for the same customer, for example because of streaming or CDC updates. This skill keeps only the latest record per customer, so the MERGE does not fail.
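In Spark, ordered deduplication is commonly written as a `row_number()` window partitioned by the business key and ordered by a recency column descending, keeping only row 1. The sketch below shows the same "keep the latest row per key" logic in plain Python, so it runs without a cluster. The record shape and the `updated_at` ordering column are assumptions made for illustration. They are not taken from the skill's own scripts.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def deduplicate_latest(rows: Iterable[Record], key: str, order_by: str) -> List[Record]:
    """Keep exactly one row per business key: the one with the largest `order_by` value.

    Conceptually equivalent to keeping rows where
      row_number() OVER (PARTITION BY key ORDER BY order_by DESC) = 1.
    Ties on `order_by` keep the first row seen, so the result is deterministic
    for a given input order.
    """
    latest: Dict[Any, Record] = {}
    for row in rows:
        current = latest.get(row[key])
        # Replace only when strictly newer, so earlier rows win ties.
        if current is None or row[order_by] > current[order_by]:
            latest[row[key]] = row
    return list(latest.values())


# Hypothetical Silver rows: customer 1 arrived twice (e.g. via CDC).
silver = [
    {"customer_id": 1, "name": "Ana", "updated_at": "2024-01-01"},
    {"customer_id": 1, "name": "Ana M.", "updated_at": "2024-03-01"},
    {"customer_id": 2, "name": "Bo", "updated_at": "2024-02-15"},
]
deduped = deduplicate_latest(silver, key="customer_id", order_by="updated_at")
print(deduped)
```

The ISO-8601 date strings compare correctly as text. With real timestamps you would order by the timestamp column itself. A deterministic tie-break matters because a non-deterministic choice between equally recent rows can produce different Gold results on reruns.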

Quick Start

Use this skill to deduplicate the silver_customer_dim table on the 'customer_id' business key before merging into the gold_customer_dim table.
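The next example is a minimal, in-memory sketch of the Quick Start flow. It assumes silver_customer_dim has an `updated_at` column and models gold_customer_dim as a dict keyed by `customer_id`. `merge_upsert` is a hypothetical stand-in for a Delta `MERGE INTO ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT`, not a Databricks API. Its duplicate-key guard is deliberately stricter than Delta. Delta fails only when several source rows match the same target row, but this guard rejects any repeated key.

```python
from collections import Counter
from typing import Any, Dict, List

Record = Dict[str, Any]


def merge_upsert(target: Dict[Any, Record], source: List[Record], key: str) -> None:
    """Hypothetical in-memory upsert keyed on `key` (stand-in for a Delta MERGE)."""
    counts = Counter(row[key] for row in source)
    dupes = sorted(k for k, n in counts.items() if n > 1)
    if dupes:
        # Stricter version of the condition behind
        # DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE.
        raise ValueError(f"multiple source rows for {key} values: {dupes}")
    for row in source:
        target[row[key]] = dict(row)  # update when matched, insert when not


def latest_per_key(rows: List[Record], key: str, order_by: str) -> List[Record]:
    """Ordered deduplication: sort newest first, keep the first row per key."""
    ranked = sorted(rows, key=lambda r: r[order_by], reverse=True)  # stable sort
    seen = set()
    result = []
    for row in ranked:
        if row[key] not in seen:
            seen.add(row[key])
            result.append(row)
    return result


gold = {1: {"customer_id": 1, "name": "Ana", "updated_at": "2023-12-01"}}
silver = [
    {"customer_id": 1, "name": "Ana", "updated_at": "2024-01-01"},
    {"customer_id": 1, "name": "Ana M.", "updated_at": "2024-03-01"},
    {"customer_id": 3, "name": "Cy", "updated_at": "2024-02-20"},
]

# Merging raw Silver fails: customer_id 1 appears twice.
rejected = None
try:
    merge_upsert(gold, silver, key="customer_id")
except ValueError as err:
    rejected = str(err)
    print("merge rejected:", rejected)

# Deduplicate first, then merge: succeeds with the latest row per customer.
merge_upsert(gold, latest_per_key(silver, "customer_id", "updated_at"), key="customer_id")
print(gold)
```

The guard runs before any write, so the failed attempt leaves `gold` untouched. On Databricks, the deduplication step would run over the Silver DataFrame and the upsert would be a Delta MERGE against the Gold table.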

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: 03-deduplication
Download link: https://github.com/databricks-solutions/vibe-coding-workshop-template/archive/main.zip#03-deduplication

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.