faker-data-generation

Official

Generate realistic test data with configurable corruption.

Author: databricks-solutions
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill automates the creation of synthetic datasets for testing data pipelines, enabling robust data quality validation and simulation of production-like scenarios.

Core Features & Use Cases

  • Realistic Data Generation: Creates data with non-linear distributions, temporal patterns, and row coherence.
  • Configurable Corruption: Intentionally introduces data quality issues (nulls, invalid formats, out-of-range values) at a configurable rate to test Delta Live Tables (DLT) expectations.
  • Use Case: Generate 10,000 customer records with realistic attributes and a 5% corruption rate to test your Bronze layer ingestion and DLT quality checks.
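
The features above can be illustrated with a minimal sketch. It is not the Skill's own implementation: it uses only the Python standard library instead of Faker, and the function name `generate_customers`, the column names, and the `is_corrupted` flag are all hypothetical. It shows a skewed (log-normal) value distribution, row coherence between two date columns, and the three corruption types named above applied at a configurable rate.

```python
import random
from datetime import date, timedelta


def generate_customers(n, corruption_rate=0.05, seed=42):
    """Return n customer rows, a fixed share of which are corrupted.

    Hypothetical helper for illustration; names and columns are assumptions.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    rows = []
    for i in range(n):
        signup = date(2023, 1, 1) + timedelta(days=rng.randrange(365))
        # Row coherence: the last order never precedes the signup date.
        last_order = signup + timedelta(days=rng.randrange(180))
        rows.append({
            "customer_id": i + 1,
            "email": f"user{i + 1}@example.com",
            "age": rng.randint(18, 90),
            # Non-linear distribution: many small spenders, a few large ones.
            "lifetime_spend": round(rng.lognormvariate(4.0, 1.0), 2),
            "signup_date": signup.isoformat(),
            "last_order_date": last_order.isoformat(),
            "is_corrupted": False,
        })

    # Corrupt a fixed number of distinct rows, one issue per row.
    n_bad = round(n * corruption_rate)
    for idx in rng.sample(range(n), n_bad):
        row = rows[idx]
        kind = rng.choice(["null", "invalid_format", "out_of_range"])
        if kind == "null":
            row["email"] = None
        elif kind == "invalid_format":
            row["email"] = row["email"].replace("@", "")
        else:
            row["age"] = -1
        row["is_corrupted"] = True
        row["corruption_type"] = kind
    return rows


customers = generate_customers(1000, corruption_rate=0.05)
bad = [r for r in customers if r["is_corrupted"]]
print(f"{len(customers)} rows, {len(bad)} corrupted")
```

Flagging corrupted rows in a separate column is a design choice of this sketch: it lets a downstream quality check be compared against the known ground truth. In practice the attribute values would likely come from Faker providers rather than the placeholder strings used here.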

Quick Start

Generate 1,000 customer records with a 5% corruption rate in the default catalog and schema.

Dependency Matrix

Required Modules

Faker, holidays, numpy, pandas
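
Before running the Skill, it may help to confirm that the required modules are importable. The following is a small sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the import names assume the usual lowercase package names (Faker is imported as `faker`).

```python
from importlib.util import find_spec

# Import names for the required modules listed above.
REQUIRED_MODULES = ["faker", "holidays", "numpy", "pandas"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```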

Components

scripts, references, assets

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: faker-data-generation
Download link: https://github.com/databricks-solutions/vibe-coding-workshop-template/archive/main.zip#faker-data-generation

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.