faker-data-generation
Official
Generate realistic test data with corruption.
Author: databricks-solutions
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill automates the creation of synthetic datasets for testing data pipelines, enabling robust data quality validation and simulation of production-like scenarios.
Core Features & Use Cases
- Realistic Data Generation: Creates data with non-linear distributions, temporal patterns, and row coherence.
- Configurable Corruption: Intentionally introduces data quality issues (nulls, invalid formats, out-of-range values) to test Delta Live Tables (DLT) expectations.
- Use Case: Generate 10,000 customer records with realistic attributes and a 5% corruption rate to test your Bronze layer ingestion and DLT quality checks.
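The features above can be illustrated with a minimal sketch. It is not the Skill's own implementation: it uses only the Python standard library rather than Faker, numpy, or pandas, and the function names (`generate_customers`, `corrupt`), the column names, and the three corruption kinds are all hypothetical choices made for illustration.

```python
import random
from datetime import datetime, timedelta


def generate_customers(n, seed=42):
    """Generate n customer rows (hypothetical schema, for illustration only)."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)  # a Monday
    rows = []
    for i in range(n):
        first = rng.choice(["Ana", "Ben", "Chloe", "Dev"])
        last = rng.choice(["Lopez", "Kim", "Smith", "Okafor"])
        # Row coherence: the email is derived from the same row's name.
        email = f"{first}.{last}{i}@example.com".lower()
        # Non-linear distribution: log-normal spend has a long right tail.
        spend = round(rng.lognormvariate(4.0, 1.0), 2)
        # Temporal pattern: half of weekend signups are shifted to a weekday.
        day = start + timedelta(days=rng.randrange(365))
        if day.weekday() >= 5 and rng.random() < 0.5:
            day -= timedelta(days=2)
        rows.append({
            "customer_id": i,
            "name": f"{first} {last}",
            "email": email,
            "total_spend": spend,
            "signup_date": day.date().isoformat(),
        })
    return rows


def corrupt(rows, rate=0.05, seed=7):
    """Return (copied rows, corrupted indices) with round(len(rows) * rate) bad rows."""
    rng = random.Random(seed)
    n_bad = round(len(rows) * rate)
    bad_idx = set(rng.sample(range(len(rows)), n_bad))
    out = []
    for i, row in enumerate(rows):
        row = dict(row)  # copy so the clean input is left untouched
        if i in bad_idx:
            kind = rng.choice(["null", "bad_format", "out_of_range"])
            if kind == "null":
                row["email"] = None  # missing value
            elif kind == "bad_format":
                row["email"] = row["email"].replace("@", "")  # invalid format
            else:
                row["total_spend"] = -1.0  # out-of-range value
        out.append(row)
    return out, bad_idx


clean = generate_customers(1000)
dirty, bad_idx = corrupt(clean, rate=0.05)
```

Returning the corrupted indices alongside the rows makes it possible to check that a quality rule flags exactly the rows that were damaged, which is the kind of check the use case above describes.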
Quick Start
Generate 1,000 customer records with a 5% corruption rate in the default catalog and schema.
Dependency Matrix
Required Modules
Faker, holidays, numpy, pandas
Components
scripts, references, assets
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: faker-data-generation
Download link: https://github.com/databricks-solutions/vibe-coding-workshop-template/archive/main.zip#faker-data-generation
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.