databricks-synthetic-data-gen

Community

Generate realistic synthetic data at scale.

Author: Aradhya0510
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill automates the creation of realistic, story-driven synthetic data for Databricks, eliminating the need for manual data creation or the use of sensitive production data for testing and development.

Core Features & Use Cases

  • Scalable Generation: Generates data from thousands to millions of rows using Spark + Faker + Pandas UDFs.
  • Realistic Patterns: Supports complex data patterns including referential integrity, non-linear distributions, and time-based trends.
  • Multiple Output Formats: Saves data as Parquet, JSON, CSV, or Delta tables in Unity Catalog Volumes.
  • Use Case: Generate a realistic e-commerce dataset with customers, orders, and products, complete with realistic purchase patterns and customer segmentation, for use in a Databricks analytics demo.
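The "Realistic Patterns" bullet above can be illustrated with a small standard-library sketch. This is not the Skill's own code (the Skill uses Spark, Faker, and Pandas UDFs); the function names, segment mix, median amounts, and growth trend below are all assumptions chosen for illustration. It shows three ideas in miniature: orders only reference customers that exist (referential integrity), order amounts follow a log-normal distribution (non-linear), and order dates are weighted toward later days (time-based trend).

```python
import math
import random
from datetime import date, timedelta
from itertools import accumulate


def generate_customers(n, rng):
    """Create n customers, each assigned a segment with uneven weights."""
    segments = ["budget", "regular", "premium"]
    weights = [0.6, 0.3, 0.1]  # hypothetical segment mix
    return [
        {"customer_id": i, "segment": rng.choices(segments, weights=weights)[0]}
        for i in range(1, n + 1)
    ]


def generate_orders(customers, n, start, days, rng):
    """Create n orders that reference existing customers only."""
    # Hypothetical median order amount per segment.
    median_amount = {"budget": 20.0, "regular": 60.0, "premium": 200.0}
    # Day weights rise linearly, so later days receive more orders.
    cum_weights = list(accumulate(1.0 + d / days for d in range(days)))
    orders = []
    for order_id in range(1, n + 1):
        # The foreign key is drawn from the parent table, never invented.
        customer = rng.choice(customers)
        # Log-normal amounts: many small orders, a long tail of large ones.
        mu = math.log(median_amount[customer["segment"]])
        amount = round(rng.lognormvariate(mu, 0.5), 2)
        offset = rng.choices(range(days), cum_weights=cum_weights)[0]
        orders.append(
            {
                "order_id": order_id,
                "customer_id": customer["customer_id"],
                "amount": amount,
                "order_date": start + timedelta(days=offset),
            }
        )
    return orders


rng = random.Random(42)  # fixed seed for reproducible output
customers = generate_customers(100, rng)
orders = generate_orders(customers, 500, date(2024, 1, 1), 365, rng)
```

At Spark scale the same per-row logic would typically run inside a Pandas UDF so that each partition generates its rows in parallel; that wiring is omitted here.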

Quick Start

Generate 10,000 synthetic customer records and 50,000 orders for the catalog 'my_catalog' and schema 'my_schema'.
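As a rough sketch of what the prompt above asks for, the snippet below turns the row counts, catalog, and schema into a generation plan with one output location per table. Unity Catalog Volume paths follow the form `/Volumes/<catalog>/<schema>/<volume>/<path>`; the volume name `synthetic_data`, the helper `build_plan`, and the one-folder-per-table layout are assumptions, not something the Skill documents.

```python
SUPPORTED_FORMATS = {"parquet", "json", "csv", "delta"}


def build_plan(catalog, schema, volume, row_counts, fmt="parquet"):
    """Map each table name to its row count, format, and Volume path."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {fmt}")
    base = f"/Volumes/{catalog}/{schema}/{volume}"
    return {
        table: {"rows": rows, "format": fmt, "path": f"{base}/{table}"}
        for table, rows in row_counts.items()
    }


# "synthetic_data" is a hypothetical volume name.
plan = build_plan(
    "my_catalog",
    "my_schema",
    "synthetic_data",
    {"customers": 10_000, "orders": 50_000},
)
```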

Dependency Matrix

Required Modules

  • databricks-connect
  • faker
  • numpy
  • pandas
  • holidays

Components

  • scripts
  • references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: databricks-synthetic-data-gen
Download link: https://github.com/Aradhya0510/databricks-cv-accelerator/archive/main.zip#databricks-synthetic-data-gen

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
