data-wrangler

Community

Automate data transformation with DuckDB SQL.

Author: richard-gyiko
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill eliminates the manual effort and complexity of data manipulation by enabling powerful SQL-based transformations across diverse data sources. It's ideal for large datasets or complex operations that are beyond simple in-context reasoning, saving you time and reducing errors.

Core Features & Use Cases

  • Multi-Source Data Integration: Seamlessly read and join data from various formats (CSV, Parquet, JSON, Excel) and databases (Postgres, MySQL, SQLite, S3, GCS, Azure, R2).
  • Advanced SQL Transformations: Apply sophisticated DuckDB SQL operations including joins, aggregations, PIVOT/UNPIVOT, sampling, and window functions.
  • Flexible Data Export: Write transformed results to new files (Parquet, CSV, JSON), with options for compression and Hive-style partitioning.
  • Use Case: Imagine you have sales data in a CSV, product information in Parquet, and customer details in a PostgreSQL database. Use this Skill to join these sources, calculate total revenue per customer, and export the result as a partitioned Parquet dataset for your analytics team.

Quick Start

Join orders.parquet with customers.csv and show total orders per customer.

Dependency Matrix

Required Modules

duckdb, polars, pydantic, pyyaml

Components

scripts

💻 Claude Code Installation

Recommended: let Claude install it automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: data-wrangler
Download link: https://github.com/richard-gyiko/data-wrangler-plugin/archive/main.zip#data-wrangler

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
