sota-data-cleaning-feature-selection-eda

Community

Master SOTA data prep for Kaggle in Colab.

Author: raphaelmansuy
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill automates data preparation for medium-sized tabular datasets (roughly 100MB–5GB). It combines automated EDA, cleaning, and feature selection to shorten the path from raw data to model training.

Core Features & Use Cases

  • Automated EDA with Sweetviz to surface distributions, correlations, and leakage indicators.
  • Automated cleaning with pyjanitor to standardize column names, remove duplicate rows, normalize values, and impute missing values.
  • Feature selection using a hybrid approach (filters + embedded methods like XGBoost or Lasso) to reduce dimensionality before training.
  • Colab-friendly execution on datasets in the 100MB–5GB range with Polars for fast I/O.
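The hybrid feature selection idea above can be sketched as follows. This is a minimal illustration, not the Skill's actual implementation: it assumes a numeric feature table, uses a variance filter and a pairwise-correlation filter as the "filter" stage, and uses Lasso (rather than XGBoost) as the embedded stage. The function name `hybrid_select`, the thresholds, and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectFromModel, VarianceThreshold
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler


def hybrid_select(X, y, corr_limit=0.95, alpha=0.05):
    """Filter stage (variance, correlation) followed by a Lasso embedded stage.

    Hypothetical helper; thresholds are illustrative defaults, not tuned values.
    """
    # Filter 1: drop zero-variance (constant) columns.
    variance_filter = VarianceThreshold(threshold=0.0).fit(X)
    kept = X.loc[:, variance_filter.get_support()]

    # Filter 2: for each highly correlated pair, drop the later column.
    corr = kept.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [col for col in upper.columns if (upper[col] > corr_limit).any()]
    kept = kept.drop(columns=redundant)

    # Embedded stage: keep features whose Lasso coefficient is not shrunk to ~0.
    scaled = StandardScaler().fit_transform(kept)
    selector = SelectFromModel(Lasso(alpha=alpha), threshold=1e-5).fit(scaled, y)
    return list(kept.columns[selector.get_support()])


# Synthetic demo data (assumption): two informative columns, one noise column,
# one constant column, and one exact linear duplicate.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "signal_a": rng.normal(size=n),
    "signal_b": rng.normal(size=n),
    "noise": rng.normal(size=n),
    "constant": np.ones(n),
})
demo["signal_a_copy"] = demo["signal_a"] * 2.0
target = 3.0 * demo["signal_a"] - 2.0 * demo["signal_b"] + rng.normal(scale=0.1, size=n)

selected = hybrid_select(demo, target)
print(selected)
```

Running the cheap filters first keeps the embedded model from splitting weight between near-duplicate columns; swapping Lasso for a tree-based importance ranking would follow the same two-stage shape.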

Quick Start

  1. Install required packages in Colab: polars, sweetviz, pyjanitor, xgboost, scikit-learn.
  2. Load a medium dataset (approx 100MB–5GB) and run an automated EDA pass.
  3. Apply automated cleaning transforms and run a hybrid feature selection to prepare data for model training.
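As a rough picture of the cleaning transforms in step 3, here is a pandas-only stand-in. The Skill itself names pyjanitor for this stage (which provides, for example, a `clean_names` method); the helper `clean_frame`, the median/mode imputation rule, and the sample table below are assumptions made for illustration only.

```python
import re

import numpy as np
import pandas as pd


def clean_frame(df):
    """Snake-case column names, drop duplicate rows, and impute missing values.

    Hypothetical helper: numeric gaps get the column median, other gaps the mode.
    """
    out = df.copy()
    # Normalize names: trim, lowercase, collapse non-alphanumeric runs to "_".
    out.columns = [
        re.sub(r"[^0-9a-z]+", "_", str(col).strip().lower()).strip("_")
        for col in out.columns
    ]
    # Remove exact duplicate rows (missing values compare as equal here).
    out = out.drop_duplicates().reset_index(drop=True)
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out


# Small made-up table with messy names, one duplicate row, and gaps.
raw = pd.DataFrame({
    "Customer ID": [1, 2, 2, 3, 4],
    " Total Spend ($)": [10.0, np.nan, np.nan, 30.0, 50.0],
    "Region": ["north", "south", "south", None, "north"],
})
cleaned = clean_frame(raw)
print(cleaned)
```

Median and mode imputation are simple defaults; for a competition dataset it is usually worth checking whether missingness itself carries signal before filling it in.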

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: sota-data-cleaning-feature-selection-eda
Download link: https://github.com/raphaelmansuy/machine-learning-feature-selection/archive/main.zip#sota-data-cleaning-feature-selection-eda

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
