sota-data-cleaning-feature-selection-eda
Community
Master SOTA data prep for Kaggle in Colab.
Author: raphaelmansuy
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill automates data preparation for medium-sized tabular datasets (roughly 100MB–5GB). It combines automated EDA, cleaning, and feature selection so that a dataset reaches the modeling stage faster.
Core Features & Use Cases
- Automated EDA with Sweetviz to surface distributions, correlations, and leakage indicators.
- Automated cleaning with Pyjanitor to fix naming, deduplicate, normalize, and impute missing values.
- Feature selection using a hybrid approach (filters + embedded methods like XGBoost or Lasso) to reduce dimensionality before training.
- Colab-friendly execution on datasets in the 100MB–5GB range with Polars for fast I/O.
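The hybrid feature selection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Skill's actual implementation: it assumes scikit-learn is available, uses a zero-variance filter as the filter step, and uses Lasso (one of the embedded methods named above) as the embedded step. The function name `hybrid_select`, the `alpha` value, and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectFromModel, VarianceThreshold
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler


def hybrid_select(X: pd.DataFrame, y: pd.Series, alpha: float = 0.1) -> list:
    """Return the column names that survive a variance filter and a Lasso step.

    Hypothetical helper for illustration only.
    """
    # Filter step: drop columns with zero variance (constants carry no signal).
    variance_filter = VarianceThreshold(threshold=0.0)
    variance_filter.fit(X)
    kept = X.columns[variance_filter.get_support()]

    # Embedded step: fit a Lasso on standardized features and keep the
    # columns whose coefficients were not shrunk to zero.
    X_scaled = StandardScaler().fit_transform(X[kept])
    selector = SelectFromModel(Lasso(alpha=alpha, random_state=0))
    selector.fit(X_scaled, y)
    return list(kept[selector.get_support()])


# Synthetic data (an assumption): two informative columns, one pure-noise
# column, and one constant column.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "signal_a": rng.normal(size=n),
    "signal_b": rng.normal(size=n),
    "noise": rng.normal(size=n),
    "constant": np.ones(n),
})
y = 3.0 * X["signal_a"] - 2.0 * X["signal_b"] + rng.normal(scale=0.1, size=n)

selected = hybrid_select(X, y)
print(selected)
```

Swapping the Lasso for a tree-based model such as XGBoost would follow the same `SelectFromModel` pattern, although the selection threshold would then need to be chosen explicitly rather than relying on coefficients being exactly zero.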
Quick Start
- Install required packages in Colab: polars, sweetviz, pyjanitor, xgboost, scikit-learn.
- Load a medium dataset (approx 100MB–5GB) and run an automated EDA pass.
- Apply automated cleaning transforms and run a hybrid feature selection to prepare data for model training.
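The cleaning step in the list above might look something like this sketch. It is written in plain pandas as a stand-in so that it runs without extra installs; it is not the Skill's own code. pyjanitor's `clean_names()` could replace the column-renaming step. The helper names `snake_case` and `basic_clean`, the median/mode imputation policy, and the sample data are assumptions made for illustration.

```python
import re

import numpy as np
import pandas as pd


def snake_case(name: str) -> str:
    """Lower-case a column name and collapse non-alphanumeric runs to '_'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: normalize names, deduplicate, impute missing values."""
    out = df.rename(columns=snake_case)
    out = out.drop_duplicates().reset_index(drop=True)
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric columns: fill gaps with the column median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Other columns: fill gaps with the most frequent value, if any.
            mode = out[col].mode()
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out


# Small made-up frame with messy names, a duplicate row, and missing values.
raw = pd.DataFrame({
    "Customer ID": [1, 2, 2, 3, 4],
    "Total Spend ($)": [10.0, np.nan, np.nan, 30.0, 20.0],
    "Region": ["north", "south", "south", "south", None],
})

clean = basic_clean(raw)
print(clean)
```

Deduplicating before imputing is a deliberate choice here: repeated rows would otherwise skew the median and mode used to fill the gaps.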
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: sota-data-cleaning-feature-selection-eda
Download link: https://github.com/raphaelmansuy/machine-learning-feature-selection/archive/main.zip#sota-data-cleaning-feature-selection-eda
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.