training-data

Community

Master your ML data: label, augment, split.

Author: doanchienthangdev
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps you prepare high-quality training data for machine learning models by covering labeling, augmentation, class imbalance, and dataset splitting, the data-preparation steps that most directly affect model performance and reliability.

Core Features & Use Cases

  • Data Labeling: Supports manual labeling (e.g., with Label Studio) and weak supervision using Snorkel.
  • Data Augmentation: Implements augmentation strategies for text, images, and tabular data (e.g., SMOTE for tabular data).
  • Imbalanced Data Handling: Provides methods like class weighting and focal loss to address skewed datasets.
  • Data Splitting: Offers robust splitting strategies including random, temporal, and group splits to prevent data leakage.
  • Use Case: You have an imbalanced dataset for a fraud detection model. Use this Skill to perform a stratified split so that the training and test sets both reflect the original class distribution, then apply SMOTE to the training set only to oversample the minority class. Oversampling before the split would let synthetic points derived from test examples leak into training.
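
The class-weighting and focal-loss ideas listed above can be sketched without any dependencies. This is a minimal illustration, not the Skill's own code: the function names balanced_class_weights and binary_focal_loss are hypothetical, the weighting uses the common n_samples / (n_classes * class_count) heuristic, and the gamma and alpha defaults are assumptions chosen for the example.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get larger weights, so their errors count more in the loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for a single example.

    p is the predicted probability of class 1 and y is the true label (0 or 1).
    The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples.
    """
    p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p         # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical fraud-style labels: 95 legitimate (0), 5 fraudulent (1).
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)

# A confidently correct prediction contributes far less than a confident miss.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.05, 1)
```

With gamma set to 0 the focal term disappears and the function reduces to alpha-weighted cross-entropy, which is a quick way to sanity-check an implementation.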

Quick Start

Use the training-data skill to perform a stratified split on your dataset with an 80/20 train/test ratio.
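
For reference, here is a minimal, dependency-free sketch of what a stratified 80/20 split does. It is not the Skill's implementation: the stratified_split helper, its seed default, and the per-class rounding are assumptions made for illustration. In practice, scikit-learn's train_test_split with its stratify argument is the usual tool for this.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class at roughly test_size."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)  # per-class test count
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical imbalanced labels: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_size=0.2)

# Both splits keep the original 9:1 class ratio.
test_positives = sum(labels[i] for i in test_idx)
train_positives = sum(labels[i] for i in train_idx)
```

Because the split is done class by class, a rare class cannot end up entirely in one partition, which a plain random split does not guarantee.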

Dependency Matrix

Required Modules

pandas, scikit-learn, nlpaug, albumentations, imblearn, torch

Components

scripts

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: training-data
Download link: https://github.com/doanchienthangdev/omgkit/archive/main.zip#training-data

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.