training-data
Community
Master your ML data: label, augment, split.
Data & Analytics
Tags: #mlops, #machine learning, #data labeling, #data augmentation, #data splitting, #imbalanced data
Author: doanchienthangdev
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill addresses a critical step in machine learning: preparing high-quality training data (labeling, augmenting, rebalancing, and splitting it). Data quality at this stage directly affects model performance and reliability.
Core Features & Use Cases
- Data Labeling: Supports manual labeling workflows (e.g., with Label Studio) and weak supervision using Snorkel.
- Data Augmentation: Implements augmentation strategies for text, images, and tabular data (e.g., SMOTE oversampling for tabular data).
- Imbalanced Data Handling: Provides methods like class weighting and focal loss to address skewed datasets.
- Data Splitting: Offers random, stratified, temporal, and group splits. Temporal and group splits prevent data leakage across time and across related records (e.g., rows from the same user).
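- Use Case: You have an imbalanced dataset for a fraud detection model. Use this Skill to perform a stratified split first, so the training and test sets both preserve the original class distribution, and then apply SMOTE to oversample the minority class in the training set only. Applying SMOTE before the split would let synthetic samples derived from test records leak into training and would leave the test set artificially balanced instead of reflecting real-world fraud rates.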
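The weak-supervision idea behind the Data Labeling bullet can be sketched without Snorkel itself. Below, a few hypothetical labeling functions for a made-up spam task each vote on an example or abstain when their heuristic does not apply, and a plain majority vote resolves the votes. The `ABSTAIN = -1` value follows Snorkel's convention, but the function names, heuristics, and tie-breaking rule are illustrative assumptions; Snorkel's `LabelModel` goes further by estimating each function's accuracy instead of counting votes equally.

```python
from collections import Counter

ABSTAIN = -1  # Snorkel's convention for "this function has no opinion"
HAM, SPAM = 0, 1

# Hypothetical labeling functions: cheap heuristics, each allowed to abstain.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_short_message(text):
    return HAM if len(text.split()) < 5 else ABSTAIN

def lf_money_words(text):
    lowered = text.lower()
    return SPAM if any(w in lowered for w in ("free", "winner", "$$$")) else ABSTAIN

LFS = [lf_contains_link, lf_short_message, lf_money_words]

def majority_vote(text, lfs=LFS):
    """Combine non-abstaining votes; ties and all-abstain cases stay unlabeled."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie: leave for manual labeling
    return ranked[0][0]

print(majority_vote("Click http://x.co to claim your free prize now"))
print(majority_vote("see you soon"))
```

Examples that end up as `ABSTAIN` are natural candidates for manual review in a tool such as Label Studio.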
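For the text side of the Data Augmentation bullet, two classic word-level operations (random swap and random deletion) can be sketched in plain Python. This is an illustrative stand-in, not the skill's own code or nlpaug's API; the function names and parameters are hypothetical. Production pipelines would typically use a library such as nlpaug for text or albumentations for images.

```python
import random

def random_swap(text, n_swaps=1, seed=None):
    """Swap n_swaps random pairs of word positions; the set of words is unchanged."""
    rng = random.Random(seed)
    words = text.split()
    if len(words) < 2:
        return text
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

def random_deletion(text, p=0.1, seed=None):
    """Keep each word with probability 1 - p, but never return an empty string."""
    rng = random.Random(seed)
    words = text.split()
    kept = [w for w in words if rng.random() >= p]
    if not kept and words:
        kept = [rng.choice(words)]
    return " ".join(kept)

sentence = "the quick brown fox jumps over the lazy dog"
print(random_swap(sentence, n_swaps=2, seed=1))
print(random_deletion(sentence, p=0.2, seed=1))
```

Both operations preserve the label only when the label does not depend on word order or on every word being present, so they suit tasks like topic classification better than, say, negation-sensitive sentiment.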
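The two imbalance techniques named above can be sketched in NumPy. `balanced_class_weights` uses inverse-frequency weights, `n_samples / (n_classes * count_c)`, which is the same formula scikit-learn uses for `class_weight="balanced"`. `focal_loss` implements the standard multi-class form `-alpha_t * (1 - p_t)**gamma * log(p_t)`, which down-weights examples the model already classifies confidently. Both function names are hypothetical, and NumPy stands in for the PyTorch loss the skill would likely use during training.

```python
import numpy as np

def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def focal_loss(probs, y, gamma=2.0, alpha=None):
    """Mean focal loss.

    probs: (n, k) predicted class probabilities; y: (n,) integer labels.
    alpha: optional per-class weights (e.g. from balanced_class_weights).
    With gamma=0 and alpha=None this reduces to ordinary cross-entropy.
    """
    eps = 1e-12
    p_t = np.clip(probs[np.arange(len(y)), y], eps, 1.0)  # prob of the true class
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)
    if alpha is not None:
        loss = np.asarray(alpha)[y] * loss
    return float(loss.mean())

y = np.array([0] * 90 + [1] * 10)
print(balanced_class_weights(y))  # minority class gets 5.0, majority ~0.556

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 1, 1])
print(focal_loss(probs, labels, gamma=0.0), focal_loss(probs, labels, gamma=2.0))
```

Class weighting changes how much each class contributes to the loss; focal loss additionally shifts attention toward hard examples within every class, so the two can be combined through `alpha`.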
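The leakage-preventing splits from the Data Splitting bullet come down to two rules: a group split assigns whole groups (e.g. all rows of one user) to either train or test, and a temporal split trains only on data from before a cutoff. A minimal NumPy sketch follows; the helper names are hypothetical, and scikit-learn's `GroupShuffleSplit` and `TimeSeriesSplit` are library equivalents.

```python
import numpy as np

def group_split(groups, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) so that no group appears in both.

    test_size is the fraction of *groups*, not rows, sent to the test set.
    """
    groups = np.asarray(groups)
    rng = np.random.default_rng(seed)
    unique = np.unique(groups)
    rng.shuffle(unique)
    n_test = max(1, int(round(test_size * len(unique))))
    test_mask = np.isin(groups, unique[:n_test])
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

def temporal_split(timestamps, cutoff):
    """Train on rows strictly before cutoff, test on rows at or after it."""
    timestamps = np.asarray(timestamps)
    return np.flatnonzero(timestamps < cutoff), np.flatnonzero(timestamps >= cutoff)

user_ids = np.repeat(np.arange(10), 5)  # 10 users, 5 rows each
train_idx, test_idx = group_split(user_ids, test_size=0.2)
print(sorted(set(user_ids[test_idx])))  # the 2 held-out users

days = np.arange(100)
train_idx, test_idx = temporal_split(days, cutoff=80)
print(len(train_idx), len(test_idx))  # 80 20
```

A purely random split would scatter the same user's rows across both sets, letting the model memorize per-user quirks and inflating test scores.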
Quick Start
Use the training-data skill to perform a stratified split on your dataset with an 80/20 train/test ratio.
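For reference, the stratified 80/20 split from the Quick Start corresponds to a single scikit-learn call. The feature matrix and the 90/10 label distribution below are made-up placeholders for your own data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))        # placeholder features
y = np.array([0] * 900 + [1] * 100)   # 10% positive class

# stratify=y keeps the 90/10 class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(len(y_train), len(y_test))      # 800 200
print(y_train.sum(), y_test.sum())    # 80 20
```

Without `stratify=y`, a rare class can end up noticeably over- or under-represented in a small test set, which makes evaluation metrics unstable.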
Dependency Matrix
Required Modules
pandas, scikit-learn, nlpaug, albumentations, imblearn, torch
Components
scripts
💻 Claude Code Installation
Recommended: let Claude install the Skill automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: training-data Download link: https://github.com/doanchienthangdev/omgkit/archive/main.zip#training-data Please download this .zip file, extract it, and install it in the .claude/skills/ directory.