ray-data
Community
Scale ML data processing effortlessly.
Author: DoanNgocCuong
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill addresses the challenge of processing large machine learning datasets efficiently. It supports distributed data preprocessing and pipelines that scale across a cluster.
Core Features & Use Cases
- Distributed Data Processing: Handles datasets larger than memory across multiple nodes.
- Multi-modal Support: Works with various data formats including Parquet, CSV, JSON, and images.
- Framework Integration: Seamlessly integrates with Ray Train, PyTorch, and TensorFlow.
- Use Case: Use this Skill to preprocess terabytes of image and text data for a deep learning model training job, distributing the workload across a cluster of machines.
Quick Start
Use the ray-data skill to read all parquet files from 's3://my-bucket/data/' and then apply a lowercasing transformation to the 'text' column.
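The request above could translate into something like the following Ray Data sketch. It is a minimal illustration, not the Skill's actual implementation: the function names `lowercase_text` and `build_pipeline` are hypothetical, and the bucket path is the placeholder from the prompt. It assumes the Parquet files contain a string column named 'text'.

```python
import pandas as pd


def lowercase_text(batch: pd.DataFrame) -> pd.DataFrame:
    """Lowercase the 'text' column of a single batch (hypothetical helper)."""
    batch = batch.copy()  # avoid mutating the input batch in place
    batch["text"] = batch["text"].str.lower()
    return batch


def build_pipeline(path: str = "s3://my-bucket/data/"):
    """Read Parquet files under `path` and lowercase the 'text' column."""
    # Imported lazily so the transform above can be tried without Ray installed.
    import ray

    ds = ray.data.read_parquet(path)
    # batch_format="pandas" hands each batch to the function as a DataFrame.
    return ds.map_batches(lowercase_text, batch_format="pandas")
```

Calling `build_pipeline()` only describes the work; consuming the result, for example with `build_pipeline().take(5)`, is what triggers reading and transformation. Processing in batches rather than row by row keeps the string operation vectorized inside pandas.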
Dependency Matrix
Required Modules
ray[data], pyarrow, pandas
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: ray-data
Download link: https://github.com/DoanNgocCuong/continuous-training-pipeline_T3_2026/archive/main.zip#ray-data
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.