ray-data

Community

Scale ML data processing effortlessly.

Author: DoanNgocCuong
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill addresses the challenge of processing large datasets efficiently for machine learning, enabling distributed data preprocessing and scalable pipelines.

Core Features & Use Cases

  • Distributed Data Processing: Handles datasets larger than memory across multiple nodes.
  • Multi-modal Support: Works with various data formats including Parquet, CSV, JSON, and images.
  • Framework Integration: Seamlessly integrates with Ray Train, PyTorch, and TensorFlow.
  • Use Case: Use this Skill to preprocess terabytes of image and text data for a deep learning model training job, distributing the workload across a cluster of machines.
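As a rough illustration of the multi-format support listed above, the sketch below maps each format to a Ray Data reader function. The `READERS` table and the `reader_name` and `load` helpers are hypothetical names introduced for this example; they are not part of the Skill. Ray is imported lazily inside `load`, so the lookup logic can be tried without a Ray installation.

```python
# Hypothetical lookup table: format label -> name of a reader in ray.data.
READERS = {
    "parquet": "read_parquet",
    "csv": "read_csv",
    "json": "read_json",
    "images": "read_images",
}


def reader_name(fmt: str) -> str:
    """Return the ray.data reader name for a format label (case-insensitive)."""
    try:
        return READERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None


def load(path: str, fmt: str):
    """Open `path` as a Ray Dataset using the reader that matches `fmt`."""
    import ray.data  # lazy import: only needed when a dataset is actually opened

    return getattr(ray.data, reader_name(fmt))(path)
```

For instance, `load("s3://my-bucket/data/", "parquet")` would call `ray.data.read_parquet` on that prefix, assuming the bucket exists and credentials are configured.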

Quick Start

Use the ray-data skill to read all parquet files from 's3://my-bucket/data/' and then apply a lowercasing transformation to the 'text' column.
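The request above might translate into Ray Data code along the following lines. This is a minimal sketch, not the Skill's actual output: `lowercase_text` and `build_pipeline` are hypothetical names, the bucket path is the placeholder from the prompt, and every row is assumed to carry a non-null string in its 'text' column.

```python
def lowercase_text(row: dict) -> dict:
    """Return a copy of the row with its 'text' field lowercased."""
    # Assumes row["text"] is a str; a null value would raise AttributeError.
    return {**row, "text": row["text"].lower()}


def build_pipeline(path: str = "s3://my-bucket/data/"):
    """Read every Parquet file under `path` and lowercase the 'text' column."""
    import ray.data  # lazy import so lowercase_text is usable without Ray

    ds = ray.data.read_parquet(path)  # reads all Parquet files under the prefix
    return ds.map(lowercase_text)     # row-wise transform applied by Ray workers
```

A row-wise `map` keeps the example short. For large datasets, a vectorized function passed to `map_batches` is usually the faster choice.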

Dependency Matrix

Required Modules

ray[data], pyarrow, pandas
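One way to confirm these modules are present before running the Skill is a small import check. The sketch below uses only the standard library; `REQUIRED_MODULES` and `missing_modules` are hypothetical names for this example, and the tuple lists the import names (`ray`, `pyarrow`, `pandas`) rather than the pip requirement strings.

```python
from importlib.util import find_spec

# Import names corresponding to the required modules listed above.
REQUIRED_MODULES = ("ray", "pyarrow", "pandas")


def missing_modules(names=REQUIRED_MODULES):
    """Return the import names from `names` that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```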

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: ray-data
Download link: https://github.com/DoanNgocCuong/continuous-training-pipeline_T3_2026/archive/main.zip#ray-data

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
