ray-train
Community
Scale training across clusters with Ray
Category: Software Engineering
Tags: telemetry, hyperparameter tuning, distributed training, multi-node, ray tune, ray-train, torch
Author: ovachiever
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill guides the orchestration of distributed training across clusters, enabling scalable experiments with Ray Train, Ray Tune, and elastic resource management.
Core Features & Use Cases
- Distributed Training: Runs PyTorch/TensorFlow/HuggingFace across multiple nodes with minimal code changes.
- Hyperparameter Tuning: Integrated Ray Tune for scalable hyperparameter optimization (see the sketch after this list).
- Fault Tolerance & Elasticity: Automatic handling of worker failures and dynamic resource scaling.
- Multi-Framework Support: Works across PyTorch, TensorFlow, and HuggingFace ecosystems.
- Multi-Node Scaling: Easy configuration for cross-node deployments.
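As a minimal sketch of the Ray Tune integration (assuming Ray 2.x with `ray[train]` installed; the `train_func` body and its reported metric are placeholders for illustration), a Tuner can wrap a TorchTrainer so that each trial launches its own distributed run:

```python
# Minimal Ray Tune + Ray Train sketch (assumes Ray 2.x with ray[train] installed).
from ray import tune
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer
import ray.train


def train_func(config):
    # Placeholder training loop; reports a dummy metric derived from the lr.
    for epoch in range(3):
        ray.train.report({"loss": config["lr"] / (epoch + 1)})


# Each Tune trial launches its own distributed TorchTrainer run.
trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
)

tuner = tune.Tuner(
    trainer,
    # train_loop_config is forwarded to train_func as `config`.
    param_space={"train_loop_config": {"lr": tune.loguniform(1e-4, 1e-1)}},
    tune_config=tune.TuneConfig(num_samples=4, metric="loss", mode="min"),
)
results = tuner.fit()
print(results.get_best_result().config)
```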
Quick Start
Basic single-node PyTorch training can be extended to multi-node clusters with Ray Train by wiring the training loop into a TorchTrainer with a ScalingConfig, as sketched below.
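A minimal sketch, assuming Ray 2.x with `ray[train]` and `torch` installed; the toy model, random data, and hyperparameter values are illustrative placeholders, not part of the Skill itself:

```python
# Minimal Ray Train quick start (assumes Ray 2.x, ray[train], and torch).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

import ray.train.torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_func(config):
    # Toy model and data; replace with your own training loop.
    model = nn.Linear(10, 1)
    model = ray.train.torch.prepare_model(model)  # wraps in DDP, moves to device

    dataset = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
    loader = DataLoader(dataset, batch_size=32)
    loader = ray.train.torch.prepare_data_loader(loader)  # adds DistributedSampler

    optimizer = torch.optim.SGD(model.parameters(), lr=config.get("lr", 1e-2))
    loss_fn = nn.MSELoss()

    for epoch in range(config.get("epochs", 2)):
        for x, y in loader:
            loss = loss_fn(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Report metrics back to Ray Train at the end of each epoch.
        ray.train.report({"loss": loss.item(), "epoch": epoch})


# Scale out by changing num_workers; the training code stays the same.
trainer = TorchTrainer(
    train_func,
    train_loop_config={"lr": 1e-2, "epochs": 2},
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
)
result = trainer.fit()
print(result.metrics)
```

On a multi-node cluster the same script scales by raising `num_workers` (and setting `use_gpu=True` where GPUs are available); Ray schedules the workers across nodes.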
Dependency Matrix
Required Modules
ray[train], torch, transformers
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill: Name: ray-train Download link: https://github.com/ovachiever/droid-tings/archive/main.zip#ray-train Please download this .zip file, extract it, and install it in the .claude/skills/ directory.