domino-distributed-computing
Official · Scale compute with Spark, Ray, and Dask.
Author: dominodatalab
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill simplifies managing and using distributed computing frameworks such as Apache Spark, Ray, and Dask within the Domino Data Lab environment, so users can process large datasets and scale complex computations efficiently.
Core Features & Use Cases
- Framework Selection: Guidance on choosing between Spark, Ray, and Dask based on workload requirements (data processing, ML training, parallel Python).
- Cluster Management: Instructions for launching on-demand clusters via the Domino UI and Python SDK.
- Code Examples: Practical Python snippets for connecting to, processing data with, and training models using Spark, Ray, and Dask.
- GPU Acceleration: How to leverage GPUs with Spark (RAPIDS) and Ray.
- Autoscaling: Configuration and monitoring of dynamic cluster scaling.
- Use Case: You have a multi-terabyte dataset and need to perform complex ETL operations. This Skill will guide you to launch a Spark cluster, write PySpark code to process the data efficiently, and save the results.
Quick Start
Use the domino-distributed-computing skill to launch a Spark cluster with 4 workers and process data from '/mnt/data/large_dataset/'.
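Once a cluster is attached to the workspace, connecting from Python typically relies on service addresses that Domino injects into the environment. The sketch below builds those addresses; the environment-variable names follow the pattern Domino documents for Ray and Dask clusters, but treat them as assumptions and verify them in your deployment.

```python
import os

def ray_address() -> str:
    # Ray head service injected by Domino (assumed variable names);
    # falls back to a local default for illustration.
    host = os.environ.get("RAY_HEAD_SERVICE_HOST", "127.0.0.1")
    port = os.environ.get("RAY_HEAD_SERVICE_PORT", "10001")
    return f"ray://{host}:{port}"

def dask_address() -> str:
    # Dask scheduler service injected by Domino (assumed variable names).
    host = os.environ.get("DASK_SCHEDULER_SERVICE_HOST", "127.0.0.1")
    port = os.environ.get("DASK_SCHEDULER_SERVICE_PORT", "8786")
    return f"tcp://{host}:{port}"

# With a cluster attached, the actual connection is then:
#   import ray; ray.init(ray_address())
#   from dask.distributed import Client; client = Client(dask_address())
print(ray_address(), dask_address())
```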
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: let Claude install it automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: domino-distributed-computing
Download link: https://github.com/dominodatalab/domino-claude-plugin/archive/main.zip#domino-distributed-computing
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.