cluster-optimise

Community

Optimize cluster job throughput and efficiency.

Author: olliecrow
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill addresses inefficiently configured cluster jobs, which waste resources, lengthen wait times, and can fail outright, for example with Out-Of-Memory (OOM) errors.

Core Features & Use Cases

  • Iterative Optimization: Runs staged experiments to fine-tune job configurations for maximum throughput and resource efficiency.
  • Failure Prevention: Actively works to avoid crashes and Out-Of-Memory (OOM) failures by analyzing job statuses and logs.
  • Resource Tuning: Adjusts job shapes and resources across pipeline stages (data fetching, preprocessing, training, eval) to minimize total wall-clock completion time.
  • Use Case: For large-scale machine learning training jobs on a cluster, this Skill can automatically adjust the CPU, memory, and GPU allocation of each pipeline stage, reducing overall execution time and preventing jobs from being killed for resource exhaustion.
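
The iterative loop described in the features above can be pictured with a minimal Python sketch. This is not the Skill's actual implementation, which this page does not show. The names JobShape, RunResult, tune_stage and fake_runner are all hypothetical, and the policy (double memory after a failure, double CPUs while wall-clock time keeps improving) is only one simple strategy chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class JobShape:
    """Hypothetical resource request for one pipeline stage."""
    cpus: int
    memory_gb: int
    gpus: int


@dataclass(frozen=True)
class RunResult:
    """Hypothetical outcome of one experiment."""
    ok: bool             # False if the job crashed or was OOM-killed
    wall_seconds: float  # wall-clock time of the run


def tune_stage(
    start: JobShape,
    run_job: Callable[[JobShape], RunResult],
    max_rounds: int = 6,
) -> Optional[Tuple[JobShape, float]]:
    """Run staged experiments and return the fastest successful shape.

    Assumed policy: a failed run is treated as memory exhaustion and retried
    with twice the memory; a successful run is followed by an attempt with
    twice the CPUs, stopping as soon as wall-clock time no longer improves.
    """
    best: Optional[Tuple[JobShape, float]] = None
    shape = start
    for _ in range(max_rounds):
        result = run_job(shape)
        if not result.ok:
            shape = replace(shape, memory_gb=shape.memory_gb * 2)
            continue
        if best is not None and result.wall_seconds >= best[1]:
            break  # the larger shape did not help; keep the previous best
        best = (shape, result.wall_seconds)
        shape = replace(shape, cpus=shape.cpus * 2)
    return best


def fake_runner(shape: JobShape) -> RunResult:
    """Stand-in for a real cluster submission, used only for demonstration."""
    if shape.memory_gb < 16:
        return RunResult(ok=False, wall_seconds=0.0)
    # More CPUs speed up the work but add scheduling overhead.
    return RunResult(ok=True, wall_seconds=120 / shape.cpus + 5 * shape.cpus)


if __name__ == "__main__":
    print(tune_stage(JobShape(cpus=2, memory_gb=8, gpus=1), fake_runner))
```

In a real setting, run_job would submit the job to the cluster scheduler and read back its status and logs. The fake runner here only makes the sketch runnable offline.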

Quick Start

Use the cluster-optimise skill to iteratively tune the resource allocation for the training stage of the current project to minimize wall-clock time.

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: cluster-optimise
Download link: https://github.com/olliecrow/codex/archive/main.zip#cluster-optimise

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
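
For a manual install, the extraction step in the prompt above might look like the following Python sketch. It assumes the archive has already been downloaded and that it contains a folder named exactly after the Skill at some depth. The archive's real layout is not described on this page, and install_skill is a hypothetical helper, not part of Claude Code.

```python
import zipfile
from pathlib import Path, PurePosixPath
from typing import List


def install_skill(archive_path, skill_name: str, dest_root) -> List[Path]:
    """Copy the `skill_name` folder from a zip archive into dest_root/skill_name.

    Assumption: the archive contains a directory named exactly `skill_name`.
    Returns the list of files written; an empty list means no such folder.
    """
    target = Path(dest_root) / skill_name
    written: List[Path] = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path below the skill folder.
            rel = parts[parts.index(skill_name) + 1:]
            if not rel or ".." in rel:
                continue  # skip odd entries and path-traversal attempts
            out = target.joinpath(*rel)
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_bytes(zf.read(info))
            written.append(out)
    return written


if __name__ == "__main__":
    # Assumes main.zip was downloaded from the link above into the
    # current directory.
    files = install_skill("main.zip", "cluster-optimise", ".claude/skills")
    print(f"Installed {len(files)} file(s)")
```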