cluster-optimise
Community
Optimize cluster job throughput and efficiency.
Author: olliecrow
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill addresses inefficiently configured cluster jobs, which waste resources, lengthen wait times, and can fail outright, for example with Out-Of-Memory (OOM) errors.
Core Features & Use Cases
- Iterative Optimization: Runs staged experiments to fine-tune job configurations for maximum throughput and resource efficiency.
- Failure Prevention: Actively works to avoid crashes and Out-Of-Memory (OOM) failures by analyzing job statuses and logs.
- Resource Tuning: Adjusts job shapes and resources across pipeline stages (data fetching, preprocessing, training, eval) to minimize total wall-clock completion time.
- Use Case: When running large-scale machine learning training jobs on a cluster, this Skill can be used to automatically adjust the CPU, memory, and GPU allocations for each stage of the pipeline to reduce overall execution time and prevent jobs from being killed due to resource exhaustion.
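The staged tuning described above can be pictured with a minimal Python sketch. It is not the Skill's actual implementation: `JobShape`, `TrialResult`, `tune_stage`, and `fake_run_trial` are hypothetical names introduced only for illustration, and the fake runner stands in for a real cluster submission with an invented cost model. The sketch tries each candidate resource shape for one pipeline stage, discards any that hit OOM, and keeps the one with the lowest wall-clock time.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class JobShape:
    """Hypothetical resource request for one pipeline stage."""
    cpus: int
    memory_gb: int
    gpus: int


@dataclass(frozen=True)
class TrialResult:
    """Hypothetical outcome of one trial run."""
    wall_clock_s: float
    oom: bool


def tune_stage(
    candidates: Iterable[JobShape],
    run_trial: Callable[[JobShape], TrialResult],
) -> Optional[JobShape]:
    """Return the fastest candidate that finished without OOM, or None."""
    best_shape: Optional[JobShape] = None
    best_time = float("inf")
    for shape in candidates:
        result = run_trial(shape)
        if result.oom:
            # Skip shapes that were killed for exhausting memory.
            continue
        if result.wall_clock_s < best_time:
            best_shape, best_time = shape, result.wall_clock_s
    return best_shape


def fake_run_trial(shape: JobShape) -> TrialResult:
    """Stand-in for a real cluster run (assumed toy model, not real data):
    under 32 GB the job is treated as OOM; runtime scales with CPUs x GPUs."""
    oom = shape.memory_gb < 32
    wall_clock_s = 3600.0 / (shape.cpus * max(shape.gpus, 1))
    return TrialResult(wall_clock_s=wall_clock_s, oom=oom)


candidates = [
    JobShape(cpus=8, memory_gb=16, gpus=1),
    JobShape(cpus=8, memory_gb=32, gpus=1),
    JobShape(cpus=16, memory_gb=64, gpus=2),
]
best = tune_stage(candidates, fake_run_trial)
print(best)
```

In practice `run_trial` would submit a job to the cluster scheduler and read its status and logs; the same loop could then be repeated per stage (data fetching, preprocessing, training, eval).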
Quick Start
Use the cluster-optimise skill to iteratively tune the resource allocation for the training stage of the current project to minimize wall-clock time.
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: cluster-optimise
Download link: https://github.com/olliecrow/codex/archive/main.zip#cluster-optimise
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.