cluster-monitor

Community

Monitor and manage Slurm cluster jobs.

Author: olliecrow
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill automates the monitoring of Slurm cluster jobs, providing deep diagnostics and intelligent intervention to ensure efficient job completion and prevent costly reruns.

Core Features & Use Cases

  • Proactive Monitoring: Continuously tracks job status, logs, and outputs for long-running tasks.
  • Intelligent Intervention: Automatically intervenes when jobs are likely to produce invalid results or waste resources, including canceling, cleaning up, fixing, and resubmitting.
  • Use Case: Monitor a large-scale simulation job running for days, automatically detect and fix a common error in the output logs, and ensure the job completes successfully without manual oversight.
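The monitor-then-intervene loop described above can be sketched roughly as follows. This is a minimal illustration, not the Skill's actual implementation: the error patterns, the `job_state` and `decide_action` helpers, and the action names are all hypothetical. Only the `squeue` options used (`-h`, `-j`, `-o "%T"`) are standard Slurm.

```python
import re
import subprocess

# Hypothetical failure signatures; the Skill's real detection rules are not documented here.
FATAL_PATTERNS = [
    re.compile(r"\bnan\b", re.IGNORECASE),
    re.compile(r"out of memory", re.IGNORECASE),
    re.compile(r"Traceback \(most recent call last\)"),
]

# Slurm states that usually mean the job ended badly and needs a closer look.
FAILED_STATES = {"FAILED", "TIMEOUT", "NODE_FAIL", "OUT_OF_MEMORY"}


def job_state(job_id: str) -> str:
    """Return the job's state from squeue, or "" if it is no longer queued."""
    result = subprocess.run(
        ["squeue", "-h", "-j", job_id, "-o", "%T"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


def decide_action(state: str, log_tail: str) -> str:
    """Map a job state plus recent log text to "cancel", "investigate", or "wait"."""
    if state == "RUNNING" and any(p.search(log_tail) for p in FATAL_PATTERNS):
        # The job is still consuming resources but its output already looks invalid.
        return "cancel"
    if state in FAILED_STATES:
        return "investigate"
    return "wait"
```

A caller might run `scancel <job_id>` when `decide_action` returns `"cancel"`, then fix the cause and resubmit; those steps depend on the job and are left out of the sketch.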

Quick Start

Monitor the Slurm jobs from the current conversation and the current project using low-noise polling and fine-grained checks of logs, outputs, and results, intervening only when necessary.
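One plausible reading of "low-noise polling" is to report only state changes and to poll less often while nothing changes. The sketch below illustrates that idea under those assumptions; the `transitions` and `next_interval` helpers and the 30-second and 600-second defaults are hypothetical and not taken from the Skill.

```python
from typing import Dict, Iterable, Iterator, Tuple


def transitions(samples: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str, str]]:
    """Yield (job_id, old_state, new_state) only when a job's state changes."""
    last: Dict[str, str] = {}
    for job_id, state in samples:
        previous = last.get(job_id)
        if previous != state:
            # First sighting of a job is reported with an "UNKNOWN" previous state.
            yield job_id, previous or "UNKNOWN", state
            last[job_id] = state


def next_interval(current: float, changed: bool,
                  base: float = 30.0, cap: float = 600.0) -> float:
    """Reset to the base interval after a change; otherwise double it, up to the cap."""
    return base if changed else min(current * 2, cap)
```

Repeated identical samples produce no output, so a job that stays in one state for hours generates a single report.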

Dependency Matrix

Required Modules

None required

Components

scripts, references, assets

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: cluster-monitor
Download link: https://github.com/olliecrow/codex/archive/main.zip#cluster-monitor

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
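For a manual install, the extraction step could look roughly like the sketch below. It assumes the archive has already been downloaded and that the Skill lives in a directory named `cluster-monitor` somewhere inside it; the archive's internal layout is not stated here, and the `extract_skill` helper is hypothetical.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(archive: str, skill_name: str, dest_root: str) -> Path:
    """Copy the files under the first directory named skill_name into dest_root/skill_name."""
    dest = Path(dest_root) / skill_name
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path below the skill directory.
            relative = parts[parts.index(skill_name) + 1:]
            if not relative or ".." in relative:
                continue  # skip odd entries and anything trying to escape dest
            target = dest.joinpath(*relative)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
    return dest
```

For example, `extract_skill("main.zip", "cluster-monitor", ".claude/skills")` would place the files under `.claude/skills/cluster-monitor`, assuming the archive was saved locally as `main.zip`.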
