cluster-monitor
Community
Monitor and manage Slurm cluster jobs.
Author: olliecrow
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill automates the monitoring of Slurm cluster jobs, providing deep diagnostics and intelligent intervention to ensure efficient job completion and prevent costly reruns.
Core Features & Use Cases
- Proactive Monitoring: Continuously tracks job status, logs, and outputs for long-running tasks.
- Intelligent Intervention: Automatically intervenes when jobs are likely to produce invalid results or waste resources, including canceling, cleaning up, fixing, and resubmitting.
- Use Case: Monitor a large-scale simulation job running for days, automatically detect and fix a common error in the output logs, and ensure the job completes successfully without manual oversight.
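The intervention logic described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name `decide_action`, the `FATAL_PATTERNS` list, and the three action labels are hypothetical and are not taken from the Skill itself. The state names (`FAILED`, `TIMEOUT`, `NODE_FAIL`, `OUT_OF_MEMORY`, `RUNNING`) are standard Slurm job states.

```python
import re

# Hypothetical error patterns; the Skill's actual detection rules are not documented here.
FATAL_PATTERNS = [
    re.compile(r"\bnan\b", re.IGNORECASE),
    re.compile(r"Traceback \(most recent call last\)"),
    re.compile(r"out of memory", re.IGNORECASE),
]

# Slurm terminal states that usually mean the job needs a fix and a resubmit.
FAILED_STATES = {"FAILED", "TIMEOUT", "NODE_FAIL", "OUT_OF_MEMORY"}


def decide_action(state: str, log_tail: str) -> str:
    """Return 'resubmit', 'cancel', or 'wait' for a single job.

    state    -- Slurm job state string, e.g. 'RUNNING'
    log_tail -- the last lines of the job's log, as one string
    """
    if state in FAILED_STATES:
        # The job has already ended badly; resubmission is the only option left.
        return "resubmit"
    if state == "RUNNING" and any(p.search(log_tail) for p in FATAL_PATTERNS):
        # The job is still consuming resources but its output looks invalid.
        return "cancel"
    # Healthy, pending, or completed jobs need no intervention.
    return "wait"
```

In practice, a `cancel` decision would likely map to `scancel <job_id>` and a `resubmit` decision to `sbatch` after the underlying problem is fixed. Keeping the decision separate from the side effects makes the rules easy to test without a cluster.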
Quick Start
Monitor the Slurm jobs from the current conversation and the current project using low-noise polling and fine-grained checks of logs, outputs, and results, intervening only when necessary.
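One way to read "low-noise polling" is to report only when a job's state changes, rather than on every poll. The sketch below assumes that interpretation; the helper names `parse_squeue`, `state_changes`, and `poll_once` are hypothetical and do not come from the Skill. It relies on the standard `squeue` options `-h` (no header), `-u` (filter by user), and `-o` with the `%i` (job ID) and `%T` (state) format fields.

```python
import subprocess


def parse_squeue(output: str) -> dict[str, str]:
    """Map job ID to state from `squeue -h -o '%i|%T'` output."""
    jobs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, state = line.split("|", 1)
        jobs[job_id] = state
    return jobs


def state_changes(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return messages only for jobs whose state changed or that left the queue."""
    messages = []
    for job_id, state in current.items():
        if previous.get(job_id) != state:
            messages.append(f"{job_id}: {previous.get(job_id, 'NEW')} -> {state}")
    for job_id, state in previous.items():
        if job_id not in current:
            messages.append(f"{job_id}: {state} -> LEFT_QUEUE")
    return messages


def poll_once(user: str) -> dict[str, str]:
    """Query squeue once for the given user (requires Slurm on PATH)."""
    result = subprocess.run(
        ["squeue", "-h", "-u", user, "-o", "%i|%T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_squeue(result.stdout)
```

A monitoring loop would call `poll_once` at a fixed interval, pass the previous and current snapshots to `state_changes`, and stay silent whenever the returned list is empty.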
Dependency Matrix
Required Modules
None required
Components
scripts, references, assets
💻 Claude Code Installation
Recommended: Let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: cluster-monitor
Download link: https://github.com/olliecrow/codex/archive/main.zip#cluster-monitor
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.