cluster-blame

Community

Diagnose Slurm resource stranding

Author: olliecrow
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps identify why cluster resources (CPU, memory, GPU) appear idle and which users or jobs may be blocking scheduling through misconfigured job submissions.

Core Features & Use Cases

  • Resource Stranding Audit: Analyzes Slurm queue state to find jobs that unnecessarily occupy resources, preventing others from running.
  • Attribution & Evidence: Distinguishes between user misconfiguration and scheduler policy effects, providing confidence-ranked evidence.
  • Use Case: When users complain about slow job starts or idle GPUs, this Skill can pinpoint specific jobs or users whose resource requests are inefficiently shaped, leading to fragmented capacity.
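To make "stranding" concrete, here is a minimal sketch of one possible heuristic. It is not the Skill's actual algorithm, which this page does not describe. It assumes a GPU counts as stranded when its node still has idle GPUs but no idle CPUs, and it splits the blame among CPU-only jobs on that node in proportion to the CPUs they hold. The Node and Job records and the stranded_gpu_report function are hypothetical names introduced for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical per-node snapshot; field names are illustrative.
    name: str
    cpus_total: int
    cpus_alloc: int
    gpus_total: int
    gpus_alloc: int


@dataclass
class Job:
    # Hypothetical per-job record for a job running on a single node.
    job_id: str
    user: str
    node: str
    cpus: int
    gpus: int


def stranded_gpu_report(nodes, jobs):
    """Rank users by the idle GPUs sitting behind their CPU-only jobs.

    Assumed heuristic: a node with idle GPUs but zero idle CPUs cannot
    accept a new GPU job, so its idle GPUs are treated as stranded.
    """
    jobs_by_node = defaultdict(list)
    for job in jobs:
        jobs_by_node[job.node].append(job)

    blame = defaultdict(float)
    for node in nodes:
        idle_gpus = node.gpus_total - node.gpus_alloc
        idle_cpus = node.cpus_total - node.cpus_alloc
        if idle_gpus <= 0 or idle_cpus > 0:
            continue  # nothing is stranded on this node
        cpu_only = [j for j in jobs_by_node[node.name] if j.gpus == 0]
        held = sum(j.cpus for j in cpu_only)
        if held == 0:
            continue  # CPUs are held by GPU jobs; not covered by this sketch
        for job in cpu_only:
            # Share of the node's idle GPUs, weighted by CPUs held.
            blame[job.user] += idle_gpus * job.cpus / held

    # Highest blame first.
    return sorted(blame.items(), key=lambda item: item[1], reverse=True)
```

A ranking like this is only circumstantial evidence: partition limits, reservations, or fair-share policy can produce the same pattern without any user misconfiguration, which is why the Skill separates the two kinds of cause.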

Quick Start

Use the cluster-blame skill to quickly scan the current Slurm queue and identify likely users or jobs currently stranding CPU, GPU, or memory capacity.
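For readers who want to see what scanning the queue might involve, the sketch below collects running jobs with squeue and parses them into plain records. It is an illustration under stated assumptions, not the Skill's own scripts. The field selection (%i job ID, %u user, %C CPU count, %b generic resource request, %N node list) is one reasonable choice, and the helper names read_running_jobs and parse_squeue_lines are hypothetical. The text of the GRES field varies between Slurm versions and sites, so it is kept as a raw string.

```python
import subprocess

# Pipe-delimited fields: job id, user, CPU count, GRES request, node list.
SQUEUE_FORMAT = "%i|%u|%C|%b|%N"


def read_running_jobs():
    """Return raw squeue output lines for running jobs.

    Requires the Slurm client tools to be installed and on PATH.
    """
    result = subprocess.run(
        ["squeue", "--noheader", "--states=RUNNING",
         f"--format={SQUEUE_FORMAT}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def parse_squeue_lines(lines):
    """Parse pipe-delimited squeue lines into dicts, skipping malformed rows."""
    rows = []
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) != 5 or not parts[2].isdigit():
            continue  # unexpected shape; ignore rather than guess
        job_id, user, cpus, gres, nodelist = parts
        rows.append({
            "job_id": job_id,
            "user": user,
            "cpus": int(cpus),
            "gres": gres,          # raw text; format is site-dependent
            "nodelist": nodelist,
        })
    return rows
```

On a cluster login node, parse_squeue_lines(read_running_jobs()) would yield one record per running job; the parser can also be exercised on saved output without Slurm installed.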

Dependency Matrix

Required Modules

None required

Components

scripts, references, assets

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: cluster-blame
Download link: https://github.com/olliecrow/codex/archive/main.zip#cluster-blame

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.