cluster-blame
Community
Diagnose Slurm resource stranding
Software Engineering
#resource management #performance analysis #capacity planning #hpc #slurm #job scheduling
Author: olliecrow
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill helps identify why cluster resources (CPU, memory, GPU) appear idle while jobs are still waiting, and which users or jobs may be blocking scheduling through misconfigured job submissions.
Core Features & Use Cases
- Resource Stranding Audit: Analyzes Slurm queue state to find jobs that unnecessarily occupy resources, preventing others from running.
- Attribution & Evidence: Distinguishes between user misconfiguration and scheduler policy effects, providing confidence-ranked evidence.
- Use Case: When users complain about slow job starts or idle GPUs, this Skill can pinpoint specific jobs or users whose resource requests are inefficiently shaped, leading to fragmented capacity.
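To make the stranding idea above concrete, here is a minimal sketch of one possible heuristic: a node whose GPUs are idle but whose CPUs are fully allocated cannot accept new GPU jobs, so its CPU-only jobs become the first suspects. This is an illustration only, not the Skill's actual method. The names Node, Job, and stranded_gpus are hypothetical, and the input is hand-written sample data rather than live Slurm output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpus_total: int
    gpus_total: int


@dataclass
class Job:
    job_id: str
    user: str
    node: str
    cpus: int
    gpus: int


def stranded_gpus(nodes, jobs):
    """Report nodes whose GPUs are idle while their CPUs are exhausted.

    Returns {node_name: (idle_gpus, [(user, job_id, cpus), ...])}.
    Suspects are the CPU-only jobs on that node, largest CPU count first.
    Assumption: each job runs on exactly one node.
    """
    jobs_by_node = defaultdict(list)
    for job in jobs:
        jobs_by_node[job.node].append(job)

    report = {}
    for node in nodes:
        running = jobs_by_node[node.name]
        idle_cpus = node.cpus_total - sum(j.cpus for j in running)
        idle_gpus = node.gpus_total - sum(j.gpus for j in running)
        # GPUs are free, but no CPU is left to schedule a GPU job next to them.
        if idle_gpus > 0 and idle_cpus <= 0:
            suspects = sorted(
                (j for j in running if j.gpus == 0),
                key=lambda j: j.cpus,
                reverse=True,
            )
            report[node.name] = (
                idle_gpus,
                [(j.user, j.job_id, j.cpus) for j in suspects],
            )
    return report


if __name__ == "__main__":
    sample_nodes = [Node("gpu01", 32, 4), Node("gpu02", 32, 4)]
    sample_jobs = [
        Job("101", "alice", "gpu01", 28, 0),
        Job("102", "bob", "gpu01", 4, 1),
        Job("103", "carol", "gpu02", 8, 2),
    ]
    print(stranded_gpus(sample_nodes, sample_jobs))
```

In the sample data, gpu01 has three idle GPUs and no free CPUs, so the CPU-only job from alice is listed as the suspect. gpu02 still has free CPUs and is not reported. A real audit would also need to weigh memory requests and scheduler policy (partitions, QOS, reservations) before attributing blame.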
Quick Start
Use the cluster-blame skill to quickly scan the current Slurm queue and identify likely users or jobs currently stranding CPU, GPU, or memory capacity.
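For readers who want to see what a queue scan might involve, the sketch below reads running jobs with squeue and parses them into dictionaries. It is an assumption about one reasonable approach, not the Skill's own script. The helper names parse_squeue and read_running_jobs and the pipe-delimited layout are hypothetical choices; the squeue options used (-h, -t, -o with %i, %u, %C, %b, %N) are standard, but the text shown in the generic-resources field varies between Slurm versions, so it is kept as a raw string.

```python
import subprocess

# Job ID | user | CPU count | generic resources (e.g. GPUs) | allocated nodes
SQUEUE_FORMAT = "%i|%u|%C|%b|%N"


def parse_squeue(text):
    """Parse pipe-delimited squeue output into a list of dicts.

    Lines that do not have exactly five fields are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.strip().split("|")
        if len(parts) != 5:
            continue
        job_id, user, cpus, gres, nodelist = parts
        rows.append(
            {
                "job_id": job_id,
                "user": user,
                "cpus": int(cpus),
                "gres": gres,  # kept raw; format depends on the Slurm version
                "nodelist": nodelist,
            }
        )
    return rows


def read_running_jobs():
    """Run squeue for running jobs and parse the result (requires Slurm)."""
    completed = subprocess.run(
        ["squeue", "-h", "-t", "RUNNING", "-o", SQUEUE_FORMAT],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_squeue(completed.stdout)
```

Only read_running_jobs needs a Slurm installation. parse_squeue can be exercised on saved output, which makes it easy to test the parsing separately from the cluster.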
Dependency Matrix
Required Modules
None required
Components
scripts, references, assets
💻 Claude Code Installation
Recommended: Let Claude install automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: cluster-blame
Download link: https://github.com/olliecrow/codex/archive/main.zip#cluster-blame
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.