volcano-job-diagnose
Official
Diagnose Volcano Job issues
Author: scitix
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill helps SREs and DevOps engineers quickly identify and understand the root causes of failures or unexpected behavior in Volcano batch jobs, reducing Mean Time To Resolution (MTTR).
Core Features & Use Cases
- Job Status Overview: Provides a summary of the Volcano Job's phase, task counts (pending, running, succeeded, failed), and configured policies.
- Task & Pod Analysis: Lists individual task statuses and, in verbose mode, details about associated Kubernetes pods.
- PodGroup Association: Verifies the link between the Job and its PodGroup, crucial for gang scheduling.
- Event Log Review: Displays recent Kubernetes events related to the job for debugging.
- Use Case: When a critical training job in a Kubernetes cluster using Volcano fails, this Skill can be invoked to rapidly diagnose whether the issue lies with job configuration, resource allocation, pod scheduling, or task execution policies.
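The status overview described above can be sketched roughly as follows. This is a minimal illustration, not the Skill's actual script: it assumes the Job object was already fetched as JSON (for example with `kubectl get ... -o json`) and that the fields follow Volcano's `batch.volcano.sh/v1alpha1` Job layout (`status.state.phase`, `status.pending`, `spec.minAvailable`, `spec.policies`). The function names `summarize_job` and `gang_shortfall` and the sample object are hypothetical.

```python
from typing import Any, Dict

# Task-count fields assumed to be present in the Job's status block.
COUNT_FIELDS = ("pending", "running", "succeeded", "failed")


def summarize_job(job: Dict[str, Any]) -> Dict[str, Any]:
    """Collect phase, task counts, and policies from a Volcano Job dict."""
    metadata = job.get("metadata", {})
    spec = job.get("spec", {})
    status = job.get("status", {})
    return {
        "name": metadata.get("name"),
        "namespace": metadata.get("namespace"),
        "phase": status.get("state", {}).get("phase", "Unknown"),
        # Missing counters are treated as zero.
        "counts": {field: status.get(field, 0) for field in COUNT_FIELDS},
        "min_available": spec.get("minAvailable"),
        "policies": [
            (policy.get("event"), policy.get("action"))
            for policy in spec.get("policies", [])
        ],
    }


def gang_shortfall(summary: Dict[str, Any]) -> int:
    """Heuristic: running pods still needed to reach minAvailable (0 if unset or met)."""
    min_available = summary["min_available"] or 0
    return max(0, min_available - summary["counts"]["running"])


# Hypothetical sample object, trimmed to the fields used above.
sample_job = {
    "metadata": {"name": "my-training-job", "namespace": "training"},
    "spec": {
        "minAvailable": 4,
        "policies": [{"event": "PodEvicted", "action": "RestartJob"}],
    },
    "status": {"state": {"phase": "Pending"}, "pending": 4},
}

summary = summarize_job(sample_job)
shortfall = gang_shortfall(summary)
```

A job that stays in the Pending phase with a non-zero shortfall may indicate that gang scheduling cannot place enough pods at once, which is where the PodGroup and event checks become useful.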
Quick Start
Diagnose the Volcano Job named 'my-training-job' in the 'training' namespace.
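For orientation, a request like the one above corresponds roughly to a handful of read-only `kubectl` queries. The sketch below only builds the command lines and does not run them. It is an assumption about what such a diagnosis would inspect, not a description of the Skill's scripts: the helper name `diagnose_commands` is hypothetical, and the pod selector assumes Volcano labels job pods with `volcano.sh/job-name`.

```python
from typing import List


def diagnose_commands(name: str, namespace: str) -> List[List[str]]:
    """Build read-only kubectl commands for inspecting one Volcano Job."""
    ns = ["-n", namespace]
    return [
        # The Job object itself: phase, task counts, policies.
        ["kubectl", "get", "jobs.batch.volcano.sh", name, *ns, "-o", "json"],
        # PodGroups in the namespace, to check the gang-scheduling link.
        ["kubectl", "get", "podgroups.scheduling.volcano.sh", *ns],
        # Pods belonging to the job (label key is an assumption, see above).
        ["kubectl", "get", "pods", *ns, "-l", f"volcano.sh/job-name={name}", "-o", "wide"],
        # Recent events that reference an object with the job's name.
        ["kubectl", "get", "events", *ns,
         "--field-selector", f"involvedObject.name={name}",
         "--sort-by=.lastTimestamp"],
    ]


commands = diagnose_commands("my-training-job", "training")
```

Each inner list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine with cluster access.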
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: volcano-job-diagnose
Download link: https://github.com/scitix/siclaw/archive/main.zip#volcano-job-diagnose
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.