volcano-job-diagnose

Official

Diagnose Volcano Job issues

Authorscitix
Version1.0.0
Installs0

System Documentation

What problem does it solve?

This Skill helps SREs and DevOps engineers quickly identify and understand the root causes of failures or unexpected behavior in Volcano batch jobs, reducing Mean Time To Resolution (MTTR).

Core Features & Use Cases

  • Job Status Overview: Provides a summary of the Volcano Job's phase, task counts (pending, running, succeeded, failed), and configured policies.
  • Task & Pod Analysis: Lists individual task statuses and, in verbose mode, details about associated Kubernetes pods.
  • PodGroup Association: Verifies the link between the Job and its PodGroup, crucial for gang scheduling.
  • Event Log Review: Displays recent Kubernetes events related to the job for debugging.
  • Use Case: When a critical training job in a Kubernetes cluster using Volcano fails, this Skill can be invoked to rapidly diagnose whether the issue lies with job configuration, resource allocation, pod scheduling, or task execution policies.

Quick Start

Diagnose the Volcano Job named 'my-training-job' in the 'training' namespace.

Dependency Matrix

Required Modules

None required

Components

scriptsreferences

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: volcano-job-diagnose
Download link: https://github.com/scitix/siclaw/archive/main.zip#volcano-job-diagnose

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 223,000+ vetted skills library on demand.