java-debugging-prod-incidents

Community

SRE-focused JVM incident debugging playbook.

Author: HZeroxium
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Production incidents on Java services are often fast-moving and difficult to diagnose without a repeatable workflow. This playbook provides an SRE-first, observability-driven approach that guides triage, safe JVM diagnostics, and coordinated communication to restore service quickly.

Core Features & Use Cases

  • Observability-first triage: prioritize symptoms using logs, metrics, and traces to identify impact and containment.
  • Safe JVM diagnostics: include thread dumps, JFR snippets, and GC/heap checks without risky live changes.
  • Rollback and mitigations: offer a decision tree for feature flags, rollbacks, and rate-limiting to stabilize prod.
  • Blameless postmortems: provide structured timelines and guardrails to prevent recurrence.
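As an illustration of the "safe JVM diagnostics" item above, the following is a minimal sketch of a read-only snapshot built on the standard java.lang.management API. It is not taken from the playbook itself: the class name JvmSnapshot and the output format are assumptions made for this example. It only reads heap, GC, and thread state and changes no JVM settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (name and output format are assumptions, not part of the skill).
public class JvmSnapshot {

    /** Builds a read-only text snapshot of heap, GC, and thread state. */
    public static String capture() {
        StringBuilder out = new StringBuilder();

        // Heap usage as reported by the JVM; getMax() may be -1 if undefined.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        out.append("heap.used=").append(heap.getUsed())
           .append(" heap.max=").append(heap.getMax()).append('\n');

        // Cumulative collection counts and times per collector since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append("gc.").append(gc.getName())
               .append(" count=").append(gc.getCollectionCount())
               .append(" timeMs=").append(gc.getCollectionTime()).append('\n');
        }

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() returns null when no deadlock is detected.
        long[] deadlocked = threads.findDeadlockedThreads();
        out.append("threads.live=").append(threads.getThreadCount())
           .append(" threads.deadlocked=")
           .append(deadlocked == null ? 0 : deadlocked.length).append('\n');

        // Skip locked monitor/synchronizer details to keep the dump cheap.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append(info.getThreadName())
               .append(" state=").append(info.getThreadState()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(capture());
    }
}
```

For a process you cannot modify, the JDK's jcmd tool offers comparable external commands such as Thread.print and GC.heap_info. Note that a thread dump briefly pauses the JVM, so it is worth taking sparingly on a heavily loaded service.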

Quick Start

Run the incident workflow against the affected Java service: stabilize it first, then investigate the root cause.
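The stabilization step can be pictured as a small decision tree over the mitigations listed under Core Features (feature flags, rollbacks, rate-limiting). The sketch below is only one plausible ordering, assumed for illustration: the playbook's actual tree is not reproduced here, and the class name, the three boolean inputs, and the Action values are all hypothetical.

```java
// Hypothetical decision tree; the ordering and inputs are assumptions for illustration.
public class MitigationDecision {

    public enum Action {
        DISABLE_FEATURE_FLAG,
        ROLLBACK_RELEASE,
        RATE_LIMIT,
        CONTINUE_INVESTIGATION
    }

    /**
     * Picks the least invasive mitigation that matches the observed situation.
     *
     * @param suspectChangeBehindFlag the suspected change can be switched off by a flag
     * @param recentDeploy            a release went out shortly before the symptoms began
     * @param overloaded              metrics show the service saturated by traffic
     */
    public static Action choose(boolean suspectChangeBehindFlag,
                                boolean recentDeploy,
                                boolean overloaded) {
        if (suspectChangeBehindFlag) {
            return Action.DISABLE_FEATURE_FLAG; // fastest to apply and to undo
        }
        if (recentDeploy) {
            return Action.ROLLBACK_RELEASE;     // revert to the last known-good build
        }
        if (overloaded) {
            return Action.RATE_LIMIT;           // shed load while diagnosing
        }
        return Action.CONTINUE_INVESTIGATION;   // no safe mitigation identified yet
    }

    public static void main(String[] args) {
        System.out.println(choose(false, true, false)); // prints ROLLBACK_RELEASE
    }
}
```

Checking the flag first reflects the assumption that toggling a flag is quicker and easier to reverse than a rollback; a team with fast, well-tested rollbacks might reasonably order these differently.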

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: java-debugging-prod-incidents
Download link: https://github.com/HZeroxium/cursorkit/archive/main.zip#java-debugging-prod-incidents

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
