java-debugging-prod-incidents
Community
SRE-focused JVM incident debugging playbook.
Author: HZeroxium
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Production incidents on Java services are often fast-moving and difficult to diagnose without a repeatable workflow. This playbook provides an SRE-first, observability-driven approach that guides triage, safe JVM diagnostics, and coordinated communication to restore service quickly.
Core Features & Use Cases
- Observability-first triage: use logs, metrics, and traces to prioritize symptoms, gauge impact, and choose containment steps.
- Safe JVM diagnostics: capture thread dumps, JFR snippets, and GC/heap checks without risky live changes.
- Rollback and mitigations: follow a decision tree covering feature flags, rollbacks, and rate limiting to stabilize production.
- Blameless postmortems: build structured timelines and define guardrails to prevent recurrence.
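The "safe JVM diagnostics" item can be illustrated with a minimal in-process sketch built on the standard `java.lang.management` API. It is not taken from the playbook itself: the class name `JvmSnapshot` and its method names are hypothetical, and the sketch assumes you can run code inside the affected JVM (for example behind an internal admin endpoint). It only reads state, though capturing a thread dump typically pauses threads briefly at a safepoint.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not part of the playbook): read-only JVM snapshot.
public class JvmSnapshot {

    /** Builds a text dump of every live thread with its full stack trace. */
    public static String threadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Request lock details only where the JVM supports them.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    /** Returns the number of threads currently involved in a deadlock. */
    public static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    /** Summarizes heap usage in bytes; max is -1 when undefined. */
    public static String heapSummary() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return String.format("heap used=%d committed=%d max=%d",
                heap.getUsed(), heap.getCommitted(), heap.getMax());
    }

    /** Lists cumulative collection count and time for each garbage collector. */
    public static String gcSummary() {
        StringBuilder out = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append(String.format("%s: count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());
        System.out.print(gcSummary());
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
        System.out.print(threadDump());
    }
}
```

Taking two or three snapshots a few seconds apart and comparing them tends to be more informative than a single one: threads stuck on the same frames across dumps, or a GC count that climbs quickly while heap usage stays high, are the usual signals worth following up.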
Quick Start
Follow the incident workflow on the affected Java service to stabilize it and drive the investigation.
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: java-debugging-prod-incidents
Download link: https://github.com/HZeroxium/cursorkit/archive/main.zip#java-debugging-prod-incidents
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.