reliability-engineering
Community
Build and maintain resilient systems.
Software Engineering
#monitoring #observability #reliability #chaos engineering #sre #incident management #slo
Author: miles990
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides the principles, patterns, and tools for building and maintaining highly reliable software systems, with the goal of minimizing downtime and keeping services available.
Core Features & Use Cases
- SLI/SLO/SLA Management: Define and track Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) for service reliability.
- Observability: Implement metrics, logging, and tracing for deep system insights.
- Incident Management: Establish processes for detecting, responding to, and resolving incidents.
- Chaos Engineering: Proactively test system resilience by introducing controlled failures.
- Disaster Recovery: Plan and prepare for business continuity in the face of major disruptions.
- Use Case: A team is experiencing frequent production outages. They can use this Skill to define Service Level Objectives (SLOs), implement robust monitoring and alerting, and establish an incident response playbook to reduce Mean Time To Recovery (MTTR).
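As a rough illustration of the SLO and MTTR terms used above, the sketch below computes the downtime allowed by an availability SLO (the error budget) and the mean time to recovery from a list of incidents. The `Incident` class and both function names are hypothetical and chosen for this example; they are not taken from the Skill's own scripts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record: when an incident was detected and when it was resolved.
    detected_at: datetime
    resolved_at: datetime


def error_budget_minutes(slo_target: float, window: timedelta) -> float:
    """Minutes of downtime an availability SLO permits over the given window."""
    return (1.0 - slo_target) * window.total_seconds() / 60.0


def mttr_minutes(incidents: list[Incident]) -> float:
    """Mean time from detection to resolution, in minutes (0.0 if no incidents)."""
    if not incidents:
        return 0.0
    total_seconds = sum(
        (i.resolved_at - i.detected_at).total_seconds() for i in incidents
    )
    return total_seconds / len(incidents) / 60.0


if __name__ == "__main__":
    # A 99.9% availability target over 30 days allows roughly 43 minutes of downtime.
    print(round(error_budget_minutes(0.999, timedelta(days=30)), 1))

    start = datetime(2024, 1, 1, 12, 0)
    incidents = [
        Incident(start, start + timedelta(minutes=30)),
        Incident(start, start + timedelta(minutes=60)),
    ]
    print(mttr_minutes(incidents))
```

Comparing the downtime accumulated across incidents against the error budget is one common way to decide whether a team should prioritize reliability work over new features.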
Quick Start
Use the reliability-engineering skill to define SLIs for service latency and error rates.
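What such SLIs might look like in practice is sketched below, assuming request data is available as simple records with a latency and an HTTP status code. The `Request` class, the function names, and the 300 ms threshold are assumptions made for illustration; the Skill's actual output may differ.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record used only for this example.
    latency_ms: float
    status_code: int


def latency_sli(requests: list[Request], threshold_ms: float) -> float:
    """Fraction of requests served within threshold_ms (1.0 if there is no traffic)."""
    if not requests:
        return 1.0
    fast = sum(1 for r in requests if r.latency_ms <= threshold_ms)
    return fast / len(requests)


def error_rate_sli(requests: list[Request]) -> float:
    """Fraction of requests that returned a 5xx status (0.0 if there is no traffic)."""
    if not requests:
        return 0.0
    errors = sum(1 for r in requests if r.status_code >= 500)
    return errors / len(requests)


if __name__ == "__main__":
    sample = [
        Request(latency_ms=120, status_code=200),
        Request(latency_ms=450, status_code=200),
        Request(latency_ms=80, status_code=500),
        Request(latency_ms=95, status_code=200),
    ]
    print(latency_sli(sample, threshold_ms=300))  # 3 of 4 requests are within 300 ms
    print(error_rate_sli(sample))                 # 1 of 4 requests is a 5xx
```

An SLO is then a target on one of these ratios over a time window, for example "at least 99% of requests complete within 300 ms over 30 days".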
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: reliability-engineering
Download link: https://github.com/miles990/claude-software-skills/archive/main.zip#reliability-engineering
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.