reliability-engineering

Community

Build and maintain resilient systems.

Author: miles990
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides principles, patterns, and tools for building and maintaining reliable software systems, with the aim of minimizing downtime and keeping services available.

Core Features & Use Cases

  • SLI/SLO/SLA Management: Define and track Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs) for service reliability.
  • Observability: Implement metrics, logging, and tracing for deep system insights.
  • Incident Management: Establish processes for detecting, responding to, and resolving incidents.
  • Chaos Engineering: Proactively test system resilience by introducing controlled failures.
  • Disaster Recovery: Plan and prepare for business continuity in the face of major disruptions.
  • Use Case: A team is experiencing frequent production outages. They can use this Skill to define Service Level Objectives (SLOs), implement robust monitoring and alerting, and establish an incident response playbook to reduce Mean Time To Recovery (MTTR).
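As an illustration of the SLO and MTTR ideas in the use case above, here is a minimal Python sketch. It is not taken from the Skill itself: the function names, the 30-day window, and the incident durations are hypothetical and chosen only for the example.

```python
from typing import Sequence


def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime (minutes) for an availability SLO over a rolling window.

    slo_target is a fraction, e.g. 0.999 for "99.9% available".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining_minutes(
    slo_target: float,
    incident_minutes: Sequence[float],
    window_days: int = 30,
) -> float:
    """Error budget left after subtracting downtime already spent on incidents."""
    return error_budget_minutes(slo_target, window_days) - sum(incident_minutes)


def mean_time_to_recovery(incident_minutes: Sequence[float]) -> float:
    """MTTR: average duration from detection to recovery across incidents."""
    if not incident_minutes:
        raise ValueError("at least one incident is required")
    return sum(incident_minutes) / len(incident_minutes)


if __name__ == "__main__":
    # Hypothetical data: two outages in the current 30-day window.
    incidents = [10.0, 20.0]
    print(f"budget:    {error_budget_minutes(0.999):.1f} min")
    print(f"remaining: {budget_remaining_minutes(0.999, incidents):.1f} min")
    print(f"MTTR:      {mean_time_to_recovery(incidents):.1f} min")
```

A 99.9% availability target over 30 days leaves roughly 43 minutes of downtime, so a team with frequent outages can see at a glance how much budget each incident consumes and whether MTTR is trending down.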

Quick Start

Use the reliability-engineering skill to define SLIs for service latency and error rates.
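To make the prompt above concrete, the sketch below shows one common way to express latency and error-rate SLIs as ratios over a set of requests. It is an assumption about what such SLIs might look like, not output from the Skill: the `Request` type, the 300 ms threshold, and the choice to count only 5xx responses as errors are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Request:
    """One observed request (hypothetical shape for this example)."""
    latency_ms: float
    status: int


def error_rate_sli(requests: Sequence[Request]) -> float:
    """Fraction of requests that failed with a server error (HTTP 5xx)."""
    if not requests:
        raise ValueError("no requests observed")
    errors = sum(1 for r in requests if 500 <= r.status <= 599)
    return errors / len(requests)


def latency_sli(requests: Sequence[Request], threshold_ms: float) -> float:
    """Fraction of requests served at or under the latency threshold."""
    if not requests:
        raise ValueError("no requests observed")
    fast = sum(1 for r in requests if r.latency_ms <= threshold_ms)
    return fast / len(requests)


if __name__ == "__main__":
    # Hypothetical sample of observed traffic.
    sample = [
        Request(latency_ms=120.0, status=200),
        Request(latency_ms=340.0, status=200),
        Request(latency_ms=95.0, status=500),
        Request(latency_ms=80.0, status=200),
    ]
    print(f"error rate:       {error_rate_sli(sample):.2%}")
    print(f"latency <= 300ms: {latency_sli(sample, 300.0):.2%}")
```

Expressing each SLI as a ratio of good (or bad) events to total events keeps it directly comparable to an SLO target such as "99% of requests under 300 ms".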

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: reliability-engineering
Download link: https://github.com/miles990/claude-software-skills/archive/main.zip#reliability-engineering

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.