monitoring-alerting

Community

Keep your systems healthy, get alerted instantly.

Author: ricardoroche
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill simplifies setting up monitoring and alerting for applications. It provides patterns for instrumenting metrics, defining Service Level Objectives (SLOs), and configuring timely alerts, which helps teams catch problems before they become outages and cuts down on manual incident response.

Core Features & Use Cases

  • Metric Instrumentation: Guidance on using the Prometheus client for counters, histograms, and gauges, including FastAPI middleware that tracks request metrics automatically.
  • SLO/SLI Definition: Provides structured patterns for defining Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with Prometheus queries.
  • Alerting & Dashboards: Offers patterns for defining alert rules, sending notifications (Slack, PagerDuty), and configuring Grafana dashboards for visualization.
  • Use Case: A DevOps engineer needs to ensure their new microservice meets a 99.9% availability target. This skill helps them instrument HTTP requests, define an SLO for success rate, create an alert if the error budget is exhausted, and build a dashboard to track performance.
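The error-budget arithmetic behind the use case above can be sketched in plain Python. This is a minimal illustration, not the Skill's own code: the function names, the example PromQL string, and the `http_requests_total` metric with a `status` label are all assumptions made for the example.

```python
# Hypothetical SLI query: share of non-5xx requests over a 30-day window.
# The metric and label names are assumptions, not taken from the Skill.
SUCCESS_RATE_QUERY = (
    'sum(rate(http_requests_total{status!~"5.."}[30d]))'
    " / sum(rate(http_requests_total[30d]))"
)


def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means nothing has been spent, 0.0 means the budget is fully used,
    and a negative value means the SLO has been violated.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total == 0:
        return 1.0  # no traffic yet, so no budget has been spent
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures


def should_alert(slo_target: float, total: int, failed: int) -> bool:
    """Fire once the error budget is used up (remaining fraction <= 0)."""
    return error_budget_remaining(slo_target, total, failed) <= 0.0


if __name__ == "__main__":
    # 99.9% target over one million requests allows about 1,000 failures.
    print(error_budget_remaining(0.999, 1_000_000, 250))
    print(should_alert(0.999, 1_000_000, 2_000))
```

In practice many teams alert on how fast the budget is being consumed rather than waiting for it to reach zero; the threshold in `should_alert` is the simplest possible choice and is easy to swap out.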

Quick Start

Help me instrument my FastAPI application with Prometheus metrics for request rate and latency.

Dependency Matrix

Required Modules

prometheus_client, fastapi, httpx, pydantic

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: monitoring-alerting
Download link: https://github.com/ricardoroche/ricardos-claude-code/archive/main.zip#monitoring-alerting

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.