Error Handling — Production Best Practices
Community
Build resilient systems that fail gracefully.
Software Engineering
Tags: error handling, api design, mlops, resilience, observability, distributed systems, production systems
Author: DoanNgocCuong
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides strategies and best practices for robust error handling in production systems, so that services stay reliable, remain debuggable, and fail gracefully.
Core Features & Use Cases
- Error Classification: Distinguish transient errors (e.g., network timeouts), which may succeed on retry, from permanent errors (e.g., invalid input), which should fail fast, and apply retry or failure logic accordingly.
- Standardized Responses: Use RFC 7807 (Problem Details for HTTP APIs, since obsoleted by RFC 9457, which keeps the same format) to return consistent, machine-readable API error responses.
- Graceful Recovery: Use patterns such as Sagas (with compensating actions) and idempotent operations to keep data consistent across distributed services.
- Observability: Ensure every error is logged with context, traced across service boundaries, and, when it affects users, raises an alert.
- Use Case: A critical e-commerce service experiences intermittent network timeouts. This Skill helps you retry those transient failures, so orders are not lost to a momentary outage, while permanent errors such as invalid payment details are rejected immediately instead of being retried.
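The transient-versus-permanent split from the use case can be sketched as a small retry helper. This is a minimal sketch, not the Skill's own code: `TransientError`, `PermanentError`, and `call_with_retry` are hypothetical names, and capped exponential backoff with full jitter is one common choice of delay policy rather than something the Skill prescribes.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker: failures that may succeed on retry (e.g. timeouts)."""


class PermanentError(Exception):
    """Hypothetical marker: failures that will never succeed on retry (e.g. bad input)."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fn on TransientError using capped exponential backoff with full jitter.

    PermanentError, and any other exception type, propagates on the first attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the last transient error
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2^(attempt-1))].
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable; for example, `call_with_retry(charge, sleep=lambda s: None)` retries a timing-out payment call without real delays, while a `PermanentError` such as an invalid card number is attempted exactly once.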
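An RFC 7807 problem-details response is a JSON object with the standard members `type`, `title`, `status`, `detail`, and `instance`, served with the media type `application/problem+json`; extra fields are allowed as extension members. The helper below is a framework-agnostic sketch under that assumption: `problem_response` and its `(status, headers, body)` return shape are hypothetical, so adapt it to whatever your web framework expects.

```python
from __future__ import annotations

import json
from typing import Any, Optional

PROBLEM_JSON = "application/problem+json"


def problem_response(
    status: int,
    title: str,
    detail: Optional[str] = None,
    type_uri: str = "about:blank",  # RFC 7807 default when no specific problem type exists
    instance: Optional[str] = None,
    **extensions: Any,
) -> tuple[int, dict[str, str], str]:
    """Build (status, headers, body) for an RFC 7807 problem-details response."""
    body: dict[str, Any] = {"type": type_uri, "title": title, "status": status}
    if detail is not None:
        body["detail"] = detail  # human-readable explanation of this occurrence
    if instance is not None:
        body["instance"] = instance  # URI reference identifying this occurrence
    # Extension members (RFC 7807 section 3.2) carry machine-readable context.
    body.update(extensions)
    return status, {"Content-Type": PROBLEM_JSON}, json.dumps(body)
```

For the invalid-payment case, `problem_response(422, "Invalid payment details", detail="Card number failed validation.", retryable=False)` gives clients a stable shape to branch on; the `retryable` field is a hypothetical extension member, not part of the RFC.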
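Idempotency is what makes retries safe: if a client resends a request with the same key, the server returns the stored result instead of repeating the side effect. The class below is a minimal in-memory sketch; `IdempotencyStore` and `run_once` are hypothetical names, and a real deployment would assume a shared, persistent store with expiry rather than a per-process dict.

```python
import threading
from typing import Any, Callable, Dict


class IdempotencyStore:
    """Hypothetical in-memory idempotency-key cache (single process only)."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def run_once(self, key: str, operation: Callable[[], Any]) -> Any:
        # Holding the lock while the operation runs serializes all calls; simple,
        # and it prevents two concurrent requests with the same key from both executing.
        with self._lock:
            if key in self._results:
                return self._results[key]  # replay: return the stored result
            result = operation()  # first time: perform the side effect
            # Only successful results are stored, so a failed attempt can be retried.
            self._results[key] = result
            return result
```

A client retrying after a timeout with the same key (for example, an order ID) then receives the original charge result rather than being charged twice.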
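A Saga replaces one distributed transaction with a sequence of local steps, each paired with a compensating action that undoes it; if a later step fails, the completed steps are compensated in reverse order. The orchestrator below is a deliberately small sketch: `run_saga` and `SagaFailed` are hypothetical names, and it assumes compensations always succeed, whereas a production saga would persist progress and retry compensations.

```python
from typing import Callable, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


class SagaFailed(Exception):
    def __init__(self, failed_step: str, cause: Exception) -> None:
        super().__init__(f"saga step {failed_step!r} failed: {cause}")
        self.failed_step = failed_step
        self.cause = cause


def run_saga(steps: List[Step]) -> None:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            for _name, _action, compensate in reversed(completed):
                compensate()  # undo an already-committed local step
            raise SagaFailed(name, exc) from exc
        completed.append(step)
```

In an order flow such as "reserve inventory, charge payment, ship", a declined charge triggers only the inventory release; the failed step itself is not compensated because it never committed.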
Quick Start
Implement robust error handling by classifying errors, using standardized response formats, and ensuring observability.
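For observability, one widely used approach is to emit errors as structured JSON logs that carry a correlation or trace ID, so that a log pipeline can index them and link them to a request trace. The sketch below uses only Python's standard `logging` module; the `JsonFormatter` class, the `orders` logger name, and the `trace_id` field are hypothetical choices, not part of the Skill.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical field name: a correlation/trace id supplied via `extra=`.
            "trace_id": getattr(record, "trace_id", None),
        }
        # exc_info can be (None, None, None) if .exception() is called outside an except block.
        if record.exc_info and record.exc_info[0] is not None:
            payload["error_type"] = record.exc_info[0].__name__
            payload["error"] = str(record.exc_info[1])
            payload["stack"] = self.formatException(record.exc_info)
        return json.dumps(payload)


logger = logging.getLogger("orders")  # hypothetical service logger name
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

trace_id = uuid.uuid4().hex  # in practice, propagate the incoming request's trace id
try:
    raise TimeoutError("payment gateway timed out")
except TimeoutError:
    # Log once, at the boundary, with the stack trace and correlation id attached.
    logger.exception("charge failed", extra={"trace_id": trace_id})
```

Alerting can then key off fields such as `level` and `error_type` instead of parsing free-form message text.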
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: Error Handling — Production Best Practices Download link: https://github.com/DoanNgocCuong/working/archive/main.zip#error-handling-production-best-practices Please download this .zip file, extract it, and install it in the .claude/skills/ directory.