ml-debug
Official
Debug ML failures with precision.
Category: Software Engineering
Tags: performance optimization, oom error, ml debugging, ai troubleshooting, nan values, framework configuration
Author: Leeroo-AI
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill systematically diagnoses and resolves failures in ML/AI workflows, such as Out-of-Memory (OOM) errors, NaN values, divergence, crashes, poor throughput, incorrect outputs, and dependency conflicts, by leveraging framework-specific knowledge and grounding in documentation.
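Several of these failure modes (NaN values, divergence) are best caught by a guardrail inside the training loop rather than after a crash. As a minimal, framework-agnostic sketch of that idea, the helper below (hypothetical names; real loops would hook this into PyTorch or JAX callbacks) fails fast on non-finite losses and flags a simple divergence heuristic:

```python
import math

def check_loss(step, loss, history, window=20):
    """Fail fast on NaN/Inf losses; flag divergence heuristically.

    A sketch of a training-loop guardrail: raise before corrupt values
    propagate into gradients and checkpoints.
    """
    if math.isnan(loss) or math.isinf(loss):
        raise RuntimeError(f"step {step}: non-finite loss {loss!r}")
    history.append(loss)
    recent = history[-window:]
    # Divergence heuristic: recent average far above the best loss seen.
    if len(recent) == window and sum(recent) / window > 10 * min(history):
        raise RuntimeError(f"step {step}: loss appears to be diverging")

# Usage: call once per optimization step with the scalar loss.
history = []
for step, loss in enumerate([2.3, 1.9, 1.5, 1.2]):
    check_loss(step, loss, history)
```

The divergence threshold (10x the running minimum over a 20-step window) is an illustrative default, not a recommendation from this Skill; tune both to the noise level of your loss curve.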
Core Features & Use Cases
- Root Cause Analysis: Identifies the underlying cause of ML failures through systematic diagnosis.
- Framework-Specific Debugging: Utilizes knowledge bases and web fetching to provide accurate, context-aware solutions for various ML frameworks (PyTorch, DeepSpeed, vLLM, Hugging Face Transformers, etc.).
- Guided Fixes: Provides step-by-step instructions, including specific configuration changes, code patches, and verification scripts, to resolve identified issues.
- Prevention Strategies: Offers actionable advice and runnable guardrails to prevent similar issues in the future.
- Use Case: When a distributed training job fails with an OOM error on a specific GPU, this Skill can pinpoint whether it's due to activation memory, optimizer states, or KV cache, and provide a precise configuration adjustment to fix it.
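The OOM attribution in the use case above rests on simple memory accounting. As a rough sketch (the 16 bytes/parameter figure is the standard rule of thumb for mixed-precision Adam training: 2 B fp16 weights + 2 B fp16 gradients + 4 B fp32 master weights + 8 B fp32 Adam moments; activations and KV cache are not included):

```python
def training_memory_gib(n_params, bytes_per_param=16):
    """Rough model-state memory for mixed-precision Adam training:
    2 B fp16 weights + 2 B fp16 grads + 4 B fp32 master weights
    + 8 B fp32 Adam moments = 16 B/param. Activations excluded."""
    return n_params * bytes_per_param / 2**30

# A 7B-parameter model needs ~104 GiB of model-state memory alone,
# which is why a single 80 GiB GPU OOMs and why ZeRO/DeepSpeed shards
# optimizer states and gradients across ranks.
unsharded = training_memory_gib(7e9)
sharded = unsharded / 8  # idealized ZeRO-3 split across 8 GPUs
print(f"{unsharded:.1f} GiB unsharded, {sharded:.1f} GiB per GPU sharded")
```

Back-of-envelope numbers like this let the Skill decide quickly whether an OOM is plausibly caused by optimizer states (fix: sharding or an 8-bit optimizer) or must instead come from activations or KV cache (fix: gradient checkpointing, shorter sequences, or paged attention).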
Quick Start
Use the ml-debug skill to diagnose and fix an OOM error encountered during LLM fine-tuning.
Dependency Matrix
Required Modules
None required

Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: ml-debug
Download link: https://github.com/Leeroo-AI/superml/archive/main.zip#ml-debug
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.