debug-cuda-crash
CommunityDebug CUDA crashes with API logging.
Authorariusewy
Version1.0.0
Installs0
System Documentation
What problem does it solve?
This Skill helps you diagnose and fix CUDA crashes in your FlashInfer applications by providing detailed logging of API calls and tensor states before a crash occurs.
Core Features & Use Cases
- API Logging: Captures input tensors, shapes, dtypes, and values for debugging.
- Error Analysis: Helps identify shape mismatches, numerical issues (NaN/Inf), and memory access errors.
- Multi-Process Debugging: Supports logging for distributed training setups.
- Use Case: When your PyTorch model using FlashInfer suddenly crashes with a CUDA error, you can enable API logging to see exactly which tensor inputs caused the illegal memory access or NaN output, allowing for a quick fix.
Quick Start
Enable detailed API logging by setting the environment variables FLASHINFER_LOGLEVEL to 3 and FLASHINFER_LOGDEST to debug.log, then run your Python script.
Dependency Matrix
Required Modules
None requiredComponents
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: debug-cuda-crash Download link: https://github.com/ariusewy/flashinfer_dev/archive/main.zip#debug-cuda-crash Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 223,000+ vetted skills library on demand.