debug-cuda-crash

Community

Debug CUDA crashes with API logging.

Authorariusewy
Version1.0.0
Installs0

System Documentation

What problem does it solve?

This Skill helps you diagnose and fix CUDA crashes in your FlashInfer applications by providing detailed logging of API calls and tensor states before a crash occurs.

Core Features & Use Cases

  • API Logging: Captures input tensors, shapes, dtypes, and values for debugging.
  • Error Analysis: Helps identify shape mismatches, numerical issues (NaN/Inf), and memory access errors.
  • Multi-Process Debugging: Supports logging for distributed training setups.
  • Use Case: When your PyTorch model using FlashInfer suddenly crashes with a CUDA error, you can enable API logging to see exactly which tensor inputs caused the illegal memory access or NaN output, allowing for a quick fix.

Quick Start

Enable detailed API logging by setting the environment variables FLASHINFER_LOGLEVEL to 3 and FLASHINFER_LOGDEST to debug.log, then run your Python script.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: debug-cuda-crash
Download link: https://github.com/ariusewy/flashinfer_dev/archive/main.zip#debug-cuda-crash

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 223,000+ vetted skills library on demand.