cuda-kernels
Official
Optimize NVIDIA GPU kernels for AI models.
Author: huggingface
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides optimized CUDA kernels and integration guidance to significantly accelerate AI model inference and training on NVIDIA GPUs, reducing latency and improving throughput.
Core Features & Use Cases
- Optimized CUDA Kernels: High-performance kernels for common operations like RMSNorm, GELU, and attention mechanisms.
- Benchmarking Tools: Scripts to measure performance gains against baseline implementations.
- Integration Guides: Detailed instructions for integrating custom kernels with the HuggingFace diffusers and transformers libraries.
- Use Case: Accelerate Stable Diffusion video generation by 6%, or speed up LLM inference, by replacing standard PyTorch operations with custom, highly optimized CUDA code.
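To make the feature list concrete, here is a minimal eager-mode PyTorch sketch of RMSNorm, the kind of baseline operation a fused CUDA kernel from this Skill would replace. The function name, shapes, and epsilon are illustrative assumptions, not this Skill's actual API.

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Root-mean-square normalization over the last dimension.
    # In eager mode this launches several separate GPU kernels;
    # a fused CUDA implementation does it in one pass.
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight

x = torch.randn(2, 4, 8)
w = torch.ones(8)
y = rms_norm(x, w)
print(y.shape)  # torch.Size([2, 4, 8])
```

A fused replacement must match this reference numerically, which is why the Skill's benchmarking scripts compare outputs as well as timings.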
Quick Start
Benchmark the performance of optimized kernels against the baseline implementation for video generation.
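A timing harness along these lines can compare an optimized kernel against the PyTorch baseline. This is a generic sketch, not the Skill's bundled benchmark script; the `benchmark` helper is a hypothetical name. Note the `torch.cuda.synchronize()` calls: CUDA kernels launch asynchronously, so without them the timer measures launch overhead, not execution time.

```python
import time
import torch

def benchmark(fn, *args, iters: int = 50) -> float:
    # Average wall-clock time per call, in milliseconds.
    fn(*args)  # warm-up (triggers lazy init / compilation)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for queued kernels to finish
    return (time.perf_counter() - start) / iters * 1e3

x = torch.randn(256, 1024)
baseline_ms = benchmark(torch.nn.functional.gelu, x)
print(f"baseline GELU: {baseline_ms:.3f} ms")
```

Run the same harness over the custom kernel and the baseline on identical inputs to quantify the speedup.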
Dependency Matrix
Required Modules
torch, kernels, diffusers, transformers, accelerate
Components
scripts, references
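Assuming the required modules above are published on PyPI under those names, they could be installed in one step; verify package names against the Skill's own documentation before running.

```shell
# Install the Skill's required Python modules (names assumed from the matrix above)
pip install torch kernels diffusers transformers accelerate
```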
💻 Claude Code Installation
Recommended: Let Claude install the Skill automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: cuda-kernels
Download link: https://github.com/huggingface/kernels/archive/main.zip#cuda-kernels
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.