cuda-kernels

Official

Optimize NVIDIA GPU kernels for AI models.

Author: huggingface
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides optimized CUDA kernels and integration guidance to significantly accelerate AI model inference and training on NVIDIA GPUs, reducing latency and improving throughput.

Core Features & Use Cases

  • Optimized CUDA Kernels: High-performance kernels for common operations like RMSNorm, GELU, and attention mechanisms.
  • Benchmarking Tools: Scripts to measure performance gains against baseline implementations.
  • Integration Guides: Detailed instructions for integrating custom kernels with HuggingFace diffusers and transformers libraries.
  • Use Case: Accelerate workloads such as Stable Diffusion video generation (about 6% faster) or LLM inference by replacing standard PyTorch operations with custom, highly optimized CUDA kernels.
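As a sketch of what one of these kernels computes, here is the reference semantics of RMSNorm in pure Python. This is not the Skill's CUDA implementation, just the mathematical operation an optimized kernel would replace (the function name and signature are illustrative):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm: x / sqrt(mean(x^2) + eps) * weight.

    An optimized CUDA kernel fuses the reduction, rsqrt, and
    elementwise scale into a single pass over the data.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]

# With unit weights, only the normalization remains.
out = rms_norm([3.0, 4.0], [1.0, 1.0])
```

The CUDA version produces the same values; the gain comes from fusing these steps into one kernel launch instead of several separate PyTorch ops.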

Quick Start

Benchmark the performance of optimized kernels against the baseline implementation for video generation.
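A minimal sketch of the kind of comparison the benchmarking scripts perform: time a baseline callable against an optimized one and report the speedup. The callables below are placeholders, not the Skill's actual scripts; for real GPU kernels you would also need to synchronize the device around the timed region:

```python
import time

def benchmark(fn, warmup=3, iters=20):
    """Return mean seconds per call of fn() after a warmup.

    For CUDA workloads, call torch.cuda.synchronize() before
    reading the clock so queued kernels are included in the timing.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Hypothetical baseline vs. optimized implementations of the same op.
baseline = lambda: sum(i * i for i in range(10_000))
optimized = lambda: sum(i * i for i in range(10_000))

t_base = benchmark(baseline)
t_opt = benchmark(optimized)
print(f"speedup: {t_base / t_opt:.2f}x")
```

The bundled scripts apply this pattern to the video-generation pipeline, comparing the stock PyTorch operators against the custom kernels.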

Dependency Matrix

Required Modules

torch, kernels, diffusers, transformers, accelerate

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: cuda-kernels
Download link: https://github.com/huggingface/kernels/archive/main.zip#cuda-kernels

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
