GPU Optimization Patterns

Community

Boost GPU performance and efficiency.

Author: HermeticOrmus
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps deep learning workloads use GPU resources more efficiently: it reduces memory consumption and speeds up model training and inference.

Core Features & Use Cases

  • Memory Management: Profile and estimate GPU memory usage for models and optimizers.
  • Compilation Strategies: Apply torch.compile with various modes (default, reduce-overhead, max-autotune) for performance gains.
  • Profiling: Utilize torch.profiler to identify bottlenecks in training steps.
  • Mixed Precision: Implement BF16 and FP16 training with autocast, adding GradScaler for FP16 to avoid gradient underflow.
  • Quantization: Load models in 4-bit precision using bitsandbytes for reduced memory footprint.
  • DataLoader Optimization: Configure DataLoader for maximum GPU throughput.
  • Use Case: You're training a large language model and hitting GPU memory limits. This Skill helps you profile memory, switch to BF16 mixed precision, and optimize your DataLoader to fit the model and train faster.
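To make the memory-management and quantization items above concrete, here is a minimal back-of-the-envelope sketch in plain Python. It is not taken from the Skill's scripts: the helper names, the byte-per-parameter table, and the Adam assumption (two FP32 moment tensors per parameter) are illustrative assumptions, and activation memory is deliberately left out.

```python
# Rough, illustrative estimate of GPU memory for weights, gradients, and
# Adam optimizer state. Activation memory is NOT modeled here.

# Assumed storage cost per parameter for each weight format.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "int4": 0.5}
GIB = 1024 ** 3


def estimate_training_memory_gib(num_params: int, weight_dtype: str = "fp32") -> dict:
    """Hypothetical helper: per-component training memory in GiB."""
    weights = num_params * BYTES_PER_PARAM[weight_dtype]
    # Assumption: gradients are stored in the same dtype as the weights.
    gradients = num_params * BYTES_PER_PARAM[weight_dtype]
    # Assumption: Adam keeps two FP32 moment tensors per parameter.
    optimizer = num_params * 2 * 4.0
    total = weights + gradients + optimizer
    return {
        "weights": weights / GIB,
        "gradients": gradients / GIB,
        "optimizer": optimizer / GIB,
        "total": total / GIB,
    }


def estimate_inference_memory_gib(num_params: int, weight_dtype: str = "fp16") -> float:
    """Hypothetical helper: weight-only footprint for inference in GiB."""
    return num_params * BYTES_PER_PARAM[weight_dtype] / GIB


if __name__ == "__main__":
    n = 7_000_000_000  # example size, chosen only for illustration
    for dtype in ("fp32", "bf16"):
        est = estimate_training_memory_gib(n, dtype)
        print(dtype, {name: round(value, 1) for name, value in est.items()})
    print("int4 inference:", round(estimate_inference_memory_gib(n, "int4"), 1), "GiB")
```

Note that with torch.autocast the parameters themselves normally stay in FP32, so the savings from mixed precision come mostly from activations, which this sketch does not cover. Treat the numbers as a lower bound when deciding whether a model fits on a given GPU.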

Quick Start

Use the gpu-optimization-patterns skill to profile the memory usage of your PyTorch model.
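As a rough idea of what such a memory profile might measure, the sketch below counts parameter bytes and, when a CUDA device is available, records peak allocated memory for one forward/backward pass. The function names and the toy model are hypothetical and are not the Skill's own scripts; only standard PyTorch calls are used.

```python
import torch
from torch import nn


def parameter_bytes(model: nn.Module) -> int:
    """Total bytes occupied by the model's parameters."""
    return sum(p.numel() * p.element_size() for p in model.parameters())


def profile_step_memory(model: nn.Module, batch: torch.Tensor) -> dict:
    """Hypothetical helper: run one forward/backward pass and report memory.

    Peak allocation is only reported on CUDA; on CPU only the parameter
    size is returned.
    """
    device = next(model.parameters()).device
    stats = {"param_mib": parameter_bytes(model) / 2**20}

    if device.type == "cuda":
        # Clear the running peak so it reflects this step only.
        torch.cuda.reset_peak_memory_stats(device)

    loss = model(batch).sum()  # placeholder loss, for illustration only
    loss.backward()

    if device.type == "cuda":
        torch.cuda.synchronize(device)
        stats["peak_allocated_mib"] = torch.cuda.max_memory_allocated(device) / 2**20
    return stats


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Toy model, chosen only so the example is self-contained.
    model = nn.Sequential(
        nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
    ).to(device)
    batch = torch.randn(32, 1024, device=device)
    print(profile_step_memory(model, batch))
```

Comparing the peak figure against the parameter size gives a quick sense of how much of the footprint comes from gradients and activations rather than weights.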

Dependency Matrix

Required Modules

torch, transformers, bitsandbytes

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: GPU Optimization Patterns
Download link: https://github.com/HermeticOrmus/LibreMLOps-Claude-Code/archive/main.zip#gpu-optimization-patterns

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
