GPU Optimization Patterns
Category: Community
Description: Boost GPU performance and efficiency.
Author: HermeticOrmus
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill addresses the critical need to optimize GPU resource utilization, reduce memory consumption, and accelerate model training and inference for deep learning workloads.
Core Features & Use Cases
- Memory Management: Profile and estimate GPU memory usage for models and optimizers.
- Compilation Strategies: Apply `torch.compile` with various modes (`default`, `reduce-overhead`, `max-autotune`) for performance gains.
- Profiling: Utilize `torch.profiler` to identify bottlenecks in training steps.
- Mixed Precision: Implement BF16 and FP16 training with `autocast` and `GradScaler`.
- Quantization: Load models in 4-bit precision using `bitsandbytes` for a reduced memory footprint.
- DataLoader Optimization: Configure `DataLoader` for maximum GPU throughput.
- Use Case: You're training a large language model and hitting GPU memory limits. This Skill helps you profile memory, switch to BF16 mixed precision, and optimize your DataLoader to fit the model and train faster.
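As a rough illustration of the memory-estimation idea above, the sketch below (pure Python; the constants and the helper name are illustrative, and the Skill's own scripts may compute this differently) estimates training memory from the parameter count alone:

```python
def estimate_training_memory_gb(num_params, bytes_per_param=4,
                                optimizer_states=2, include_gradients=True):
    """Rough lower bound on training memory: weights + gradients +
    optimizer states (Adam keeps two FP32 states per parameter).
    Activation memory is workload-dependent and excluded here."""
    copies = 1 + (1 if include_gradients else 0) + optimizer_states
    return num_params * bytes_per_param * copies / 1e9

# A 7B-parameter model trained in FP32 with Adam needs at least
# ~112 GB before activations - hence mixed precision and quantization.
print(estimate_training_memory_gb(7e9))  # → 112.0
```

This is only a floor: activations, CUDA context, and fragmentation add on top, which is why profiling the real workload still matters.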
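A minimal mixed-precision training step using `autocast` and `GradScaler` might look like this sketch (the model and data are placeholders; note that `GradScaler` is only needed for FP16, since BF16's wider dynamic range usually makes loss scaling unnecessary):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 1).to(device)  # stand-in for a real model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
# Loss scaling guards FP16 gradients against underflow; it is a
# pass-through no-op when disabled (e.g. on CPU or with BF16).
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(8, 16, device=device)
y = torch.randn(8, 1, device=device)

# FP16 on GPU, BF16 on CPU (CPU autocast supports bfloat16)
dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.autocast(device_type=device, dtype=dtype):
    loss = nn.functional.mse_loss(model(x), y)

scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```

Keeping the forward pass and loss inside the `autocast` region while running the optimizer step outside it is the standard pattern.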
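The DataLoader settings referred to above are commonly tuned as in the following sketch (the dataset is synthetic and the values are starting points to benchmark from, not universal optima):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
loader = DataLoader(
    ds,
    batch_size=64,
    num_workers=2,                         # overlap loading with compute
    pin_memory=torch.cuda.is_available(),  # faster host-to-device copies
    persistent_workers=True,               # keep workers alive across epochs
    prefetch_factor=2,                     # batches pre-fetched per worker
)
xb, yb = next(iter(loader))
```

Raising `num_workers` until the GPU stops idling between steps, then enabling `pin_memory` with non-blocking `.to(device)` copies, is the usual tuning order.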
Quick Start
Use the gpu-optimization-patterns skill to profile the memory usage of your PyTorch model.
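As an illustration of what such profiling might look like with `torch.profiler` (a tiny placeholder model stands in for yours; the Skill's bundled scripts may differ):

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(64, 64)  # stand-in for a real model
x = torch.randn(32, 64)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

# profile_memory=True records allocations alongside operator timings
with profile(activities=activities, profile_memory=True) as prof:
    model(x)

# Top operators by total CPU time; sort by "self_cuda_time_total"
# or "cuda_memory_usage" instead when profiling on GPU.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

The table output points at the operators dominating time or memory, which is where compilation, mixed precision, or quantization pay off most.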
Dependency Matrix
Required Modules
`torch`, `transformers`, `bitsandbytes`
Components
`scripts`, `references`
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: GPU Optimization Patterns
Download link: https://github.com/HermeticOrmus/LibreMLOps-Claude-Code/archive/main.zip#gpu-optimization-patterns
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.