gpu-optimizer
Community
Maximize GPU throughput & prevent OOMs
Author: Mathews-Tom
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Provides evidence-based optimization patterns that reduce out-of-memory errors, improve training throughput, and migrate CPU-bound workflows to NVIDIA GPUs, with a focus on consumer-class cards (8–24 GB VRAM).
Core Features & Use Cases
- XGBoost GPU acceleration: QuantileDMatrix, gpu_hist, and device-aware training for faster boosting on CUDA-enabled builds.
- PyTorch mixed precision & compilation: BF16/FP16 selection, GradScaler fallbacks, fused optimizers, and torch.compile guidance to boost performance.
- VRAM management & diagnostics: Gradient checkpointing, accumulation strategies, peak memory monitoring, and profiling to locate bottlenecks.
- CuPy / cuDF migrations: Patterns for NumPy→CuPy and Pandas→cuDF transitions including zero-copy interop with PyTorch.
- Practical trade-offs & anti-patterns: Clear guidance on when to apply each pattern and what to avoid to preserve GPU pipeline efficiency.
Quick Start
Optimize model training on an NVIDIA GPU with 8–24 GB VRAM: enable BF16 (or FP16 with a GradScaler), turn on gradient checkpointing, and run the diagnostics to validate memory use and throughput.
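The diagnostics step above can start with a quick device check; a minimal sketch using only standard `torch.cuda` queries (the skill's own diagnostics may go further):

```python
import torch

# Confirm the GPU is visible and report its VRAM budget and BF16 support.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB VRAM")
    print(f"BF16 supported: {torch.cuda.is_bf16_supported()}")
else:
    print("No CUDA device found; training will fall back to CPU.")
```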
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: gpu-optimizer
Download link: https://github.com/Mathews-Tom/praxis-skills/archive/main.zip#gpu-optimizer
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.