gpu-optimizer

Community

Maximize GPU throughput & prevent OOMs

Author: Mathews-Tom
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Provides evidence-based optimization patterns to reduce out-of-memory errors, improve training throughput, and migrate CPU-bound workflows to NVIDIA GPUs, targeting consumer-class cards (8–24 GB VRAM).

Core Features & Use Cases

  • XGBoost GPU acceleration: QuantileDMatrix, gpu_hist, and device-aware training for faster boosting on CUDA-enabled builds.
  • PyTorch mixed precision & compilation: BF16/FP16 selection, GradScaler fallbacks, fused optimizers, and torch.compile guidance to boost performance.
  • VRAM management & diagnostics: Gradient checkpointing, accumulation strategies, peak memory monitoring, and profiling to locate bottlenecks.
  • CuPy / cuDF migrations: Patterns for NumPy→CuPy and Pandas→cuDF transitions including zero-copy interop with PyTorch.
  • Practical trade-offs & anti-patterns: Clear guidance on when to apply each pattern and what to avoid to preserve GPU pipeline efficiency.

Quick Start

Optimize your model training on an NVIDIA GPU with 8–24GB VRAM by enabling BF16 or FP16, turning on gradient checkpointing, and running the diagnostics to validate memory and performance.
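Before enabling any of the patterns, a quick pre-flight check confirms what the card actually supports. A minimal sketch, assuming PyTorch is installed:

```python
import torch

has_cuda = torch.cuda.is_available()
if has_cuda:
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB VRAM, "
          f"BF16 supported: {torch.cuda.is_bf16_supported()}")
else:
    print("No CUDA device visible; training will fall back to CPU.")
```

BF16 support (Ampere and newer) determines whether you can skip FP16 loss scaling entirely.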

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: gpu-optimizer
Download link: https://github.com/Mathews-Tom/praxis-skills/archive/main.zip#gpu-optimizer

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
