tune-multiplication
Official
Maximize matrix multiplication performance on GPUs
System Documentation
What problem does it solve?
This Skill provides a compact, reproducible standard operating procedure for profiling and optimizing matrix operators (GEMV, GEMM, Grouped GEMM) on NVIDIA Ampere and Hopper GPUs. The goal is high bandwidth utilization for memory-bound kernels and high TFLOPS utilization for compute-bound ones. The procedure classifies each kernel as memory-bound or compute-bound, enforces warp-level coalescing, and selects hardware-aligned tile and pipeline parameters.
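As a rough illustration of the memory-bound versus compute-bound split described above, the following sketch compares a kernel's arithmetic intensity (FLOPs per byte of DRAM traffic) with a device's roofline ridge point. The `Device` class, the function names, and the peak numbers are assumptions made for this example and are not taken from the Skill. The traffic model also assumes each matrix crosses DRAM exactly once, which real kernels only approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Peak numbers for a roofline estimate (placeholders, not vendor specs)."""
    peak_tflops: float    # peak dense throughput for the chosen dtype, TFLOP/s
    bandwidth_gbs: float  # peak DRAM bandwidth, GB/s

    @property
    def ridge_point(self) -> float:
        """FLOPs per byte at which the compute roof meets the memory roof."""
        return (self.peak_tflops * 1e12) / (self.bandwidth_gbs * 1e9)


def gemm_intensity(m: int, n: int, k: int, dtype_bytes: int = 2) -> float:
    """Arithmetic intensity of C[m,n] = A[m,k] @ B[k,n].

    Assumes A and B are each read once and C is written once.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = dtype_bytes * (m * k + k * n + m * n)
    return flops / bytes_moved


def classify(m: int, n: int, k: int, device: Device, dtype_bytes: int = 2) -> str:
    """Label a shape by comparing its intensity with the ridge point."""
    intensity = gemm_intensity(m, n, k, dtype_bytes)
    return "compute-bound" if intensity >= device.ridge_point else "memory-bound"


# Hypothetical device: substitute the measured or documented peaks of your GPU.
example_device = Device(peak_tflops=1000.0, bandwidth_gbs=4000.0)

for shape in [(1, 4096, 4096), (4096, 4096, 4096)]:
    print(shape, round(gemm_intensity(*shape), 2), classify(*shape, example_device))
```

A GEMV (m = 1) performs about one FLOP per byte in 16-bit precision, so it lands far below any realistic ridge point and should be tuned for bandwidth. A large square GEMM lands well above it and should be tuned for tensor-core throughput.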
Core Features & Use Cases
- Bottleneck classification using arithmetic intensity to decide memory-bound versus compute-bound strategies.
- Memory-bound guidance: warp coalescing rules, shared memory reuse patterns, reduce-thread sizing, and Hopper-specific tips (TMA, cp.async, L2 behavior).
- Compute-bound guidance: tensor-core alignment constraints, shared-memory budgeting, and double-buffered pipelining recommendations.
- Autotune strategy and workflow: recommended search spaces, best practices for warmup/reps, logging, validation, and comparison to cuBLAS/torch baselines.
- Case studies demonstrating real-world gains (e.g., high H200 utilization on large Llama-3 shapes) and pitfalls to avoid.
- Use case example: tune kernels for a production inference workload (large sequence GEMVs) to recover bandwidth and reduce latency.
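The warp coalescing rule mentioned in the memory-bound guidance above can be sanity-checked on paper before any kernel is written. The sketch below counts how many 128-byte segments one warp-wide load touches for a given lane-to-address mapping. The 128-byte granularity, the helper names, and the row-major layout are assumptions chosen for illustration. Real hardware also operates on 32-byte sectors, so treat the count as a relative indicator rather than an exact transaction count.

```python
WARP_SIZE = 32
SEGMENT_BYTES = 128  # assumed granularity for this sketch


def row_major_addr(row: int, col: int, leading_dim: int, elem_bytes: int) -> int:
    """Byte offset of element (row, col) in a row-major matrix."""
    return (row * leading_dim + col) * elem_bytes


def segments_touched(addr_of_lane, elem_bytes: int = 2) -> int:
    """Count distinct segments touched when each lane loads one element.

    addr_of_lane maps a lane index (0..31) to the byte address it loads.
    """
    segments = set()
    for lane in range(WARP_SIZE):
        start = addr_of_lane(lane)
        end = start + elem_bytes - 1
        segments.update(range(start // SEGMENT_BYTES, end // SEGMENT_BYTES + 1))
    return len(segments)


LD = 4096  # hypothetical leading dimension, in elements

# Lanes walk along a row: consecutive addresses, so the warp shares segments.
coalesced = segments_touched(lambda lane: row_major_addr(0, lane, LD, 2))

# Lanes walk down a column: each lane lands in a different segment.
strided = segments_touched(lambda lane: row_major_addr(lane, 0, LD, 2))

print("row-wise access:", coalesced, "segment(s)")
print("column-wise access:", strided, "segment(s)")
```

Under these assumptions, a mapping that touches close to the minimum number of segments is coalesced. A mapping that touches one segment per lane wastes most of each transaction and is a likely cause of low bandwidth utilization in a GEMV.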
Quick Start
Use the tune-multiplication guide to compute arithmetic intensity, verify warp-level memory coalescing, design a hardware-aligned autotune search space, and run the autotuner to record the best GEMV/GEMM configuration.
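A hardware-aligned search space of the kind described above might be enumerated as in the following sketch. The parameter names (`block_M`, `block_N`, `block_K`, `num_stages`), the candidate values, the 16-element alignment, and the shared-memory limit are all assumptions for illustration. They are not the Skill's actual search space or any library's API. The shared-memory estimate only counts the A and B tiles held per pipeline stage.

```python
from itertools import product


def candidate_configs(smem_limit_bytes: int, dtype_bytes: int = 2, mma_align: int = 16):
    """Enumerate tile configurations that fit a shared-memory budget.

    Keeps only tiles whose dimensions are multiples of mma_align (an assumed
    tensor-core alignment) and whose multi-stage A and B buffers fit in
    smem_limit_bytes.
    """
    tiles_mn = (64, 128, 256)  # hypothetical candidates for block_M and block_N
    tiles_k = (32, 64, 128)    # hypothetical candidates for block_K
    stages = (2, 3, 4)         # 2 stages corresponds to double buffering

    configs = []
    for bm, bn, bk, s in product(tiles_mn, tiles_mn, tiles_k, stages):
        if any(t % mma_align for t in (bm, bn, bk)):
            continue  # tile is not aligned to the assumed MMA shape
        # Each stage holds one A tile (bm x bk) and one B tile (bk x bn).
        smem_bytes = s * (bm * bk + bk * bn) * dtype_bytes
        if smem_bytes <= smem_limit_bytes:
            configs.append({
                "block_M": bm,
                "block_N": bn,
                "block_K": bk,
                "num_stages": s,
                "smem_bytes": smem_bytes,
            })
    return configs


# Placeholder budget: replace with the per-block limit of the target GPU.
space = candidate_configs(smem_limit_bytes=96 * 1024)
print(len(space), "candidate configurations")
print(space[0])
```

Pruning by alignment and shared-memory budget before timing anything keeps the autotuner from spending warmup and measurement runs on configurations that cannot launch or cannot use tensor cores efficiently.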
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: tune-multiplication
Download link: https://github.com/tile-ai/TileOPs/archive/main.zip#tune-multiplication
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.