tune-multiplication

Official

Maximize matrix multiplication performance on GPUs

Author: tile-ai
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides a compact, reproducible standard operating procedure for profiling and optimizing matrix operators (GEMV, GEMM, Grouped GEMM) on NVIDIA Ampere and Hopper GPUs. It targets high bandwidth utilization for memory-bound kernels and high TFLOPS utilization for compute-bound kernels by classifying the bottleneck, enforcing warp-level memory coalescing, and selecting hardware-aligned tile and pipeline parameters.

Core Features & Use Cases

  • Bottleneck classification using arithmetic intensity to decide memory-bound versus compute-bound strategies.
  • Memory-bound guidance: warp coalescing rules, shared memory reuse patterns, reduce-thread sizing, and architecture-specific tips (cp.async on Ampere and later, TMA on Hopper, L2 behavior).
  • Compute-bound guidance: tensor-core alignment constraints, shared-memory budgeting, and double-buffered pipelining recommendations.
  • Autotune strategy and workflow: recommended search spaces, best practices for warmup and repetition counts, logging, validation, and comparison against cuBLAS and PyTorch baselines.
  • Case studies demonstrating real-world gains (e.g., high H200 utilization on large Llama-3 shapes) and pitfalls to avoid.
  • Use case example: tune kernels for a production inference workload (large sequence GEMVs) to recover bandwidth and reduce latency.
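
The bottleneck classification in the first bullet can be sketched as a roofline check. This is a minimal illustration, not the Skill's own code: the function names are hypothetical, the traffic model assumes each matrix crosses DRAM exactly once, and the peak figures in the usage lines are round placeholders rather than measured numbers for any particular GPU.

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C[m, n] = A[m, k] @ B[k, n].

    Assumes A and B are each read once and C is written once
    (an idealized lower bound on DRAM traffic).
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def classify(intensity, peak_flops, peak_bandwidth):
    """Compare intensity with the roofline ridge point (FLOPs per byte)."""
    ridge = peak_flops / peak_bandwidth
    return "memory-bound" if intensity < ridge else "compute-bound"


# Placeholder peaks for illustration only: 1e15 FLOP/s and 4e12 B/s.
PEAK_FLOPS = 1e15
PEAK_BANDWIDTH = 4e12

gemv = gemm_arithmetic_intensity(1, 4096, 4096)      # a GEMV is a GEMM with m = 1
gemm = gemm_arithmetic_intensity(4096, 4096, 4096)

print(classify(gemv, PEAK_FLOPS, PEAK_BANDWIDTH))
print(classify(gemm, PEAK_FLOPS, PEAK_BANDWIDTH))
```

With 2-byte elements, the GEMV lands near 1 FLOP per byte and the square GEMM near n/3 FLOPs per byte, which is why the two shapes call for different tuning strategies.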

Quick Start

Use the tune-multiplication guide to compute arithmetic intensity, verify warp-level memory coalescing, design a hardware-aligned autotune search space, and run the autotuner to record the best GEMV/GEMM configuration.
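
A hardware-aligned search space of the kind mentioned above might be built as follows. This is a sketch under stated assumptions: the candidate tile sizes, the multiple-of-16 alignment rule (standing in for tensor-core shape constraints), the `smem_bytes` model, and the 48 KiB budget are all illustrative choices, not values taken from the Skill. The real per-block shared-memory limit depends on the architecture and on opt-in configuration.

```python
from itertools import product


def smem_bytes(block_m, block_n, block_k, num_stages, bytes_per_elem=2):
    """Shared memory needed to stage A and B tiles.

    Each pipeline stage holds one (block_m x block_k) tile of A and
    one (block_k x block_n) tile of B.
    """
    per_stage = (block_m * block_k + block_k * block_n) * bytes_per_elem
    return num_stages * per_stage


def build_search_space(smem_limit_bytes, align=16):
    """Enumerate tile configs that are aligned and fit the shared-memory budget."""
    space = []
    for bm, bn, bk, stages in product([64, 128, 256], [64, 128, 256], [32, 64], [2, 3, 4]):
        if bm % align or bn % align or bk % align:
            continue  # skip tiles that are not multiples of the assumed alignment
        if smem_bytes(bm, bn, bk, stages) > smem_limit_bytes:
            continue  # skip configs that exceed the assumed budget
        space.append({"block_m": bm, "block_n": bn, "block_k": bk, "num_stages": stages})
    return space


# Illustrative 48 KiB budget; substitute the limit for the target GPU.
space = build_search_space(48 * 1024)
print(len(space), "candidate configurations")
```

Pruning by alignment and shared-memory budget before benchmarking keeps the autotuner from spending warmup and timing runs on configurations that cannot launch.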

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: tune-multiplication
Download link: https://github.com/tile-ai/TileOPs/archive/main.zip#tune-multiplication

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.