tune

Official

Find and fix GPU kernel performance bottlenecks.

Author: tile-ai
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This guide provides a repeatable methodology to measure, analyze, and tune GPU kernel performance in TileOPs, ensuring reported metrics reflect GPU-only execution and that autotune trial runs do not contaminate profiler traces.

Core Features & Use Cases

  • Authoritative Benchmarking: Run the bench_xxx scripts under benchmarks/ops to obtain median GPU-only latencies for fair comparisons between kernel variants and configs.
  • Clean Tracing Workflow: Disable autotune and fix the best config before profiling with nsys so aggregated kernel stats reflect steady-state behavior.
  • Deep Metric Analysis: Use ncu (Nsight Compute) for per-metric inspection (memory throughput across L1/L2/HBM, cache hit rates, occupancy, stall reasons), and apply the practical TMPDIR and --launch-skip/--launch-count workarounds.
  • Tuning SOP & PR Rules: Standardized workflow from benchmarking to profiling to PR submission, including required performance tables and autotune config disclosures.
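The benchmarking and PR bullets above both depend on median latencies, so a minimal sketch of that summary step may help. It assumes the per-iteration GPU-only latencies (in milliseconds) have already been collected by a benchmark script. The function names and the row layout are hypothetical and do not reflect the table format TileOPs actually requires.

```python
from statistics import median


def summarize(samples_ms):
    """Return the median of per-iteration GPU latencies in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    return median(samples_ms)


def speedup_row(name, baseline_ms, tuned_ms):
    """Format one comparison row: name, baseline median, tuned median, speedup.

    The column layout is an illustrative assumption, not the project's format.
    """
    base = summarize(baseline_ms)
    tuned = summarize(tuned_ms)
    return f"{name} | {base:.3f} | {tuned:.3f} | {base / tuned:.2f}x"


# The 5.00 ms outlier (for example, a warm-up iteration) does not move the median.
row = speedup_row("config_a", [1.20, 1.00, 5.00], [0.50, 0.52, 0.48])
print(row)
```

Using the median rather than the mean keeps a single slow iteration from distorting the comparison between kernel variants.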

Quick Start

Run the benchmark to collect the median GPU-only latency, disable autotune and pin the best config it found, then run nsys or ncu to capture clean kernel traces for analysis.
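The profiling step can be sketched as a small helper that assembles the nsys and ncu command lines. This is an illustration under assumptions: the script name bench_xxx.py is a placeholder, the output names, the skip and count values, and the choice of `--set full` are arbitrary, and how autotune is disabled in TileOPs is not shown here. Only standard nsys and ncu options are used.

```python
import os
import shlex


def nsys_command(script, output="trace"):
    """Build an nsys invocation that traces CUDA activity and prints summary stats."""
    return ["nsys", "profile", "--trace=cuda", "--stats=true",
            "-o", output, "python", script]


def ncu_command(script, launch_skip, launch_count, output="report"):
    """Build an ncu invocation that skips early launches and profiles a fixed number."""
    if launch_skip < 0 or launch_count < 1:
        raise ValueError("launch_skip must be >= 0 and launch_count >= 1")
    return ["ncu", "--set", "full",
            "--launch-skip", str(launch_skip),
            "--launch-count", str(launch_count),
            "-o", output, "python", script]


def profiler_env(tmpdir):
    """Copy the current environment and point TMPDIR at a writable directory."""
    env = dict(os.environ)
    env["TMPDIR"] = tmpdir
    return env


# Placeholder script name; substitute the real benchmark script.
cmd = ncu_command("bench_xxx.py", launch_skip=10, launch_count=1)
print(shlex.join(cmd))
```

To launch the profiler, the list can be passed to `subprocess.run(cmd, env=profiler_env(path), check=True)`, where `path` is any directory the profiler can write to.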

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: tune
Download link: https://github.com/tile-ai/TileOPs/archive/main.zip#tune

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
