tune
Official
Find and fix GPU kernel performance bottlenecks.
Author: tile-ai
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This guide provides a repeatable methodology to measure, analyze, and tune GPU kernel performance in TileOPs, ensuring reported metrics reflect GPU-only execution and that autotune trial runs do not contaminate profiler traces.
Core Features & Use Cases
- Authoritative Benchmarking: Run the benchmarks/ops bench_xxx scripts to obtain median GPU-only latencies for fair comparisons between kernel variants and configs.
- Clean Tracing Workflow: Disable autotune and fix the best config before profiling with nsys so aggregated kernel stats reflect steady-state behavior.
- Deep Metric Analysis: Use ncu (Nsight Compute) for per-metric inspection (memory throughput, L1/L2/HBM hit rates, occupancy, stall reasons) and apply practical TMPDIR and launch-skip/launch-count workarounds.
- Tuning SOP & PR Rules: Standardized workflow from benchmarking to profiling to PR submission, including required performance tables and autotune config disclosures.
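As a rough illustration of the median latencies and PR performance tables mentioned above, the sketch below reduces per-iteration latency samples to a median and formats a one-row comparison. The function names and the table layout are assumptions made for illustration; they are not taken from TileOPs, whose benchmark scripts remain the authoritative source of numbers.

```python
from statistics import median


def summarize(samples_ms):
    """Return the median of per-iteration GPU latencies, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    return median(samples_ms)


def perf_table(baseline_ms, tuned_ms, label="kernel"):
    """Format a one-row comparison table (hypothetical layout, not a TileOPs rule)."""
    base = summarize(baseline_ms)
    tuned = summarize(tuned_ms)
    speedup = base / tuned  # values above 1.0 mean the tuned variant is faster
    header = "| Kernel | Baseline (ms) | Tuned (ms) | Speedup |"
    rule = "|---|---|---|---|"
    row = f"| {label} | {base:.3f} | {tuned:.3f} | {speedup:.2f}x |"
    return "\n".join([header, rule, row])
```

The median is used rather than the mean so that a few slow iterations (for example, warm-up or clock ramp) do not skew the comparison between configs.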
Quick Start
Run the benchmark to collect median GPU-only latency, disable autotune to fix the config, and then run nsys or ncu to capture clean kernel traces for analysis.
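The profiling half of this workflow can be sketched as a pair of command builders. This is a minimal sketch, not the skill's own tooling: the helper names, report names, and default skip/count values are hypothetical, and the target command is whatever benchmark script you run with autotune already disabled. The `ncu` and `nsys` flags used here (`--set`, `--launch-skip`, `--launch-count`, `-o`, `--trace`, `--stats`) are standard Nsight CLI options.

```python
import os


def ncu_command(target, report="kernel_report", launch_skip=10, launch_count=1):
    """Build an Nsight Compute command that skips early launches, then profiles a few."""
    return [
        "ncu", "--set", "full",
        "--launch-skip", str(launch_skip),    # ignore warm-up kernel launches
        "--launch-count", str(launch_count),  # profile only this many launches
        "-o", report,
        *target,
    ]


def nsys_command(target, report="trace"):
    """Build an Nsight Systems command that traces CUDA activity and prints summary stats."""
    return ["nsys", "profile", "--trace=cuda", "--stats=true", "-o", report, *target]


def profiler_env(tmpdir):
    """Copy the current environment and point TMPDIR at a writable directory."""
    env = dict(os.environ)
    env["TMPDIR"] = tmpdir
    return env


# Example use (hypothetical script name and directory):
#   import subprocess
#   cmd = ncu_command(["python", "your_benchmark_script.py"])
#   subprocess.run(cmd, env=profiler_env("/path/to/writable/tmp"), check=True)
```

Building the argument list separately from running it makes it easy to log the exact profiler invocation alongside the results.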
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: tune Download link: https://github.com/tile-ai/TileOPs/archive/main.zip#tune Please download this .zip file, extract it, and install it in the .claude/skills/ directory.