kernel-trace-analysis


Profile and optimize GPU kernels.

Author: fsx950223
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps identify and resolve performance bottlenecks in GPU kernels by profiling their execution, analyzing instruction traces, and providing actionable optimization suggestions.

Core Features & Use Cases

  • Kernel Profiling: Collect detailed performance statistics and instruction-level traces for GPU kernels using rocprofv3.
  • Bottleneck Identification: Pinpoint performance issues such as barrier stalls, idle cycles, and memory-bound operations.
  • Optimization Planning: Generate a prioritized plan with concrete steps to improve kernel performance.
  • Use Case: A developer has a slow GPU kernel and needs to understand why. They use this Skill to generate a trace, analyze the results, and receive a report detailing the exact instructions causing stalls and how to fix them.
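To illustrate the bottleneck-identification step, here is a minimal sketch that ranks kernels by their share of total GPU time from a per-kernel statistics table. It is not the Skill's own analysis code. The column names `Name` and `TotalDurationNs` are assumptions about the CSV layout and should be checked against the files your rocprofv3 version writes; the kernel names and numbers in `SAMPLE` are made up for illustration.

```python
import csv
import io


def rank_kernels(csv_text, name_col="Name", total_col="TotalDurationNs", top=3):
    """Return the heaviest kernels as (name, total_ns, share_of_total) tuples.

    name_col and total_col are assumed column names; adjust them to match
    the header of the statistics CSV you actually have.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    totals = [(row[name_col], int(row[total_col])) for row in rows]
    grand_total = sum(ns for _, ns in totals) or 1  # avoid division by zero
    totals.sort(key=lambda item: item[1], reverse=True)
    return [(name, ns, ns / grand_total) for name, ns in totals[:top]]


# Hypothetical sample data, not real profiler output.
SAMPLE = """Name,Calls,TotalDurationNs
paged_attention_kernel,10,600
reduce_kernel,10,300
copy_kernel,5,100
"""

for name, ns, share in rank_kernels(SAMPLE):
    print(f"{name}: {ns} ns ({share:.0%} of kernel time)")
```

Sorting by total duration rather than average duration puts a cheap kernel that is launched many times ahead of an expensive one that runs once, which is usually the more useful order for a prioritized plan.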

Quick Start

Use the kernel-trace-analysis skill to profile the command 'python bench_pa.py --batch 32'.
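For orientation, the sketch below shows roughly what a rocprofv3 invocation for that command might look like. The Skill's actual command line is not documented here, so treat the flag selection as an assumption and confirm it with `rocprofv3 --help` on your ROCm install. The helper `build_rocprofv3_command` and the output directory `trace_out` are hypothetical names. Collecting instruction-level traces needs additional thread-trace options that are not shown.

```python
import shlex


def build_rocprofv3_command(app_command, output_dir="trace_out"):
    """Build an argument list that wraps app_command with rocprofv3.

    The flags below (kernel trace, summary statistics, CSV output) are an
    assumed selection; verify them against your rocprofv3 version.
    """
    app_args = shlex.split(app_command)
    if not app_args:
        raise ValueError("app_command must not be empty")
    return [
        "rocprofv3",
        "--kernel-trace",          # record kernel dispatches
        "--stats",                 # emit per-kernel summary statistics
        "--output-format", "csv",
        "-d", output_dir,          # hypothetical output directory
        "--",                      # everything after this is the profiled app
        *app_args,
    ]


cmd = build_rocprofv3_command("python bench_pa.py --batch 32")
print(shlex.join(cmd))
```

Building the command as a list keeps the application's own arguments (`--batch 32`) cleanly separated from the profiler's, and the list can be passed to `subprocess.run` without shell quoting issues.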

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: kernel-trace-analysis
Download link: https://github.com/fsx950223/claude-stuff/archive/main.zip#kernel-trace-analysis

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
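
If you prefer to install by hand, the extraction step can be sketched as follows. This assumes the downloaded archive contains a folder named after the Skill somewhere in its tree, which the `#kernel-trace-analysis` fragment in the link suggests but does not guarantee. The helper `extract_skill` is a hypothetical name, not part of Claude Code.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(zip_path, skill_name, skills_dir):
    """Copy the files under a folder named skill_name into skills_dir/skill_name.

    Assumes the archive holds a folder called skill_name at some depth;
    everything outside that folder is ignored.
    """
    target_root = Path(skills_dir) / skill_name
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path below the skill folder.
            relative = parts[parts.index(skill_name) + 1:]
            if not relative or ".." in relative:
                continue  # skip odd entries and path-traversal attempts
            destination = target_root.joinpath(*relative)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
            extracted.append(destination)
    return extracted
```

With the archive saved locally as `main.zip`, a call such as `extract_skill("main.zip", "kernel-trace-analysis", Path(".claude") / "skills")` would place the Skill's files under `.claude/skills/kernel-trace-analysis/`.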