performance-analysis

Official

Analyze MaxText training performance end-to-end.

Author: AMD-AGI
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill runs post-training performance analysis on MaxText training jobs. It uses tgs_tagger, TraceLens, and IRLens to identify bottlenecks, efficiency issues, and resource contention across the GPU, host, and network layers.

Core Features & Use Cases

  • Multi-tool workflow: Combines TSDB-based comparisons, TraceLens performance reports, and IRLens analysis to pinpoint root causes.
  • Actionable results: Generates structured metrics, GPU and utilization breakdowns, and kernel-level insights.
  • Guided steps: Reads the results, summarizes the findings, and validates dashboard availability.

Quick Start

Run the analysis workflow on a completed job's artifacts to generate analysis.json and TraceLens reports for review.
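
For a first look at the generated analysis.json, a small script can list its top-level fields. This listing does not document the file's schema, so the sketch below assumes only that the file is valid JSON. The helper names `summarize_analysis` and `describe` are hypothetical and are not part of the skill.

```python
import json
from pathlib import Path


def describe(value):
    """Give a one-line description of a JSON value without dumping its contents."""
    if isinstance(value, dict):
        return f"object with {len(value)} keys"
    if isinstance(value, list):
        return f"list of {len(value)} items"
    return repr(value)


def summarize_analysis(path):
    """Map each top-level key of a JSON file to a short description of its value.

    Assumption: the file is valid JSON. No specific schema is assumed.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # The root is not an object, so describe it under a placeholder key.
        return {"<root>": describe(data)}
    return {key: describe(value) for key, value in data.items()}
```

Calling `summarize_analysis("analysis.json")` and printing each key with its description gives a quick outline of what the workflow produced before opening the TraceLens reports.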

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: performance-analysis
Download link: https://github.com/AMD-AGI/maxtext-slurm/archive/main.zip#performance-analysis

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
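
The same download, extract, and install steps can also be done by hand. The following is a minimal sketch, assuming the archive has already been downloaded locally and that the skill lives in a directory named after it somewhere inside the archive. The actual archive layout is not stated here, and the function name `install_skill` is hypothetical.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_skill(zip_path, skill_name, skills_dir):
    """Extract zip_path and copy the directory named skill_name into skills_dir.

    Assumption: the archive contains a directory whose name equals skill_name.
    Returns the path of the installed skill directory.
    """
    skills_dir = Path(skills_dir)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        # Search the extracted tree for directories matching the skill name.
        matches = sorted(p for p in Path(tmp).rglob(skill_name) if p.is_dir())
        if not matches:
            raise FileNotFoundError(f"no '{skill_name}' directory in {zip_path}")
        target = skills_dir / skill_name
        skills_dir.mkdir(parents=True, exist_ok=True)
        # Copy the first match, overwriting files from any earlier install.
        shutil.copytree(matches[0], target, dirs_exist_ok=True)
    return target
```

For example, `install_skill("main.zip", "performance-analysis", ".claude/skills")` would place the skill under `.claude/skills/performance-analysis`, provided the layout assumption holds.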
