uv-distributed-llm-pretraining-torchtitan

Community

Scale LLM pretraining with PyTorch.

Author: uv-xiao
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill tackles the challenge of pretraining large language models (LLMs) efficiently at scale, letting users apply advanced distributed training techniques without a complex infrastructure setup.

Core Features & Use Cases

  • 4D Parallelism: Supports composable 4D parallelism (FSDP2, Tensor Parallelism, Pipeline Parallelism, Context Parallelism) for maximum throughput and memory efficiency.
  • Optimized Training: Integrates features like Float8 precision, torch.compile, and distributed checkpointing for faster training and reduced resource consumption.
  • Use Case: A research team wants to pretrain a new 70B parameter LLM on a cluster of 256 GPUs. They can use this Skill to configure and launch the training job, leveraging its advanced parallelism and optimization techniques to complete the pretraining phase significantly faster than traditional methods.
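To make the parallelism dimensions concrete, here is a sketch of how degrees for the four axes might be expressed in a torchtitan-style TOML training config. The section and key names below are illustrative assumptions based on torchtitan's config conventions, not a guaranteed schema; consult the torchtitan repository for the exact fields.

```toml
# Hypothetical parallelism layout for a multi-node job.
# Effective world size = product of the parallel degrees.
[parallelism]
data_parallel_shard_degree = 8    # FSDP2 sharding
tensor_parallel_degree = 4        # tensor parallelism
pipeline_parallel_degree = 2      # pipeline parallelism
context_parallel_degree = 1       # context parallelism

[training]
compile = true                    # enable torch.compile
```

With this layout, 8 × 4 × 2 × 1 = 64 GPUs would be required; scaling any degree changes the GPU count multiplicatively.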

Quick Start

Launch Llama 3.1 8B pretraining on 8 GPUs using the provided configuration file.
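As a hedged sketch, a launch along these lines follows torchtitan's documented pattern of pointing a launcher script at a TOML config; the exact script name and config path may differ in this Skill's packaging.

```shell
# Assumed paths, based on the torchtitan repository layout.
CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh
```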

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: uv-distributed-llm-pretraining-torchtitan
Download link: https://github.com/uv-xiao/pkbllm/archive/main.zip#uv-distributed-llm-pretraining-torchtitan

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
