uv-distributed-llm-pretraining-torchtitan

Community

Scale LLM pretraining with PyTorch.

Author: uv-xiao
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill tackles the challenge of pretraining large language models (LLMs) efficiently at scale, letting users apply advanced distributed training techniques without a complex infrastructure setup.

Core Features & Use Cases

  • 4D Parallelism: Supports composable 4D parallelism (FSDP2, Tensor Parallelism, Pipeline Parallelism, Context Parallelism) for maximum throughput and memory efficiency.
  • Optimized Training: Integrates features like Float8 precision, torch.compile, and distributed checkpointing for faster training and reduced resource consumption.
  • Use Case: A research team wants to pretrain a new 70B parameter LLM on a cluster of 256 GPUs. They can use this Skill to configure and launch the training job, leveraging its advanced parallelism and optimization techniques to complete the pretraining phase significantly faster than traditional methods.
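To make the parallelism dimensions concrete, here is a sketch of how degrees for the four axes might be expressed in a torchtitan-style TOML training config. The section and key names below are illustrative assumptions based on torchtitan's config conventions, not a guaranteed schema; consult the torchtitan repository for the exact fields.

```toml
# Hypothetical parallelism layout for a multi-node job.
# Effective world size = product of the parallel degrees.
[parallelism]
data_parallel_shard_degree = 8    # FSDP2 sharding
tensor_parallel_degree = 4        # tensor parallelism
pipeline_parallel_degree = 2      # pipeline parallelism
context_parallel_degree = 1       # context parallelism

[training]
compile = true                    # enable torch.compile
```

With this layout, 8 × 4 × 2 × 1 = 64 GPUs would be required; scaling any degree changes the GPU count multiplicatively.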

Quick Start

Launch Llama 3.1 8B pretraining on 8 GPUs using the provided configuration file.
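As a hedged sketch, a launch along these lines follows torchtitan's documented pattern of pointing a launcher script at a TOML config; the exact script name and config path may differ in this Skill's packaging.

```shell
# Assumed paths, based on the torchtitan repository layout.
CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh
```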

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: uv-distributed-llm-pretraining-torchtitan
Download link: https://github.com/uv-xiao/pkbllm/archive/main.zip#uv-distributed-llm-pretraining-torchtitan

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
