uv-distributed-llm-pretraining-torchtitan
Community
Scale LLM pretraining with PyTorch.
Author: uv-xiao
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill addresses the challenge of pretraining large language models (LLMs) efficiently at scale, letting users apply advanced distributed training techniques without a complex infrastructure setup.
Core Features & Use Cases
- 4D Parallelism: Supports Composable 4D parallelism (FSDP2, Tensor Parallelism, Pipeline Parallelism, Context Parallelism) for maximum throughput and memory efficiency.
- Optimized Training: Integrates features like Float8 precision, torch.compile, and distributed checkpointing for faster training and reduced resource consumption.
- Use Case: A research team wants to pretrain a new 70B-parameter LLM on a cluster of 256 GPUs. They can use this Skill to configure and launch the training job, leveraging its advanced parallelism and optimization techniques to complete the pretraining phase significantly faster than with traditional methods.
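To see how the 4D parallelism dimensions combine in a scenario like the one above, note that the product of the per-dimension degrees must equal the total GPU count. The degrees below are hypothetical choices for a 256-GPU cluster, chosen for illustration; they are not values prescribed by this Skill or by torchtitan.

```python
# Illustrative only: how 4D parallelism degrees compose into a world size.
# All degree values here are hypothetical examples.
dp_shard = 8   # FSDP2 data-parallel shard degree
tp = 4         # tensor parallelism degree
pp = 4         # pipeline parallelism degree
cp = 2         # context parallelism degree

# The four degrees multiply to the total number of GPUs in the job.
world_size = dp_shard * tp * pp * cp
print(world_size)  # 8 * 4 * 4 * 2 = 256
```

In practice the right split depends on model size, sequence length, and interconnect topology; tensor and context parallelism usually stay within a node, while data and pipeline parallelism span nodes.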
Quick Start
Launch Llama 3.1 8B pretraining on 8 GPUs using the provided configuration file.
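The quick start above maps onto torchtitan's standard launcher. A sketch, assuming the upstream torchtitan repository layout; the config path and the run_train.sh wrapper are torchtitan conventions and may differ from the scripts bundled with this Skill:

```shell
# Sketch: launch Llama 3.1 8B pretraining on 8 GPUs using torchtitan's
# stock training config. Paths assume a checkout of the upstream
# torchtitan repository.
NGPU=8 CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh
```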
Dependency Matrix
Required Modules: None required
Components: scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: uv-distributed-llm-pretraining-torchtitan
Download link: https://github.com/uv-xiao/pkbllm/archive/main.zip#uv-distributed-llm-pretraining-torchtitan
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.