# fsdp

Scale large models with PyTorch FSDP.

- Type: Community
- Author: tylertitsworth
- Version: 1.0.0
- Installs: 0
## System Documentation

### What problem does it solve?
FSDP enables training of very large models by sharding parameters, gradients, and optimizer state across GPUs, reducing per-GPU memory and enabling models that don't fit on a single device.
### Core Features & Use Cases
- Sharding strategies (FULL_SHARD, SHARD_GRAD_OP, NO_SHARD) to tailor memory/compute tradeoffs for large-scale training.
- Mixed precision, activation checkpointing, and CPU offload options to maximize memory savings and performance.
- Integration with the Hugging Face Trainer or Accelerate for streamlined workflows and multi-node setups.
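The first two features above can be sketched in a few lines of PyTorch. This is a minimal illustration, not code shipped with this skill: the `wrap_model` helper and the choice of bfloat16 are assumptions, and the guarded block at the bottom only runs under a launcher such as torchrun that sets `RANK`.

```python
# Sketch: wrapping a model in FSDP with a chosen sharding strategy
# and mixed precision (assumes torch>=2.0 with CUDA for actual runs).
import os

import torch
import torch.nn as nn
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

# Illustrative mapping from a config string to an FSDP sharding strategy.
STRATEGIES = {
    "full_shard": ShardingStrategy.FULL_SHARD,        # shard params, grads, optimizer state
    "shard_grad_op": ShardingStrategy.SHARD_GRAD_OP,  # shard grads and optimizer state only
    "no_shard": ShardingStrategy.NO_SHARD,            # DDP-style full replication
}


def wrap_model(model: nn.Module, strategy: str = "full_shard") -> FSDP:
    """Wrap `model` in FSDP with bf16 mixed precision (hypothetical helper)."""
    mp = MixedPrecision(param_dtype=torch.bfloat16, reduce_dtype=torch.bfloat16)
    return FSDP(model, sharding_strategy=STRATEGIES[strategy], mixed_precision=mp)


if __name__ == "__main__" and "RANK" in os.environ:
    # Only meaningful under torchrun/accelerate, which set RANK etc.
    torch.distributed.init_process_group("nccl")
    sharded = wrap_model(nn.Linear(1024, 1024).cuda())
```

`FULL_SHARD` gives the largest memory savings at the cost of extra all-gathers; `SHARD_GRAD_OP` keeps parameters replicated for faster forward passes while still sharding gradients and optimizer state.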
### Quick Start
Configure a multi-GPU training run using FSDP and launch it with torchrun or Accelerate.
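As a launch-command sketch, a single-node run on 8 GPUs might look like the following; `train.py` and the process counts are placeholders, not files provided by this skill.

```shell
# Launch a hypothetical train.py on 8 local GPUs with torchrun
torchrun --standalone --nproc_per_node=8 train.py

# Or with Accelerate: create a config interactively (choose FSDP
# when prompted), then launch with it
accelerate config
accelerate launch --num_processes 8 train.py
```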
### Dependency Matrix

Required Modules: none required.

Components: references
## 💻 Claude Code Installation

Recommended: let Claude install it automatically. Simply copy and paste the text below into Claude Code.

> Please help me install this Skill. Name: fsdp. Download link: https://github.com/tylertitsworth/skills/archive/main.zip#fsdp — please download this .zip file, extract it, and install it in the .claude/skills/ directory.