fsdp

Community

Scale large models with PyTorch FSDP.

Author: tylertitsworth
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

FSDP (Fully Sharded Data Parallel) enables training very large models by sharding parameters, gradients, and optimizer state across GPUs, reducing per-GPU memory usage and making it possible to train models that don't fit on a single device.

Core Features & Use Cases

  • Sharding strategies (FULL_SHARD, SHARD_GRAD_OP, NO_SHARD) to tailor memory/compute tradeoffs for large-scale training.
  • Mixed precision, activation checkpointing, and CPU offload options to maximize memory savings and performance.
  • Integrations with HuggingFace Trainer or Accelerate for streamlined workflows and multi-node setups.
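The sharding strategy, mixed precision, and CPU offload options above map directly onto constructor arguments of PyTorch's `FullyShardedDataParallel` wrapper. A minimal sketch (the `Linear` model is a placeholder; the FSDP wrap is guarded so the snippet only shards when a distributed process group has been initialized):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    ShardingStrategy,
    MixedPrecision,
    CPUOffload,
)

# Mixed-precision policy: keep params, gradient reduction, and buffers in bf16.
mp_policy = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# Placeholder model; in practice this would be your transformer or other large net.
model = torch.nn.Linear(1024, 1024)

if dist.is_initialized():
    model = FSDP(
        model,
        # FULL_SHARD shards params, grads, and optimizer state (max memory savings);
        # SHARD_GRAD_OP keeps params replicated; NO_SHARD behaves like DDP.
        sharding_strategy=ShardingStrategy.FULL_SHARD,
        mixed_precision=mp_policy,
        cpu_offload=CPUOffload(offload_params=False),
    )
```

Choosing `SHARD_GRAD_OP` trades some memory savings for less parameter communication, which can be faster when the model fits with only gradients and optimizer state sharded.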

Quick Start

Configure a multi-GPU training run using FSDP and launch it with torchrun or Accelerate.
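For example, a launch might look like the following (`train_fsdp.py` is a hypothetical training script; adjust `--nproc_per_node` to your GPU count):

```shell
# Launch with torchrun on a single node with 8 GPUs.
torchrun --nproc_per_node=8 train_fsdp.py

# Or configure Accelerate interactively (select FSDP when prompted), then launch.
accelerate config
accelerate launch train_fsdp.py
```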

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: fsdp
Download link: https://github.com/tylertitsworth/skills/archive/main.zip#fsdp

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
