fsdp

Community

Scale large models with PyTorch FSDP.

Author: tylertitsworth
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

FSDP (Fully Sharded Data Parallel) enables training very large models by sharding parameters, gradients, and optimizer state across GPUs, reducing per-GPU memory usage and making it possible to train models that don't fit on a single device.

Core Features & Use Cases

  • Sharding strategies (FULL_SHARD, SHARD_GRAD_OP, NO_SHARD) to tailor memory/compute tradeoffs for large-scale training.
  • Mixed precision, activation checkpointing, and CPU offload options to maximize memory savings and performance.
  • Integrations with HuggingFace Trainer or Accelerate for streamlined workflows and multi-node setups.
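The sharding strategy, mixed precision, and CPU offload options above map directly onto constructor arguments of PyTorch's `FullyShardedDataParallel` wrapper. A minimal sketch (the `Linear` model is a placeholder; the FSDP wrap is guarded so the snippet only shards when a distributed process group has been initialized):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    ShardingStrategy,
    MixedPrecision,
    CPUOffload,
)

# Mixed-precision policy: keep params, gradient reduction, and buffers in bf16.
mp_policy = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# Placeholder model; in practice this would be your transformer or other large net.
model = torch.nn.Linear(1024, 1024)

if dist.is_initialized():
    model = FSDP(
        model,
        # FULL_SHARD shards params, grads, and optimizer state (max memory savings);
        # SHARD_GRAD_OP keeps params replicated; NO_SHARD behaves like DDP.
        sharding_strategy=ShardingStrategy.FULL_SHARD,
        mixed_precision=mp_policy,
        cpu_offload=CPUOffload(offload_params=False),
    )
```

Choosing `SHARD_GRAD_OP` trades some memory savings for less parameter communication, which can be faster when the model fits with only gradients and optimizer state sharded.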

Quick Start

Configure a multi-GPU training run using FSDP and launch it with torchrun or Accelerate.
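For example, a launch might look like the following (`train_fsdp.py` is a hypothetical training script; adjust `--nproc_per_node` to your GPU count):

```shell
# Launch with torchrun on a single node with 8 GPUs.
torchrun --nproc_per_node=8 train_fsdp.py

# Or configure Accelerate interactively (select FSDP when prompted), then launch.
accelerate config
accelerate launch train_fsdp.py
```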

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: fsdp
Download link: https://github.com/tylertitsworth/skills/archive/main.zip#fsdp

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
