verl

Community

Scale RLHF for LLMs with verl.

Author: tylertitsworth
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Train RLHF LLMs at scale by coordinating rollout generation, policy updates, and reward signals in a unified framework.

Core Features & Use Cases

  • Supports PPO, GRPO, DAPO, RLOO, and REINFORCE++ with stable training loops.
  • Integrates vLLM/SGLang for fast rollout generation and FSDP or Megatron-LM for scalable training.
  • Includes an SFT trainer and end-to-end RLHF pipelines for production-grade experiments.
  • Provides multi-GPU scaling, reward-model integration, checkpointing, and monitoring via WandB.

Quick Start

Configure your training in a YAML file, point verl at your data, and run the trainer to start RLHF fine-tuning of your model.
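As a concrete starting point, verl reads a Hydra-style YAML configuration. The fragment below is an illustrative sketch only: the dataset paths, model name, and exact key names are assumptions and should be checked against the example configs shipped with your verl version.

```yaml
# Illustrative RLHF config sketch (verify key names against the
# trainer configs bundled with your verl release).
data:
  train_files: ~/data/gsm8k/train.parquet
  val_files: ~/data/gsm8k/test.parquet
  train_batch_size: 256
actor_rollout_ref:
  model:
    path: Qwen/Qwen2.5-7B-Instruct   # any HF-compatible policy model
  rollout:
    name: vllm                        # rollout backend: vllm or sglang
algorithm:
  adv_estimator: grpo                 # e.g. ppo, grpo, rloo
trainer:
  n_gpus_per_node: 8
  nnodes: 1
  logger: ["console", "wandb"]
```

Training is then typically launched through verl's trainer entrypoint (for PPO-family runs, `python3 -m verl.trainer.main_ppo`) with config values supplied or overridden on the command line.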

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.


Please help me install this Skill:
Name: verl
Download link: https://github.com/tylertitsworth/skills/archive/main.zip#verl

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
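If you prefer to install manually, the steps above (download the archive, extract it, copy the skill into `.claude/skills/`) can be sketched as a small helper. This is a hypothetical illustration, not part of the skill itself: the function name and the assumption that the archive unpacks to a single top-level folder containing a `verl` subdirectory mirror standard GitHub archive layout but are not verified against this repository.

```python
# Hypothetical manual-install helper mirroring the steps described above.
import shutil
import zipfile
from pathlib import Path


def install_skill(zip_path: str, skills_dir: str, skill_name: str = "verl") -> Path:
    """Extract `skill_name` from a downloaded skills archive into `skills_dir`."""
    dest = Path(skills_dir)
    dest.mkdir(parents=True, exist_ok=True)
    staging = dest / "_staging"
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(staging)
    # GitHub branch archives unpack to one top-level folder (e.g. skills-main/).
    top = next(staging.iterdir())
    src = top / skill_name          # assumed location of the skill folder
    target = dest / skill_name
    if target.exists():
        shutil.rmtree(target)       # replace any previous install
    shutil.move(str(src), str(target))
    shutil.rmtree(staging)
    return target
```

After downloading `main.zip`, calling `install_skill("main.zip", ".claude/skills")` would place the skill at `.claude/skills/verl/`.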
