verl

Community

Scale RLHF for LLMs with verl.

Author: tylertitsworth
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Train RLHF LLMs at scale by coordinating rollout generation, policy updates, and reward signals in a unified framework.

Core Features & Use Cases

  • Supports PPO, GRPO, DAPO, RLOO, and REINFORCE++ with stable training loops.
  • Integrates vLLM/SGLang for fast rollout generation and FSDP or Megatron-LM for scalable training.
  • Includes an SFT trainer and end-to-end RLHF pipelines for production-grade experiments.
  • Provides multi-GPU scaling, reward-model integration, checkpointing, and monitoring via WandB.

Quick Start

Configure your training in a YAML file, point verl at your data, and run the trainer to start RLHF fine-tuning of your model.
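As a concrete starting point, verl reads a Hydra-style YAML configuration. The fragment below is an illustrative sketch only: the dataset paths, model name, and exact key names are assumptions and should be checked against the example configs shipped with your verl version.

```yaml
# Illustrative RLHF config sketch (verify key names against the
# trainer configs bundled with your verl release).
data:
  train_files: ~/data/gsm8k/train.parquet
  val_files: ~/data/gsm8k/test.parquet
  train_batch_size: 256
actor_rollout_ref:
  model:
    path: Qwen/Qwen2.5-7B-Instruct   # any HF-compatible policy model
  rollout:
    name: vllm                        # rollout backend: vllm or sglang
algorithm:
  adv_estimator: grpo                 # e.g. ppo, grpo, rloo
trainer:
  n_gpus_per_node: 8
  nnodes: 1
  logger: ["console", "wandb"]
```

Training is then typically launched through verl's trainer entrypoint (for PPO-family runs, `python3 -m verl.trainer.main_ppo`) with config values supplied or overridden on the command line.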

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.


Please help me install this Skill:
Name: verl
Download link: https://github.com/tylertitsworth/skills/archive/main.zip#verl

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
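If you prefer to install manually, the steps above (download the archive, extract it, copy the skill into `.claude/skills/`) can be sketched as a small helper. This is a hypothetical illustration, not part of the skill itself: the function name and the assumption that the archive unpacks to a single top-level folder containing a `verl` subdirectory mirror standard GitHub archive layout but are not verified against this repository.

```python
# Hypothetical manual-install helper mirroring the steps described above.
import shutil
import zipfile
from pathlib import Path


def install_skill(zip_path: str, skills_dir: str, skill_name: str = "verl") -> Path:
    """Extract `skill_name` from a downloaded skills archive into `skills_dir`."""
    dest = Path(skills_dir)
    dest.mkdir(parents=True, exist_ok=True)
    staging = dest / "_staging"
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(staging)
    # GitHub branch archives unpack to one top-level folder (e.g. skills-main/).
    top = next(staging.iterdir())
    src = top / skill_name          # assumed location of the skill folder
    target = dest / skill_name
    if target.exists():
        shutil.rmtree(target)       # replace any previous install
    shutil.move(str(src), str(target))
    shutil.rmtree(staging)
    return target
```

After downloading `main.zip`, calling `install_skill("main.zip", ".claude/skills")` would place the skill at `.claude/skills/verl/`.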
