verl
Scale RLHF for LLMs with verl.
Author: tylertitsworth
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Train LLMs with RLHF at scale by coordinating rollout generation, policy updates, and reward signals in a unified framework.
Core Features & Use Cases
- Supports PPO, GRPO, DAPO, RLOO, and REINFORCE++ with stable training loops.
- Integrates vLLM/SGLang for fast rollout generation and FSDP or Megatron-LM for scalable training.
- Includes an SFT trainer and end-to-end RLHF pipelines for production-grade experiments.
- Provides multi-GPU scaling, reward-model integration, checkpointing, and monitoring via WandB.
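To make the algorithm list above concrete, here is a minimal, self-contained sketch of the idea behind GRPO's advantage estimate: each sampled completion's reward is normalized against the mean and standard deviation of its own group of completions for the same prompt. This is an illustrative reimplementation, not verl's actual code; the function name and the choice of population standard deviation are assumptions for the sketch.

```python
import math

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages for one prompt's group of sampled
    completions: (r - mean) / (std + eps). Uses the population std;
    real implementations may differ in this detail."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

# Three sampled completions for one prompt, scored by a reward model:
adv = grpo_advantages([0.0, 1.0, 2.0])
# Completions above the group mean get positive advantages,
# those below get negative ones; the advantages sum to zero.
```

Because the baseline comes from the group itself, GRPO needs no learned value network, which is part of what lets these training loops stay simple and stable at scale.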
Quick Start
Configure your training in a YAML file, point verl at your data, and run the trainer to start RLHF fine-tuning of your model.
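As a rough sketch of that workflow, a PPO run is typically launched by overriding verl's Hydra-style config keys on the command line. The dataset paths and model name below are placeholders, and exact key names can vary between verl versions, so check the config of your installed release:

```shell
# Launch an RLHF (PPO) run; paths and model are placeholders.
python3 -m verl.trainer.main_ppo \
  data.train_files=$HOME/data/train.parquet \
  data.val_files=$HOME/data/test.parquet \
  actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  actor_rollout_ref.rollout.name=vllm \
  trainer.n_gpus_per_node=8 \
  trainer.logger='["console","wandb"]'
```

The same override mechanism selects the algorithm (e.g. an `algorithm.adv_estimator` key for GRPO-style estimators) and the training backend, so one YAML config plus a handful of overrides covers most experiments.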
Dependency Matrix
Required Modules: None required
Components: references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: verl
Download link: https://github.com/tylertitsworth/skills/archive/main.zip#verl
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.