openrlhf-training
Community
Accelerate RLHF training for LLMs.
Author: DoanNgocCuong
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill streamlines the complex and resource-intensive process of Reinforcement Learning from Human Feedback (RLHF) for large language models, making advanced model alignment more accessible.
Core Features & Use Cases
- High-Performance RLHF: Supports the PPO, GRPO, RLOO, and DPO algorithms, accelerated with Ray and vLLM.
- Large Model Training: Optimized for models from 7B to 70B+ parameters.
- Distributed Architecture: Built on Ray for efficient multi-node, multi-GPU training.
- Use Case: Fine-tune a large language model such as Llama-3-8B with PPO to align its responses with human preferences and improve its helpfulness and safety.
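The feature list names several RL algorithms. As a rough illustration of how two of them differ, the sketch below computes per-response advantages for a group of responses sampled from the same prompt: GRPO normalizes each reward by the group's mean and standard deviation, while RLOO subtracts the mean reward of the other samples in the group. This is a plain-Python sketch of the textbook formulas, not OpenRLHF's implementation; the function names are hypothetical, and using the population standard deviation is an assumption (implementations vary).

```python
import statistics
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: (r - group mean) / (group std + eps).

    `rewards` holds scalar rewards for several responses to the same prompt.
    Assumption: population standard deviation; `eps` avoids division by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def rloo_advantages(rewards: List[float]) -> List[float]:
    """REINFORCE Leave-One-Out: the baseline for sample i is the mean of the other k-1 rewards."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


if __name__ == "__main__":
    group = [1.0, 0.0, 0.5, 0.5]  # hypothetical rewards for 4 sampled responses
    print("GRPO:", grpo_advantages(group))
    print("RLOO:", rloo_advantages(group))
```

In both cases the advantages within a group sum to (approximately) zero, so each response is pushed up or down only relative to its siblings. This group-based baseline is what lets GRPO and RLOO skip the separate value (critic) model that PPO trains.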
Quick Start
Use the openrlhf-training skill to start PPO training for a Llama-3-8B model using 8 GPUs.
Dependency Matrix
Required Modules
openrlhf, ray, vllm, torch, transformers, deepspeed
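To confirm that the required modules are available before launching a training job, a small check like the following can help. It is a hypothetical helper (`check_modules` is an illustrative name, not part of the Skill), and it assumes each package is imported under the same name shown in the matrix. It uses `importlib.util.find_spec`, which reports whether a top-level module can be found without actually importing heavy packages such as `torch`.

```python
import importlib.util
from typing import Dict, Iterable

# Module names from the Dependency Matrix; assumption: import names match package names.
REQUIRED_MODULES = ("openrlhf", "ray", "vllm", "torch", "transformers", "deepspeed")


def check_modules(names: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Return {module_name: is_findable} without importing the modules themselves."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    status = check_modules()
    for name, ok in status.items():
        print(f"{name:<14} {'found' if ok else 'MISSING'}")
    missing = [name for name, ok in status.items() if not ok]
    if missing:
        print("Missing modules:", ", ".join(missing))
```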
Components
references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: openrlhf-training Download link: https://github.com/DoanNgocCuong/continuous-training-pipeline_T3_2026/archive/main.zip#openrlhf-training Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
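If you prefer to install manually, the steps in the prompt (download the .zip, extract it, copy the Skill into `.claude/skills/`) can be sketched in Python as follows. This is a hypothetical helper, not part of the Skill: `install_skill_from_zip` and `download_and_install` are illustrative names, and the sketch assumes that the `#openrlhf-training` fragment names a folder somewhere inside the archive and that `.claude/skills/` is resolved relative to the current directory. URL fragments are never sent to the server, so the sketch splits it off with `urllib.parse.urldefrag` before downloading.

```python
import shutil
import tempfile
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path

# Download link from the install prompt; the fragment is treated as the skill folder name (assumption).
SKILL_URL = "https://github.com/DoanNgocCuong/continuous-training-pipeline_T3_2026/archive/main.zip#openrlhf-training"


def install_skill_from_zip(zip_path: Path, skill_name: str, skills_dir: Path) -> Path:
    """Extract `zip_path` and copy the first directory named `skill_name` into `skills_dir`."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        # Assumption: the skill lives in a folder named after the skill, at any depth.
        candidates = sorted(p for p in Path(tmp).rglob(skill_name) if p.is_dir())
        if not candidates:
            raise FileNotFoundError(f"no '{skill_name}' directory in {zip_path}")
        skills_dir.mkdir(parents=True, exist_ok=True)
        target = skills_dir / skill_name
        shutil.copytree(candidates[0], target, dirs_exist_ok=True)
        return target


def download_and_install(url: str = SKILL_URL, skills_dir: Path = Path(".claude/skills")) -> Path:
    """Download the archive (without the fragment) and install the named skill folder."""
    archive_url, skill_name = urllib.parse.urldefrag(url)
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = Path(tmp) / "skill.zip"
        urllib.request.urlretrieve(archive_url, zip_path)
        return install_skill_from_zip(zip_path, skill_name, skills_dir)
```

Calling `download_and_install()` from the project root would place the Skill under `.claude/skills/openrlhf-training`; pass a different `skills_dir` if your Claude Code setup reads Skills from another location.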