Skill Explorer

Searching protocol for "rloo"

rloo

Community

Lower-variance RL with leave-one-out baselines.

Advanced

by atrawog

reward

Community

Train reward models for RLHF pipelines.

Advanced

by atrawog

openrlhf-training

Community

Accelerate RLHF training for large models.

Advanced

by zhuangbiaowei

verl

Community

Scale RLHF for LLMs with verl.

Advanced

by tylertitsworth

verl-rl-training

Community

Scale LLM RL training with flexible backends.

Advanced

by zhuangbiaowei

openrlhf-training

Community

Accelerate RLHF with Ray + vLLM.

Advanced

by ihatesea69

verl-rl-training

Community

Scale LLM RL training with verl.

Advanced

by ihatesea69

verl-rl-training

Community

Scale LLM RL training with verl.

Advanced

by MesferAli

openrlhf-training

Community

Accelerate RLHF training for LLMs.

Advanced

by MesferAli

openrlhf-training

Community

Accelerate RLHF training for LLMs.

Advanced

by DoanNgocCuong

verl-rl-training

Community

Scale LLM RL training with flexible backends.

Advanced

by tianhao909

openrlhf-training

Community

Accelerate RLHF training for LLMs.

Advanced

by choice5346