Searching protocol for "reward-modeling"
Train reward models for RLHF pipelines.
Align LLMs with human preferences.
Align language models with human feedback.
Align LLMs with human preferences via RL.
Align LLMs with human preferences using RL.
Train LLMs in the cloud with TRL on HF Jobs.
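As a rough illustration of the first result ("Train reward models for RLHF pipelines"), here is a minimal reward-modeling sketch using TRL's RewardTrainer. The model and dataset names are assumptions chosen for the example, not taken from the search output, and the hyperparameters are placeholders.

# Minimal reward-model training sketch with TRL (illustrative; names are assumptions).
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed base model for the example

# Reward models are sequence classifiers with a single scalar output (num_labels=1).
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Assumed preference dataset with "chosen"/"rejected" pairs.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = RewardConfig(
    output_dir="reward-model",          # where checkpoints are written
    per_device_train_batch_size=2,      # placeholder batch size
    num_train_epochs=1,                 # placeholder epoch count
)

trainer = RewardTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,
    train_dataset=dataset,
)
trainer.train()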