Skill Explorer

Searching protocol for "policy-optimization"

rloo

Community

Lower-variance RL with leave-one-out baselines.

Advanced

by atrawog

Return Policy Optimization

Community

Optimize returns for profit & CX

Advanced

by wassemgtk

Return Policy Optimization

Official

Optimize return policies for profit and satisfaction.

Advanced

by writer

fine-tuning-with-trl

Community

Align LLMs with human preferences via RL.

Advanced

by DoanNgocCuong

fine-tuning-with-trl

Community

Align LLMs with human preferences using RL.

Advanced

by an8079

fine-tuning-with-trl

Community

Align LLMs with human preferences.

Advanced

by GarrettRoi

fine-tuning-with-trl

Community

Align LLMs with human preferences via RL.

Advanced

by informatico-madrid

fine-tuning-with-trl

Community

Align LLMs with human preferences.

Advanced

by MesferAli

apollo-caching-strategies

Official

Cache wisely with Apollo strategies.

Few Config

by TheBushidoCollective

grpo-rl-training

Community

GRPO/RL training patterns.

Advanced

by ovachiever

reward

Community

Train reward models for RLHF pipelines.

Advanced

by atrawog

rls-policy-optimizer

Community

Optimize RLS with the select auth.uid() pattern.

Advanced

by sanchezx1