Name: unsloth-grpo
Author: cuba6112

System Documentation

What problem does it solve?

This Skill addresses the significant memory constraints encountered when training large language models for reasoning tasks, particularly those requiring long context lengths.

Core Features & Use Cases

Memory Optimization: Achieves up to 8x memory savings during Group Relative Policy Optimization (GRPO) training.
Long-Context RL: Enables training with context lengths of up to 20K tokens on a single GPU.
Use Case: Train a DeepSeek-R1 style reasoning model for complex math or code generation tasks on a single GPU with limited VRAM, using reinforcement learning with verifiable rewards (RLVR).
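
To make the GRPO and RLVR terms above concrete, here is a minimal, library-free sketch of the two ideas. GRPO scores each sampled completion against the other completions drawn for the same prompt rather than against a learned value model, and RLVR supplies rewards that can be checked programmatically. The function names, the "last number in the text" answer check, and the use of the population standard deviation are illustrative assumptions, not the Skill's or Unsloth's actual implementation.

```python
import re
from statistics import mean, pstdev


def verifiable_reward(completion: str, expected: str) -> float:
    """Hypothetical RLVR-style check: 1.0 if the last number in the text matches."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == expected else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, several sampled completions (a "group").
completions = [
    "... so the answer is 42",
    "I get 41",
    "The result is 42",
    "Not sure",
]
rewards = [verifiable_reward(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)

for completion, reward, advantage in zip(completions, rewards, advantages):
    print(f"{reward:.1f}  {advantage:+.3f}  {completion}")
```

Correct completions receive a positive advantage and incorrect ones a negative advantage. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.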

Quick Start

Use the unsloth-grpo skill to train a reasoning model with long context on a single GPU.

Please help me install this Skill:
Name: unsloth-grpo
Download link: https://github.com/cuba6112/skillfactory/archive/main.zip#unsloth-grpo
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
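
For a manual install, the extract-and-copy step can be sketched in Python as follows. This assumes the archive has already been downloaded to a local path and that it contains a folder named after the Skill somewhere in its tree. The archive's internal layout is not documented here, so the folder lookup is a guess, and `install_skill` is a hypothetical helper, not part of any official tooling.

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill(zip_path, skill_name, skills_dir=".claude/skills"):
    """Copy the folder named `skill_name` out of the archive into skills_dir/skill_name."""
    target_root = Path(skills_dir) / skill_name
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            # Skip directory entries, unsafe paths, and files outside the skill folder.
            if info.is_dir() or ".." in parts or skill_name not in parts[:-1]:
                continue
            # Keep only the path below the skill folder.
            relative = Path(*parts[parts.index(skill_name) + 1:])
            destination = target_root / relative
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
            written.append(destination)
    return written
```

Calling `install_skill("main.zip", "unsloth-grpo")` from the project root would place the files under `.claude/skills/unsloth-grpo/`. An empty return value means no folder with that name was found in the archive.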
