uv-miles-rl-training
Community
Enterprise RL for large-scale MoE training
Author: uv-xiao
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides a robust framework for training large-scale Mixture-of-Experts (MoE) models, addressing challenges like training stability, low-precision quantization, and train-inference alignment in enterprise environments.
Core Features & Use Cases
- Large MoE Training: Optimized for models exceeding 1 TB of weights, with support for DeepSeek V3 and Qwen3-MoE.
- Low-Precision Training: Enables FP8 and INT4 quantization-aware training for reduced memory footprint and increased throughput.
- Train-Inference Alignment: Ensures bit-wise identical alignment between training and inference using techniques like Rollout Routing Replay (R3); see the sketch after this list.
- Speculative RL: Achieves rollout speedups of 25% or more through speculative decoding.
- Use Case: Train a 1TB MoE model using FP8 quantization on H100 GPUs, ensuring bit-wise alignment with inference and maximizing throughput via speculative RL.
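To make the R3 idea concrete, here is a minimal, self-contained PyTorch sketch of routing replay. It is illustrative only: `ReplayableRouter` and its parameters are hypothetical names rather than the skill's actual API, and a production implementation would also replay routing inside fused MoE kernels.

```python
import torch

class ReplayableRouter(torch.nn.Module):
    """Toy MoE router illustrating Rollout Routing Replay (R3).

    During rollout, the top-k expert indices chosen per token are recorded;
    during the training forward pass, those indices are replayed instead of
    being re-derived, so training dispatches tokens to exactly the experts
    that inference used.
    """

    def __init__(self, hidden: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.gate = torch.nn.Linear(hidden, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x, replay_indices=None):
        logits = self.gate(x)  # [tokens, n_experts]
        if replay_indices is None:
            # Rollout path: pick experts and return the choice for recording.
            weights, indices = torch.topk(logits, self.top_k, dim=-1)
        else:
            # Training path: replay the rollout's recorded expert choice.
            indices = replay_indices
            weights = torch.gather(logits, -1, indices)
        return torch.softmax(weights, dim=-1), indices
```

During rollout, store the returned `indices` alongside each trajectory; during the training step, pass them back as `replay_indices` so both phases route tokens identically.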
Quick Start
Use the uv-miles-rl-training skill to train a Qwen3-30B model with FP8 quantization and speculative RL enabled.
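The skill's concrete invocation is defined by its bundled references. As a rough sketch only, a job combining the features above might be described by a configuration like the following, where every key is a hypothetical illustration rather than the skill's real schema:

```python
# Hypothetical configuration sketch; the skill's references define the real schema.
config = {
    "model": "Qwen/Qwen3-30B-A3B",   # Qwen3-30B MoE checkpoint
    "precision": "fp8",              # FP8 quantization-aware training
    "rollout": {
        "routing_replay": True,      # R3: bit-wise train-inference alignment
        "speculative": True,         # speculative decoding for faster rollouts
    },
    "backend": {
        "inference": "sglang",       # served via sglang-router
        "orchestrator": "ray",       # distributed execution via Ray
    },
}
```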
Dependency Matrix
Required Modules
- sglang-router>=0.2.3
- ray
- torch>=2.0.0
- transformers>=4.40.0
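One way to satisfy these pins in a fresh environment is a standard uv command (nothing skill-specific):

```
uv pip install "sglang-router>=0.2.3" ray "torch>=2.0.0" "transformers>=4.40.0"
```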
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: uv-miles-rl-training
Download link: https://github.com/uv-xiao/pkbllm/archive/main.zip#uv-miles-rl-training
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.