training-llms-megatron
Community
Scale LLM training with advanced parallelism.
Category: Software Engineering
Tags: distributed training, megatron-core, llm training, h100, gpu efficiency, model parallelism
Author: DoanNgocCuong
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill addresses the computational and memory challenges of training large language models (LLMs) by providing a framework for distributed training with advanced parallelism strategies.
Core Features & Use Cases
- Massive Model Training: Train models from 2 billion to over 400 billion parameters.
- GPU Efficiency: Achieves high Model FLOP Utilization (MFU), up to 47% on H100 GPUs, maximizing hardware investment.
- Advanced Parallelism: Implements Tensor Parallelism (TP), Pipeline Parallelism (PP), Sequence Parallelism (SP), Context Parallelism (CP), and Expert Parallelism (EP) for optimal scaling.
- Use Case: Train a 70B-parameter LLaMA-style model on a cluster of 64 H100 GPUs, combining TP, PP, and data parallelism (DP) for maximum training throughput and efficiency.
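For the 64-GPU use case above, the data-parallel degree is not chosen independently: it is whatever remains of the world size after the model-parallel degrees claim their GPUs. A minimal sketch of that bookkeeping (the TP=8/PP=4 split below is illustrative, not prescribed by the Skill):

```python
def parallel_layout(world_size: int, tp: int, pp: int, cp: int = 1) -> int:
    """Return the data-parallel degree left over after tensor (TP),
    pipeline (PP), and context (CP) parallelism claim their GPUs."""
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by TP * PP * CP")
    return world_size // model_parallel

# Example: 64 H100s split as TP=8 (within one node), PP=4 (across nodes)
dp = parallel_layout(64, tp=8, pp=4)  # -> 2 data-parallel replicas
```

Keeping TP within a single node is a common choice because tensor parallelism communicates on every layer and benefits most from NVLink bandwidth, while PP tolerates the slower inter-node links.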
Quick Start
Use the training-llms-megatron skill to train a LLaMA-style model with 3D parallelism on 64 GPUs.
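A launch for that setup might look like the following sketch, using Megatron-LM's standard parallelism flags; the node topology and the choice of TP=8/PP=4 are illustrative assumptions, and model/data arguments are omitted:

```shell
# Illustrative sketch: 8 nodes x 8 H100s = 64 GPUs
# DP degree is implicit: 64 / (TP=8 * PP=4) = 2
torchrun --nnodes 8 --nproc-per-node 8 \
  pretrain_gpt.py \
  --tensor-model-parallel-size 8 \
  --pipeline-model-parallel-size 4 \
  --sequence-parallel \
  --bf16
  # ...plus model, data, and optimizer arguments
```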
Dependency Matrix
Required Modules: None required
Components: scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: training-llms-megatron
Download link: https://github.com/DoanNgocCuong/continuous-training-pipeline_T3_2026/archive/main.zip#training-llms-megatron
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.