training-llms-megatron

Community

Scale LLM training with advanced parallelism.

Author: DoanNgocCuong
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill addresses the immense computational and memory challenges of training large language models (LLMs) by providing a robust framework for distributed training with advanced parallelism strategies.

Core Features & Use Cases

  • Massive Model Training: Train models from 2 billion to over 400 billion parameters.
  • GPU Efficiency: Achieves high Model FLOP Utilization (MFU), up to 47% on H100 GPUs, maximizing hardware investment.
  • Advanced Parallelism: Implements Tensor Parallelism (TP), Pipeline Parallelism (PP), Sequence Parallelism (SP), Context Parallelism (CP), and Expert Parallelism (EP) for optimal scaling.
  • Use Case: Train a 70B-parameter LLaMA-style model on a cluster of 64 H100 GPUs using a combination of TP, PP, and Data Parallelism (DP) to achieve maximum training throughput and efficiency.
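To make the 64-GPU use case concrete, the sketch below factors a cluster into TP x PP x DP degrees. The specific split (TP=8, PP=2, leaving DP=4) is an illustrative assumption, not a tuned recipe from this Skill:

```python
# Sketch: factor a 64-GPU cluster into 3D parallelism degrees.
# The tp=8 / pp=2 split below is illustrative, not a tuned recipe.

def data_parallel_degree(world_size: int, tp: int, pp: int) -> int:
    """Data-parallel degree is whatever remains after tensor and pipeline parallelism."""
    model_parallel = tp * pp
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by tp * pp")
    return world_size // model_parallel

world_size = 64   # e.g. 8 nodes x 8 H100 GPUs
tp, pp = 8, 2     # tensor parallelism within a node, pipeline parallelism across nodes
dp = data_parallel_degree(world_size, tp, pp)
print(f"TP={tp} x PP={pp} x DP={dp} = {tp * pp * dp} GPUs")
```

The product of the three degrees must equal the total GPU count; TP is typically kept within a node to stay on the fast NVLink interconnect, while PP and DP span nodes.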

Quick Start

Use the training-llms-megatron skill to train a LLaMA-style model with 3D parallelism on 64 GPUs.

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: training-llms-megatron
Download link: https://github.com/DoanNgocCuong/continuous-training-pipeline_T3_2026/archive/main.zip#training-llms-megatron

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
