model-architect
Community
Design transformer architectures.
Author: Rachasumanth
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill assists in designing transformer model architectures from scratch, ensuring the generated configurations align with specific computational, memory, and quality objectives.
Core Features & Use Cases
- Architecture Design: Generates Hugging Face Transformers-compatible config.json files for decoder-only transformer models.
- Scale Templates: Provides predefined architecture templates for model scales from 100M to 7B+ parameters.
- Component Selection: Recommends modern defaults like GQA, SwiGLU, RoPE, and RMSNorm, with explanations for deviations.
- Parameter & Memory Estimation: Calculates total parameters, optimizer state memory, activation memory, and checkpoint sizes.
- Tokenizer Integration: Validates and integrates tokenizer configurations (vocab size, special tokens) for seamless training.
- Use Case: A researcher needs to design a new LLM architecture for a specific research goal and budget. This skill helps them define the model's layers, hidden size, attention heads, and other parameters, providing a ready-to-use configuration file and detailed reports on its resource implications.
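As a concrete illustration of the kind of output described above, here is a sketch of a Hugging Face Transformers-compatible config.json for a roughly 1B-parameter decoder-only model. The values are illustrative assumptions, not actual output of the skill; note the modern defaults it mentions: GQA (num_key_value_heads below num_attention_heads), SwiGLU (silu activation with a separate intermediate_size), RoPE (rope_theta), and RMSNorm (rms_norm_eps).

```json
{
  "architectures": ["LlamaForCausalLM"],
  "model_type": "llama",
  "vocab_size": 32000,
  "hidden_size": 2048,
  "intermediate_size": 8192,
  "num_hidden_layers": 16,
  "num_attention_heads": 32,
  "num_key_value_heads": 8,
  "hidden_act": "silu",
  "max_position_embeddings": 4096,
  "rope_theta": 10000.0,
  "rms_norm_eps": 1e-05,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16"
}
```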
Quick Start
Use the model-architect skill to design a 1B class transformer model architecture.
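The parameter and memory estimates the skill reports can be approximated by hand. The sketch below, under the assumption of a Llama-style 1B-class architecture (tied embeddings, GQA attention, SwiGLU MLP, bias-free RMSNorm), is an illustration of the arithmetic, not the skill's implementation; the config values are hypothetical.

```python
# Rough parameter/memory estimator for a decoder-only transformer.
# Assumed architecture: tied embeddings, GQA, SwiGLU, RMSNorm (no biases).

def count_params(vocab, hidden, layers, heads, kv_heads, inter, tied=True):
    head_dim = hidden // heads
    embed = vocab * hidden
    # Attention: q and o projections are full-width; k and v are
    # shrunk by GQA to kv_heads * head_dim output features.
    attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)
    mlp = 3 * hidden * inter   # SwiGLU: gate, up, and down projections
    norms = 2 * hidden         # two RMSNorms per block
    total = embed + layers * (attn + mlp + norms) + hidden  # + final norm
    if not tied:
        total += vocab * hidden  # separate lm_head
    return total

total = count_params(vocab=32000, hidden=2048, layers=16,
                     heads=32, kv_heads=8, inter=8192)
print(f"params: {total / 1e9:.2f}B")                      # ~1.04B
print(f"fp16 checkpoint: {total * 2 / 1e9:.1f} GB")       # 2 bytes/param
print(f"Adam states (fp32 m+v): {total * 8 / 1e9:.1f} GB")  # 8 bytes/param
```

This counts weights only; activation memory depends additionally on batch size, sequence length, and whether activation checkpointing is used.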
Dependency Matrix
Required Modules: None required
Components: references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: model-architect Download link: https://github.com/Rachasumanth/text2llm001/archive/main.zip#model-architect Please download this .zip file, extract it, and install it in the .claude/skills/ directory.