model-architect
Community
Design transformer architectures.
Author: Rachasumanth
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill assists in designing transformer model architectures from scratch, ensuring the generated configurations align with specific computational, memory, and quality objectives.
Core Features & Use Cases
- Architecture Design: Generates Hugging Face Transformers-compatible config.json files for decoder-only transformer models.
- Scale Templates: Provides predefined architecture templates for model scales from 100M to 7B+ parameters.
- Component Selection: Recommends modern defaults like GQA, SwiGLU, RoPE, and RMSNorm, with explanations for deviations.
- Parameter & Memory Estimation: Calculates total parameters, optimizer state memory, activation memory, and checkpoint sizes.
- Tokenizer Integration: Validates and integrates tokenizer configurations (vocab size, special tokens) for seamless training.
- Use Case: A researcher needs to design a new LLM architecture for a specific research goal and budget. This skill helps them define the model's layers, hidden size, attention heads, and other parameters, providing a ready-to-use configuration file and detailed reports on its resource implications.
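As a concrete illustration of the kind of output described above, here is a sketch of a Hugging Face Transformers-compatible config.json for a roughly 1B-parameter decoder-only model. The values are illustrative assumptions, not actual output of the skill; note the modern defaults it mentions: GQA (num_key_value_heads below num_attention_heads), SwiGLU (silu activation with a separate intermediate_size), RoPE (rope_theta), and RMSNorm (rms_norm_eps).

```json
{
  "architectures": ["LlamaForCausalLM"],
  "model_type": "llama",
  "vocab_size": 32000,
  "hidden_size": 2048,
  "intermediate_size": 8192,
  "num_hidden_layers": 16,
  "num_attention_heads": 32,
  "num_key_value_heads": 8,
  "hidden_act": "silu",
  "max_position_embeddings": 4096,
  "rope_theta": 10000.0,
  "rms_norm_eps": 1e-05,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16"
}
```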
Quick Start
Use the model-architect skill to design a 1B class transformer model architecture.
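The parameter and memory estimates the skill reports can be approximated by hand. The sketch below, under the assumption of a Llama-style 1B-class architecture (tied embeddings, GQA attention, SwiGLU MLP, bias-free RMSNorm), is an illustration of the arithmetic, not the skill's implementation; the config values are hypothetical.

```python
# Rough parameter/memory estimator for a decoder-only transformer.
# Assumed architecture: tied embeddings, GQA, SwiGLU, RMSNorm (no biases).

def count_params(vocab, hidden, layers, heads, kv_heads, inter, tied=True):
    head_dim = hidden // heads
    embed = vocab * hidden
    # Attention: q and o projections are full-width; k and v are
    # shrunk by GQA to kv_heads * head_dim output features.
    attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)
    mlp = 3 * hidden * inter   # SwiGLU: gate, up, and down projections
    norms = 2 * hidden         # two RMSNorms per block
    total = embed + layers * (attn + mlp + norms) + hidden  # + final norm
    if not tied:
        total += vocab * hidden  # separate lm_head
    return total

total = count_params(vocab=32000, hidden=2048, layers=16,
                     heads=32, kv_heads=8, inter=8192)
print(f"params: {total / 1e9:.2f}B")                      # ~1.04B
print(f"fp16 checkpoint: {total * 2 / 1e9:.1f} GB")       # 2 bytes/param
print(f"Adam states (fp32 m+v): {total * 8 / 1e9:.1f} GB")  # 8 bytes/param
```

This counts weights only; activation memory depends additionally on batch size, sequence length, and whether activation checkpointing is used.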
Dependency Matrix
Required Modules: None required
Components: references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: model-architect Download link: https://github.com/Rachasumanth/text2llm001/archive/main.zip#model-architect Please download this .zip file, extract it, and install it in the .claude/skills/ directory.