sparse-autoencoder-training

Community

Decompose activations into interpretable features.

Author: AXGZ21
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill addresses the challenge of polysemanticity in neural networks, where individual neurons represent multiple concepts, making interpretation difficult. It provides tools to train and analyze Sparse Autoencoders (SAEs) that decompose these dense activations into sparse, monosemantic features.
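The decomposition described above can be sketched as a minimal sparse autoencoder in PyTorch. This is an illustrative sketch, not the Skill's actual implementation: it encodes a dense `d_model` activation into an overcomplete `d_sae` feature space, with a ReLU and an L1 penalty pushing feature activations toward sparsity.

```python
import torch
import torch.nn as nn

# Minimal SAE sketch (illustrative; names and init are assumptions, not the
# Skill's exact code). Dense activations of width d_model are mapped to an
# overcomplete, sparse feature space of width d_sae and reconstructed back.
class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU keeps feature activations non-negative and encourages sparsity.
        return torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)

    def forward(self, x: torch.Tensor):
        f = self.encode(x)                    # sparse feature activations
        x_hat = f @ self.W_dec + self.b_dec   # reconstruction of the input
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff: float = 1e-3) -> torch.Tensor:
    # Training objective: reconstruction error plus an L1 sparsity penalty.
    mse = (x - x_hat).pow(2).mean()
    l1 = f.abs().sum(dim=-1).mean()
    return mse + l1_coeff * l1
```

The L1 coefficient trades reconstruction fidelity against sparsity; larger values yield fewer active features per input, which is what makes the learned features more likely to be monosemantic.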

Core Features & Use Cases

  • Feature Discovery: Identify interpretable concepts learned by language models.
  • Superposition Analysis: Study how models represent multiple features within single neurons.
  • Mechanistic Interpretability: Understand the internal workings of neural networks.
  • Use Case: When analyzing a language model's response to a specific prompt, use this Skill to discover which learned features (e.g., sentiment, topic, grammatical structure) are most active and how they contribute to the output.

Quick Start

Use the Skill (via the SAELens library) to load a pre-trained SAE for GPT-2 small and encode model activations.
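A sketch of that quick start, assuming the `sae_lens` and `transformer_lens` packages are installed. The release name, hook point, and the exact `from_pretrained` return signature vary across `sae_lens` versions, so treat this as illustrative rather than canonical:

```python
import torch
from transformer_lens import HookedTransformer
from sae_lens import SAE

# Load GPT-2 small and a community pre-trained SAE for one of its layers.
# "gpt2-small-res-jb" and "blocks.8.hook_resid_pre" are assumed identifiers.
model = HookedTransformer.from_pretrained("gpt2")
sae = SAE.from_pretrained(
    release="gpt2-small-res-jb",
    sae_id="blocks.8.hook_resid_pre",
)[0]

# Run a prompt, cache activations, and encode them into SAE features.
_, cache = model.run_with_cache("The quick brown fox")
acts = cache[sae.cfg.hook_name]      # shape: (batch, seq, d_model)
feature_acts = sae.encode(acts)      # sparse feature activations
```

Because this downloads model and SAE weights, the first run requires network access; afterwards the weights are cached locally.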

Dependency Matrix

Required Modules

sae-lens, transformer-lens, torch

Components

scripts, references, assets

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: sparse-autoencoder-training
Download link: https://github.com/AXGZ21/hermes-agent-railway/archive/main.zip#sparse-autoencoder-training

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository
