sparse-autoencoder-training
Community
Decompose activations into interpretable features.
Category: Education & Research
Tags: transformer models, feature discovery, sparse autoencoders, mechanistic interpretability, neural network analysis, saelens
Author: AXGZ21
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill addresses the challenge of polysemanticity in neural networks, where individual neurons represent multiple concepts, making interpretation difficult. It provides tools to train and analyze Sparse Autoencoders (SAEs) that decompose these dense activations into sparse, monosemantic features.
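As a minimal sketch of the underlying idea (the dimensions, initialization scale, and L1 coefficient below are illustrative assumptions, not this Skill's actual defaults): an SAE maps a dense d_model-dimensional activation to a much wider, mostly-zero feature vector, and is trained to reconstruct the input while an L1 penalty keeps the features sparse.

```python
import torch
import torch.nn as nn

class ToySparseAutoencoder(nn.Module):
    """Decompose a d_model-dim activation into d_sae sparse features."""
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU keeps feature activations non-negative, so most are zero
        return torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        return f @ self.W_dec + self.b_dec

    def forward(self, x: torch.Tensor):
        f = self.encode(x)
        return self.decode(f), f

# Training objective: reconstruction error plus an L1 penalty on the
# feature activations, which pushes most features to exactly zero.
sae = ToySparseAutoencoder(d_model=768, d_sae=768 * 16)
x = torch.randn(32, 768)           # a batch of model activations
x_hat, f = sae(x)
l1_coeff = 1e-3                    # assumed value; tuned in practice
loss = ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().sum(-1).mean()
loss.backward()
```

Because the feature dictionary is much wider than the activation space, the model can assign separate directions to concepts that were superposed in individual neurons.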
Core Features & Use Cases
- Feature Discovery: Identify interpretable concepts learned by language models.
- Superposition Analysis: Study how models represent multiple features within single neurons.
- Mechanistic Interpretability: Understand the internal workings of neural networks.
- Use Case: When analyzing a language model's response to a specific prompt, use this Skill to discover which learned features (e.g., sentiment, topic, grammatical structure) are most active and how they contribute to the output (see the sketch after this list).
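To make that use case concrete, here is a hypothetical helper (the function name is an assumption, though sae_lens SAE objects do expose an `encode` method) that ranks features by how strongly they fire at the final token of a prompt:

```python
import torch

def top_active_features(sae, activations: torch.Tensor, k: int = 10):
    """Rank SAE features by activation strength at the last token.

    `activations` is a cached [batch, seq, d_model] tensor from the
    hook point the SAE was trained on; `sae.encode` maps it to a
    [batch, seq, d_sae] tensor of non-negative feature activations.
    """
    feature_acts = sae.encode(activations)
    values, indices = feature_acts[0, -1].topk(k)
    return list(zip(indices.tolist(), values.tolist()))
```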
Quick Start
Use the saelens skill to load a pre-trained SAE for GPT-2 small and encode model activations.
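A hedged end-to-end sketch of that quick start: the release name `gpt2-small-res-jb`, the hook point, and the 3-tuple return of `SAE.from_pretrained` match recent sae_lens versions, but verify them against the version you install.

```python
import torch
from transformer_lens import HookedTransformer
from sae_lens import SAE

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load GPT-2 small and a community-released SAE trained on its
# layer-8 residual stream. Recent sae_lens versions return a
# (sae, cfg_dict, sparsity) tuple; other versions may differ.
model = HookedTransformer.from_pretrained("gpt2", device=device)
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="gpt2-small-res-jb",
    sae_id="blocks.8.hook_resid_pre",
    device=device,
)

# Cache all activations for a prompt, then encode the hook point
# the SAE was trained on into sparse feature activations.
prompt = "The quick brown fox jumps over the lazy dog."
logits, cache = model.run_with_cache(prompt)
acts = cache["blocks.8.hook_resid_pre"]   # [batch, seq, d_model]
feature_acts = sae.encode(acts)           # [batch, seq, d_sae]

n_active = (feature_acts[0, -1] > 0).sum().item()
print(f"features active at the final token: {n_active}")
```

Of the thousands of learned features, typically only a small fraction fire on any given token, which is what makes per-feature inspection tractable.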
Dependency Matrix
Required Modules
- sae-lens
- transformer-lens
- torch
Components
- scripts
- references
- assets
💻 Claude Code Installation
Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: sparse-autoencoder-training
Download link: https://github.com/AXGZ21/hermes-agent-railway/archive/main.zip#sparse-autoencoder-training
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.