obliteratus
Community
Uncensor LLMs with mechanistic interpretability.
Software Engineering
#llm, #mechanistic interpretability, #uncensor, #refusal removal, #model surgery, #weight projection
Author: Aum08Desai
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill removes unwanted refusal behaviors and guardrails from open-weight Large Language Models (LLMs) without requiring retraining or fine-tuning, preserving their reasoning capabilities.
Core Features & Use Cases
- Refusal Removal: Excises guardrails using mechanistic interpretability techniques such as singular value decomposition (SVD), least-squares concept erasure (LEACE), and sparse autoencoders (SAEs).
- Model Surgery: Identifies and surgically removes specific refusal directions from model weights.
- Use Case: You have an open-weight, instruction-tuned LLM such as Llama 3 that refuses to answer certain prompts because of its safety guardrails. Use this Skill to create a version of the model that answers those prompts without compromising its general capabilities.
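The listing does not show how obliteratus performs its weight projection, so the following is only a toy sketch of the underlying linear algebra: removing one direction from the output space of a weight matrix. It assumes NumPy, which is not among the listed modules, and uses random data. The helper `project_out` and the variables `W` and `r` are hypothetical names chosen for illustration, not part of the Skill.

```python
import numpy as np

def project_out(weight, direction):
    """Remove the component of `weight`'s output that lies along `direction`.

    weight:    (d_out, d_in) matrix
    direction: (d_out,) vector in the output space
    """
    r = direction / np.linalg.norm(direction)   # normalise to unit length
    # Subtract the rank-one part r r^T W, leaving outputs orthogonal to r.
    return weight - np.outer(r, r @ weight)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # toy weight matrix (assumption: random data)
r = rng.normal(size=8)        # stand-in for a direction found by some analysis

W_ablated = project_out(W, r)

# Largest remaining component along r; should be zero up to rounding error.
residual = np.abs((r / np.linalg.norm(r)) @ W_ablated).max()
print(f"max residual along direction: {residual:.2e}")
```

Because the operation is an orthogonal projection, applying it a second time with the same direction leaves the matrix unchanged, and everything orthogonal to the direction is untouched.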
Quick Start
Use the obliteratus skill to remove refusal behaviors from the 'meta-llama/Llama-3.1-8B-Instruct' model.
Dependency Matrix
Required Modules
obliteratus, torch, transformers, bitsandbytes, accelerate, safetensors
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: obliteratus
Download link: https://github.com/Aum08Desai/hermes-research-agent/archive/main.zip#obliteratus
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.