Name: logprob-prefill-analysis
Availability: InStock
Author: EleutherAI

System Documentation

What problem does it solve?

Documents and executes the full prefill sensitivity analysis workflow to assess model susceptibility to reward hacking via exploit-oriented prefills.

Core Features & Use Cases

End-to-end pipeline for token-based and logprob-based metrics, including trajectory analysis, KL divergence, and extrapolation across checkpoints.
Useful for evaluating how different prefill levels influence exploitability and comparing across exploit types.
Case example: run evaluation across checkpoints to determine when a model becomes easily exploitable and compare metrics to identify early warning signals.

Quick Start

Run the full prefill sensitivity evaluation against your checkpoints using the provided scripts.

Please help me install this Skill: Name: logprob-prefill-analysis Download link: https://github.com/EleutherAI/rh-indicators/archive/main.zip#logprob-prefill-analysis Please download this .zip file, extract it, and install it in the .claude/skills/ directory.

logprob-prefill-analysis

System Documentation

What problem does it solve?

Core Features & Use Cases

Quick Start

Dependency Matrix

Required Modules

Components

💻 Claude Code Installation

Agent Skills Search Helper