plan-mode_arc_gsm8k_improvement
Community
Plan-mode for ARC/GSM8K evaluation improvement.
Author: zapabob
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Plan mode identifies and fixes weaknesses in ARC-Challenge and GSM8K evaluations for the AEGIS model by analyzing timeout rates, extraction failures, data contamination, and seed stability.
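As an illustration of the data contamination check named above, the following is a minimal sketch based on word n-gram overlap between evaluation items and a reference corpus. The function names (`ngrams`, `contamination_rate`), the 8-gram window, and the assumption that a sample of training text is available are all hypothetical and are not taken from the skill's own scripts.

```python
import re


def ngrams(text, n=8):
    """Return the set of lowercase word n-grams in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(eval_items, corpus_docs, n=8):
    """Flag eval items that share at least one n-gram with the corpus.

    Hypothetical helper: returns (fraction_flagged, flagged_items).
    """
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    flagged = [item for item in eval_items if ngrams(item, n) & corpus_grams]
    rate = len(flagged) / len(eval_items) if eval_items else 0.0
    return rate, flagged


if __name__ == "__main__":
    questions = [
        "Natalia sold clips to 48 of her friends in April and then half as many in May",
        "A farmer plants three rows of corn and two rows of beans every single spring",
    ]
    corpus = ["Forum post: Natalia sold clips to 48 of her friends in April and then what?"]
    rate, hits = contamination_rate(questions, corpus)
    print(f"flagged {len(hits)} of {len(questions)} items ({rate:.0%})")
```

A shorter window flags more items but produces more false positives from common phrasing, so the window size is worth tuning against known-clean data.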
Core Features & Use Cases
- ARC-Challenge improvement analysis: timeout rate, extraction failure analysis, response-pattern analysis, robust extraction implementation.
- GSM8K sanity checks: data contamination detection, multi-seed evaluation, zero-shot evaluation, scoring logic validation.
- Multi-objective evaluation workflow: parallel evaluations, statistical validation, comparative analysis, automated report generation.
- SO8T integration optimizations: existing ABC test integration, checkpoint management, resource optimization, automatic improvement proposals.
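The "robust extraction" item in the list above could look roughly like the sketch below. It assumes ARC responses name a choice label (A to D) and GSM8K responses end with a number, optionally after the `####` marker used in GSM8K reference answers. The function names and regular expressions are illustrative assumptions, not the skill's actual implementation.

```python
import re


def extract_arc_choice(response, labels="ABCD"):
    """Return the last choice label stated in `response`, or None.

    Assumed formats: "... answer is (C)", "Answer: B", or a line holding only a label.
    """
    # "answer" / "is" are matched case-insensitively; the label itself must be uppercase.
    stated = re.findall(
        rf"(?i:answer)\s*(?:(?i:is)|:)?\s*\(?([{labels}])\)?(?![A-Za-z])", response
    )
    if stated:
        return stated[-1]
    # Fallback: a line that contains nothing but a label, e.g. "D" or "(D)."
    bare = re.findall(rf"^\s*\(?([{labels}])\)?[.):]?\s*$", response, flags=re.MULTILINE)
    return bare[-1] if bare else None


def extract_gsm8k_number(response):
    """Return the final numeric answer as a float, or None if no number is found."""
    number = r"-?\d[\d,]*(?:\.\d+)?"
    # Prefer an explicit "#### 42" marker; otherwise fall back to the last number seen.
    marked = re.findall(rf"####\s*\$?({number})", response)
    candidates = marked or re.findall(number, response)
    if not candidates:
        return None
    return float(candidates[-1].replace(",", ""))


if __name__ == "__main__":
    print(extract_arc_choice("B seems wrong. The answer is (C)."))
    print(extract_gsm8k_number("She earns 9 * 2 = 18.\n#### 18"))
```

Returning `None` rather than guessing keeps extraction failures countable, which is what the extraction failure analysis needs.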
Quick Start
Run the ARC/GSM8K improvement Plan with your model path and a list of seeds. It generates the analysis; review it and apply the top recommended improvements.
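Once per-seed scores are available, seed stability can be summarized with a few lines of standard-library Python. This is a sketch under assumptions: the `summarize_seeds` name, the input shape (a mapping from seed to accuracy), and the `max_std` threshold are hypothetical and not part of the skill.

```python
import statistics


def summarize_seeds(scores_by_seed, max_std=0.01):
    """Summarize accuracy across seeds and flag unstable results.

    `scores_by_seed` maps seed -> accuracy in [0, 1]; `max_std` is an assumed
    tolerance for the sample standard deviation.
    """
    scores = list(scores_by_seed.values())
    mean = statistics.mean(scores)
    # Sample standard deviation needs at least two data points.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return {
        "mean": mean,
        "std": std,
        "min": min(scores),
        "max": max(scores),
        "stable": std <= max_std,
    }


if __name__ == "__main__":
    summary = summarize_seeds({0: 0.50, 1: 0.52, 2: 0.54})
    print(summary)
```

A large spread across seeds suggests that a single-seed score should not be compared directly against another model's score.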
Dependency Matrix
Required Modules
None required
Components
scripts
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: plan-mode_arc_gsm8k_improvement
Download link: https://github.com/zapabob/SO8T/archive/main.zip#plan-mode-arc-gsm8k-improvement
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.