corpus-investigation
Community
Token-efficient corpus investigations at scale.
Author: percy-raskova
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill enables systematic, reproducible investigation of very large document corpora (100 GB and up) using a token-efficient methodology. Without requiring exhaustive reading, it produces structured section analyses, metadata schemas, and retrieval-augmented generation (RAG) guidance for designing scalable knowledge bases.
Core Features & Use Cases
- 5-phase investigation framework (Reconnaissance, Stratified Sampling, Pattern Verification, Edge Case Analysis, Synthesis) to produce a complete Section Analysis Document.
- Stratified sampling across size, time, type, and depth to minimize token usage while preserving representative patterns.
- Computational pattern verification using grep, find, and other shell tools to quantify coverage across thousands of files without exhaustive reading.
- An output-ready Section Analysis Document with a 5-layer metadata schema and RAG integration recommendations for teams building large-scale knowledge bases.
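The stratified-sampling and pattern-verification ideas in the feature list can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the skill's own implementation: the size bucket boundaries, the use of file modification year as the "time" dimension, the file suffix as "type", the depth cap of 3, and the helper names (`size_bucket`, `stratum_key`, `stratified_sample`, `pattern_coverage`) are all hypothetical. `pattern_coverage` plays roughly the role of a `grep -rlE` match count set against a `find -type f` file count, though Python's `re` syntax differs from POSIX ERE in places.

```python
from __future__ import annotations

import random
import re
import time
from collections import defaultdict
from pathlib import Path

# Hypothetical size buckets (bytes); tune these to the corpus being investigated.
SIZE_BUCKETS = [
    (0, 10_000, "small"),
    (10_000, 1_000_000, "medium"),
    (1_000_000, float("inf"), "large"),
]


def size_bucket(n_bytes: int) -> str:
    """Map a file size in bytes to a coarse size label."""
    for low, high, label in SIZE_BUCKETS:
        if low <= n_bytes < high:
            return label
    return "unknown"


def stratum_key(path: Path, root: Path) -> tuple:
    """Assign a file to a (size, year, type, depth) stratum.

    Assumptions: "time" is the year of last modification (UTC), "type" is the
    lowercase file suffix, and depth is capped at 3 to keep the strata count small.
    """
    stat = path.stat()
    year = time.gmtime(stat.st_mtime).tm_year
    depth = len(path.relative_to(root).parts) - 1
    return (size_bucket(stat.st_size), year, path.suffix.lower() or "<none>", min(depth, 3))


def stratified_sample(root: str, per_stratum: int = 2, seed: int = 0) -> dict:
    """Pick up to `per_stratum` files from every stratum, reproducibly for a given seed."""
    root_path = Path(root)
    strata = defaultdict(list)
    # Sorting the walk and the strata makes the result independent of filesystem listing order.
    for path in sorted(root_path.rglob("*")):
        if path.is_file():
            strata[stratum_key(path, root_path)].append(path)
    rng = random.Random(seed)
    return {
        key: rng.sample(files, min(per_stratum, len(files)))
        for key, files in sorted(strata.items())
    }


def pattern_coverage(root: str, pattern: str, glob: str = "*") -> tuple[int, int]:
    """Return (files matching `pattern`, files scanned) under `root`.

    Roughly what `grep -rlE PATTERN DIR | wc -l` and `find DIR -type f | wc -l`
    report together, except that Python `re` syntax is used instead of POSIX ERE.
    """
    regex = re.compile(pattern)
    matched = total = 0
    for path in Path(root).rglob(glob):
        if not path.is_file():
            continue
        total += 1
        # Undecodable bytes are dropped so binary or mis-encoded files do not abort the scan.
        with open(path, encoding="utf-8", errors="ignore") as fh:
            if any(regex.search(line) for line in fh):
                matched += 1
    return matched, total
```

For example, `stratified_sample("/path/to/corpus/", per_stratum=3)` returns a dict mapping each (size, year, suffix, depth) stratum to at most three files, and a fixed `seed` makes the selection repeatable across runs. Reading only the sampled files keeps token cost proportional to the number of strata rather than to corpus size, while `pattern_coverage` checks a hypothesis against every file without loading any of them into the model's context.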
Quick Start
To begin, say: "Investigate the corpus at /path/to/corpus/". Claude will activate this skill and return a comprehensive Section Analysis Document with reproducible commands and patterns.
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: corpus-investigation Download link: https://github.com/percy-raskova/marxists.org-rag-db/archive/main.zip#corpus-investigation Please download this .zip file, extract it, and install it in the .claude/skills/ directory.