collab-evals

Community

Run collab evals and capture manifest evidence.

Author: Kbediako
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Collab-evals provides a framework for running repeatable multi-agent evaluation scenarios (symbolic RLM, large-context interactions) and for preserving evidence via manifest-backed outputs, reducing ad-hoc experimentation and enabling audit trails.

Core Features & Use Cases

  • Orchestrates collab-driven evaluations across multi-agent workflows including symbolic RLM and large-context tests.
  • Supports pause/resume, long-running experiments, and checkpointing for resilience.
  • Generates manifest-backed evidence and updates documentation with findings for traceability and reproducibility.

Quick Start

  1. Pick the scenario(s) for evaluation:
  • Large-context symbolic RLM with collab subcalls.
  • Multi-hour refactor with checkpoints.
  • 24h pause/resume context-rot regression.
  • Multi-day initiative (48–72h) with multiple resumes.
  2. Ensure task context is set:
  • export MCP_RUNNER_TASK_ID=<task-id>
  3. Run the scenario using codex-orchestrator start <pipeline> --format json and record the manifest path (a sketch of the full flow follows below).
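A minimal shell sketch of that flow, assuming a POSIX shell, that the JSON output goes to stdout, and that the manifest path appears under a field such as manifestPath (the field name and the run.json / evidence-log.txt files are placeholders, not documented names):

  # Hypothetical end-to-end run; <task-id> and <pipeline> are placeholders from the steps above.
  export MCP_RUNNER_TASK_ID=<task-id>
  codex-orchestrator start <pipeline> --format json > run.json
  # Record the manifest path reported in the JSON output for the audit trail.
  # The .manifestPath field is an assumption about the output shape.
  MANIFEST_PATH=$(jq -r '.manifestPath' run.json)
  echo "$MANIFEST_PATH" >> evidence-log.txt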

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: collab-evals
Download link: https://github.com/Kbediako/CO/archive/main.zip#collab-evals

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
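If you prefer to install manually, a rough shell equivalent of those steps is sketched below; the extracted directory name (CO-main) and the skill's location inside the archive are assumptions about the repository layout (the #collab-evals fragment in the link identifies the skill, not a separate download):

  # Manual install sketch; archive layout is assumed, adjust paths if it differs.
  curl -L -o collab-evals.zip "https://github.com/Kbediako/CO/archive/main.zip"
  unzip collab-evals.zip
  mkdir -p .claude/skills
  cp -r CO-main/collab-evals .claude/skills/   # assumed path to the skill inside the archive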
