genie-benchmark-evaluator

Official

Evaluate Genie Space SQL generation accuracy.

Author: databricks-solutions
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill rigorously evaluates the accuracy and quality of SQL generated by a Genie Space against a set of predefined benchmarks, identifying areas for improvement.

Core Features & Use Cases

  • Multi-Layered Evaluation: Employs 8 scorers across 3 layers (quality, correctness, arbiter) to provide comprehensive feedback.
  • MLflow Integration: Logs all evaluation metrics, results, and traces to MLflow for detailed analysis and version tracking.
  • Automated Correction: The arbiter layer can automatically update benchmarks or suggest metadata optimizations based on evaluation outcomes.
  • Use Case: After optimizing a Genie Space's SQL generation capabilities, use this Skill to quantitatively measure the improvement in accuracy and identify any regressions before deploying the changes.
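The three-layer scorer structure described above can be sketched in plain Python. This is an illustrative sketch only: the scorer names (`exact_match`, `has_limit_clause`, `arbiter_flag`), the layer weights, and the aggregation logic are hypothetical assumptions, not the Skill's actual API or its real 8 scorers.

```python
# Hypothetical sketch of a multi-layer SQL evaluation, loosely modeled on
# the quality / correctness / arbiter layers described above.
# All scorer names and thresholds here are illustrative assumptions.

def exact_match(generated: str, expected: str) -> float:
    """Correctness-layer scorer: 1.0 if whitespace/case-normalized SQL matches."""
    norm = lambda s: " ".join(s.lower().split())
    return 1.0 if norm(generated) == norm(expected) else 0.0

def has_limit_clause(generated: str, expected: str) -> float:
    """Quality-layer scorer: reward queries that bound their result set."""
    return 1.0 if "limit" in generated.lower() else 0.0

def arbiter_flag(scores: dict) -> bool:
    """Arbiter layer: flag a benchmark for review when quality is high
    but correctness is low (the generated SQL may be a valid alternative)."""
    return scores["correctness"] < 0.5 and scores["quality"] >= 0.5

LAYERS = {
    "quality": [has_limit_clause],
    "correctness": [exact_match],
}

def evaluate(generated: str, expected: str) -> dict:
    # Average each layer's scorers, then let the arbiter inspect the result.
    scores = {
        layer: sum(s(generated, expected) for s in scorers) / len(scorers)
        for layer, scorers in LAYERS.items()
    }
    scores["needs_review"] = arbiter_flag(scores)
    return scores
```

In the real Skill, each per-layer score would additionally be logged to MLflow so runs can be compared across Genie Space revisions.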

Quick Start

Use the genie-benchmark-evaluator skill to evaluate the genie space with ID 'my-space-id' against the benchmarks defined in 'golden-queries.yaml'.
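A benchmark file like `golden-queries.yaml` typically pairs natural-language questions with known-good SQL. The exact schema the Skill expects is not documented here; the structure below is a hypothetical sketch of what such a file might contain.

```yaml
# Hypothetical structure for golden-queries.yaml (field names are assumptions).
benchmarks:
  - question: "What were total sales by region last quarter?"
    expected_sql: |
      SELECT region, SUM(amount) AS total_sales
      FROM sales
      WHERE quarter = 'Q3'
      GROUP BY region
  - question: "Who are the top 10 customers by revenue?"
    expected_sql: |
      SELECT customer_id, SUM(revenue) AS total_revenue
      FROM orders
      GROUP BY customer_id
      ORDER BY total_revenue DESC
      LIMIT 10
```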

Dependency Matrix

Required Modules

  • mlflow-genai-evaluation
  • prompt-registry-patterns

Components

  • scripts
  • references
  • assets

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: genie-benchmark-evaluator
Download link: https://github.com/databricks-solutions/vibe-coding-workshop-template/archive/main.zip#genie-benchmark-evaluator

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
