genie-benchmark-evaluator

Official

Evaluate Genie Space SQL generation accuracy.

Author: databricks-solutions
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill rigorously evaluates the accuracy and quality of SQL generated by a Genie Space against a set of predefined benchmarks, identifying areas for improvement.

Core Features & Use Cases

  • Multi-Layered Evaluation: Employs 8 scorers across 3 layers (quality, correctness, arbiter) to provide comprehensive feedback.
  • MLflow Integration: Logs all evaluation metrics, results, and traces to MLflow for detailed analysis and version tracking.
  • Automated Correction: The arbiter layer can automatically update benchmarks or suggest metadata optimizations based on evaluation outcomes.
  • Use Case: After optimizing a Genie Space's SQL generation capabilities, use this Skill to quantitatively measure the improvement in accuracy and identify any regressions before deploying the changes.
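The three-layer scorer structure described above can be sketched in plain Python. This is an illustrative sketch only: the scorer names (`exact_match`, `has_limit_clause`, `arbiter_flag`), the layer weights, and the aggregation logic are hypothetical assumptions, not the Skill's actual API or its real 8 scorers.

```python
# Hypothetical sketch of a multi-layer SQL evaluation, loosely modeled on
# the quality / correctness / arbiter layers described above.
# All scorer names and thresholds here are illustrative assumptions.

def exact_match(generated: str, expected: str) -> float:
    """Correctness-layer scorer: 1.0 if whitespace/case-normalized SQL matches."""
    norm = lambda s: " ".join(s.lower().split())
    return 1.0 if norm(generated) == norm(expected) else 0.0

def has_limit_clause(generated: str, expected: str) -> float:
    """Quality-layer scorer: reward queries that bound their result set."""
    return 1.0 if "limit" in generated.lower() else 0.0

def arbiter_flag(scores: dict) -> bool:
    """Arbiter layer: flag a benchmark for review when quality is high
    but correctness is low (the generated SQL may be a valid alternative)."""
    return scores["correctness"] < 0.5 and scores["quality"] >= 0.5

LAYERS = {
    "quality": [has_limit_clause],
    "correctness": [exact_match],
}

def evaluate(generated: str, expected: str) -> dict:
    # Average each layer's scorers, then let the arbiter inspect the result.
    scores = {
        layer: sum(s(generated, expected) for s in scorers) / len(scorers)
        for layer, scorers in LAYERS.items()
    }
    scores["needs_review"] = arbiter_flag(scores)
    return scores
```

In the real Skill, each per-layer score would additionally be logged to MLflow so runs can be compared across Genie Space revisions.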

Quick Start

Use the genie-benchmark-evaluator skill to evaluate the genie space with ID 'my-space-id' against the benchmarks defined in 'golden-queries.yaml'.
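A benchmark file like `golden-queries.yaml` typically pairs natural-language questions with known-good SQL. The exact schema the Skill expects is not documented here; the structure below is a hypothetical sketch of what such a file might contain.

```yaml
# Hypothetical structure for golden-queries.yaml (field names are assumptions).
benchmarks:
  - question: "What were total sales by region last quarter?"
    expected_sql: |
      SELECT region, SUM(amount) AS total_sales
      FROM sales
      WHERE quarter = 'Q3'
      GROUP BY region
  - question: "Who are the top 10 customers by revenue?"
    expected_sql: |
      SELECT customer_id, SUM(revenue) AS total_revenue
      FROM orders
      GROUP BY customer_id
      ORDER BY total_revenue DESC
      LIMIT 10
```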

Dependency Matrix

Required Modules

  • mlflow-genai-evaluation
  • prompt-registry-patterns

Components

  • scripts
  • references
  • assets

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: genie-benchmark-evaluator
Download link: https://github.com/databricks-solutions/vibe-coding-workshop-template/archive/main.zip#genie-benchmark-evaluator

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
