genie-benchmark-evaluator
Official · Evaluate Genie Space SQL generation accuracy.
Author: databricks-solutions
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill rigorously evaluates the accuracy and quality of SQL generated by a Genie Space against a set of predefined benchmarks, identifying areas for improvement.
Core Features & Use Cases
- Multi-Layered Evaluation: Employs 8 scorers across 3 layers (quality, correctness, arbiter) to provide comprehensive feedback.
- MLflow Integration: Logs all evaluation metrics, results, and traces to MLflow for detailed analysis and version tracking.
- Automated Correction: The arbiter layer can automatically update benchmarks or suggest metadata optimizations based on evaluation outcomes.
- Use Case: After optimizing a Genie Space's SQL generation capabilities, use this Skill to quantitatively measure the improvement in accuracy and identify any regressions before deploying the changes.
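To make the correctness layer concrete, here is a minimal, hypothetical sketch of one scorer: an order-insensitive result-set comparison. The actual Skill ships its own scorers and logs results to MLflow; the function name and signature below are illustrative assumptions, not the Skill's API.

```python
from collections import Counter

def correctness_score(expected_rows, actual_rows, ordered=False):
    """Return 1.0 when the generated query's rows match the benchmark rows.

    Order-insensitive by default, since semantically equivalent SQL may
    return rows in any order when no ORDER BY clause is present.
    """
    if ordered:
        return 1.0 if expected_rows == actual_rows else 0.0
    # Compare as multisets so duplicates still matter but ordering does not.
    return 1.0 if Counter(map(tuple, expected_rows)) == Counter(map(tuple, actual_rows)) else 0.0

# Same rows in a different order still count as correct.
print(correctness_score([(1, "a"), (2, "b")], [(2, "b"), (1, "a")]))
```

In a full evaluation run, a score like this would be computed per benchmark query and logged (e.g. via `mlflow.log_metric`) so regressions are visible across versions.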
Quick Start
Use the genie-benchmark-evaluator skill to evaluate the genie space with ID 'my-space-id' against the benchmarks defined in 'golden-queries.yaml'.
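The source does not document the benchmark file format, but a file like 'golden-queries.yaml' would typically pair natural-language questions with golden SQL. The field names below are assumptions for illustration only:

```yaml
# Hypothetical structure for golden-queries.yaml (field names are assumed).
benchmarks:
  - question: "What was total revenue by region last quarter?"
    golden_sql: |
      SELECT region, SUM(revenue) AS total_revenue
      FROM sales
      WHERE quarter = 'Q3'
      GROUP BY region
```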
Dependency Matrix
Required Modules
mlflow-genai-evaluation, prompt-registry-patterns
Components
scripts, references, assets
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: genie-benchmark-evaluator Download link: https://github.com/databricks-solutions/vibe-coding-workshop-template/archive/main.zip#genie-benchmark-evaluator Please download this .zip file, extract it, and install it in the .claude/skills/ directory.