ref-hallucination-arena
Official: Benchmark LLM reference accuracy
Category: Education & Research
Tags: llm evaluation, benchmark, hallucination detection, academic references, llm benchmarking, citation accuracy
Author: agentscope-ai
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill addresses the critical issue of Large Language Models (LLMs) fabricating academic references, whether by citing papers that do not exist or by garbling the metadata of real ones. Quantifying this behavior helps establish the reliability of AI-generated literature reviews and citations.
Core Features & Use Cases
- Comprehensive Verification: Validates cited papers against major academic databases (Crossref, PubMed, arXiv, DBLP).
- Detailed Metrics: Measures hallucination rate, per-field accuracy (title, author, year, DOI), and discipline-specific performance.
- Use Case: When evaluating an LLM's ability to generate a literature review, use this Skill to automatically check if all the provided citations are real and accurate, quantifying its tendency to hallucinate.
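The per-field scoring described above can be sketched as follows. This is an illustrative Python implementation, not the Skill's actual code: the function names, the 0.9 fuzzy-match threshold, and the citation dict shape (`title`, `author`, `year`, `doi` keys) are all assumptions for demonstration.

```python
from difflib import SequenceMatcher
from typing import Optional

FIELDS = ("title", "author", "year", "doi")

def field_match(claimed: str, found: str, threshold: float = 0.9) -> bool:
    """Fuzzy-compare one bibliographic field, case-insensitively.
    Threshold of 0.9 is an illustrative choice, not the Skill's setting."""
    ratio = SequenceMatcher(None, claimed.lower().strip(), found.lower().strip()).ratio()
    return ratio >= threshold

def score_citation(claimed: dict, found: Optional[dict]) -> dict:
    """Score one LLM-cited paper against a database record.
    found=None means no database hit: the citation counts as hallucinated."""
    if found is None:
        return {"hallucinated": True, **{f: False for f in FIELDS}}
    return {"hallucinated": False,
            **{f: field_match(str(claimed.get(f, "")), str(found.get(f, "")))
               for f in FIELDS}}

def hallucination_rate(results: list) -> float:
    """Fraction of cited papers with no matching database record."""
    return sum(r["hallucinated"] for r in results) / len(results)
```

With scores like these per citation, discipline-specific performance follows by grouping results before averaging.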
Quick Start
Evaluate LLM reference recommendation capabilities using the provided configuration file.
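For the Crossref lookup step mentioned above, the public Crossref REST API supports bibliographic search via the `query.bibliographic` parameter. A minimal sketch of building such a query (the helper name is our own; the fetch itself needs network access, so it is shown as a comment):

```python
from urllib.parse import urlencode

CROSSREF_API = "https://api.crossref.org/works"

def build_query_url(title: str, rows: int = 1) -> str:
    """Build a Crossref bibliographic search URL for a claimed paper title."""
    return f"{CROSSREF_API}?{urlencode({'query.bibliographic': title, 'rows': rows})}"

# Fetching the top hit (requires network access):
# import json, urllib.request
# with urllib.request.urlopen(build_query_url("Attention Is All You Need")) as resp:
#     top_hit = json.load(resp)["message"]["items"][0]
```

PubMed, arXiv, and DBLP expose analogous search APIs, so the same verify-then-score loop applies to each database.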
Dependency Matrix
Required Modules
matplotlib
Components
- scripts
- references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: ref-hallucination-arena
Download link: https://github.com/agentscope-ai/OpenJudge/archive/main.zip#ref-hallucination-arena
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.