llm-debug-test-failures

Official

Debug LLM test failures

Author: Arm-Examples
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps developers efficiently diagnose and resolve failing integration tests for Large Language Models (LLMs), pinpointing whether a failure stems from model output, configuration, or a backend regression.

Core Features & Use Cases

  • Reproduce Failing Tests: Re-run specific failing tests with verbose output.
  • Inspect Model Responses: Capture detailed logs of prompts, responses, and runtime parameters for analysis.
  • Validate Configurations: Verify model configuration files, paths, and runtime settings like context size and batch size.
  • Trace Issues: Step through backend integrations (llama.cpp, ONNX Runtime GenAI, MediaPipe, MNN) and upstream framework sources to identify bugs.
  • Use Case: When an llm-cpp-ctest fails due to unexpected model output, use this Skill to rerun the test, capture the exact prompt and response, and inspect the configuration to understand why the output drifted.

Quick Start

Rerun the failing LLM integration tests verbosely from your build directory.

Dependency Matrix

Required Modules

None required

Components

  • scripts
  • references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: llm-debug-test-failures
Download link: https://github.com/Arm-Examples/LLM-Runner/archive/main.zip#llm-debug-test-failures

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
